Closed jwr closed 11 years ago
Hi Jan,
This sounds to me like something is being unexpectedly serialized. Here's an example that'll hopefully help convey the core of the problem:
(wcar* (car/set "mykey" true)
(car/get "mykey")) ; => ["OK" true]
(wcar* (car/lua "return redis.call('get', 'mykey')" {:_ ""} {})) ; => true
(wcar* (car/lua "local get_val = redis.call('get', 'mykey')
if get_val == true then return 'true' else return 'false' end"
{:_ ""} {})) ; => "false"
What's actually happening:
true
value is being transparently serialized with
Nippy into a bytestring that looks something like "\x00>\x03\b\n\x00["
if you were to examine it with redis-cli
, etc.true
. We might not
even realise there's any de/serialization going on.When sending data to Redis, Carmine must choose how to represent it. Since Redis has only a single datatype (binary-safe string), some tradeoffs are necessary.
In the case of Carmine:
incr
].In your case I would check if something is possibly writing a serializable type somewhere that you weren't expecting to be serialized (e.g. (/ 2 3) which is a ratio type so subject to serialization).
Feel free to shout if I can provide any other assistance.
Cheers,
Peter,
Thanks for the prompt reply!
In case #2 you might be right. I tried to deserialize this value: redis 127.0.0.1:6379> set test-key "\x00>\x01\x00\x03"
redis 127.0.0.1:6379> strlen test-key (integer) 5
and (carmine/get "test-key") returns nil. Would this be a valid encoding for the actual Clojure nil value?
But note that in case #1 I manually looked up (using redis-cli) the value (ruid) that the lua script accessed: it's fine, it's a redis integer. It isn't a serialized byte string. And that same value "165854" passed through Carmine to Clojure code, where on a subsequent store to redis it became "\x00>\x03\b\n\x00[".
I tried the same trick as above to deserialize this value and it deserializes to a single character [.
Is it possible that an unhandled error in the lua script causes this? E.g. instead of return {ruid, created} something else gets returned, which Carmine interprets as the [ character?
Peter,
Thanks for the prompt reply!
No problem, you caught me at a good time.
In case #2 https://github.com/ptaoussanis/carmine/pull/2 you might be right. I tried to deserialize this value: redis 127.0.0.1:6379> set test-key "\x00>\x01\x00\x03"
redis 127.0.0.1:6379> strlen test-key (integer) 5
and (carmine/get "test-key") returns nil. Would this be a valid encoding for the actual Clojure nil value?
Case #2:
That looks too small. Nippy's serialized nil
value is (seq (nippy/freeze nil))
=> (78 80 89 1 1 0 3)
and Carmine would prefix that with another 2
bytes (0 62)
to indicate that the data is serialized.
If you can locate the key, try get
it using Carmine and see what it's
being interpreted as.
Alternatively, (wcar* (car/parse-raw (car/get "test-key")))
will get the
value back without any processing on Carmine's side. This'll be a raw
bytearray that you can inspect by calling seq
on it.
This is stats code that only stores values (strings, in this case)
Is there any possibility that an occasional non-string value gets through? I might put an assert/log on the input type.
If a non-string is being written (e.g. bytes, ratio, collection, nil,
boolean, etc.) - then it's get serialized and you'd see a
"\x00>\x01\x00\x03"-type
value with redis-cli
.
Case #1:
But note that in case #1 https://github.com/ptaoussanis/carmine/pull/1 I
manually looked up (using redis-cli) the value (ruid) that the lua script accessed: it's fine, it's a redis integer. It isn't a serialized byte string. And that same value "165854" passed through Carmine to Clojure code, where on a subsequent store to redis it became "\x00>\x03\b\n\x00[".
So to clarify:
- You had a key that was definitely (verified via redis-cli) an integer string in Redis.
- Carmine took the value, transformed it in some way (?), and rewrote it to Redis.
- After the rewrite, the value is a serialized form.
Is that right?
That'd lead me to suspect that some part of the transformation in (2) unexpectedly produced a serializable form. Remember that there's nothing inherently wrong (corrupt) about the "\x00"-style strings, that's just Redis' representation of arbitrary bytes. It'd be the expected form if something got unexpectedly serialized.
Is it possible that an unhandled error in the lua script causes this? E.g.
instead of return {ruid, created} something else gets returned, which Carmine interprets as the [ character?
That depends on your definition of "error".
If there's a Lua syntax error anywhere in the script, the script won't start running. So scripts are atomic (excepting a Redis server crash in the middle of processing a script, I believe).
If there's a logic error in a script though (for example, reading a serialized value and transforming it with the assumption that it's an unserialized string) - that can mangle data.
For hunting this down, I'd try narrow in on possible entry points for unexpected serializable data (e.g. a nil when you thought you'd only see integers) and I'd add some logging/assert calls just to verify.
Cheers,
That looks too small. Nippy's serialized
nil
value is(seq (nippy/freeze nil))
=>(78 80 89 1 1 0 3)
and Carmine would prefix that with another 2 bytes(0 62)
to indicate that the data is serialized.If you can locate the key, try
get
it using Carmine and see what it's being interpreted as.
I think that is exactly what I did? Unless I'm misunderstanding you. I took the string that appeared in my production db under "test-key", verified that it really only is 5 bytes long, and tried to access it using Carmine's get. The result was nil. I also tried changing single bytes within the key, which produced various kinds of interesting Carmine exceptions, so something really is getting deserialized.
The bytes are: 0 62 1 0 3.
Is there any possibility that an occasional non-string value gets through? I might put an assert/log on the input type.
Yes, nil might appear here (that would be my bug), but nothing else should. This value should either be a string or a nil. Still, I could add some logging and try to intercept that value.
But this case seems to be solved and not a problem with Carmine. I probably pass something I should not.
The other case is still unsolved, though:
Case #1: [...]
- You had a key that was definitely (verified via redis-cli) an integer string in Redis.
- Carmine took the value, transformed it in some way (?), and rewrote it to Redis.
- After the rewrite, the value is a serialized form.
No, not quite.
So the problem here isn't that carmine serialized the value on subsequent writes, but that the lua script returned a single character instead of a string, and something which looks like an integer instead of a boolean.
E.g. clojure code that calls: (let [[ruid created?](... %28carmine/lua-script .....%29))
got [ and 4 for ruid and created? -- this is from a script that has "return {ruid, created}" as the last line. Note that ruid is a character, not a string. That's what alerted me: my postcondition (seq ruid) failed.
The number that I get in created? changes, but possible values that I saw: 1, 4, 7 and something else that doesn't print as anything in the logs.
If there's a Lua syntax error anywhere in the script, the script won't start running. So scripts are atomic (excepting a Redis server crash in the middle of processing a script, I believe).
If there's a logic error in a script though (for example, reading a serialized value and transforming it with the assumption that it's an unserialized string) - that can mangle data.
There is no syntax error, and the code "works" fine most of the time. I expected something more along the lines of an error/exception in Lua that causes it to return something that Carmine interprets as the [ character and a number instead of two values.
The bytes are: 0 62 1 0 3.
Okay, that is a serialized nil
- but it's been serialized with an old
(pre v2) version of Nippy, which is why it's so short. v2+ nil
s include a
header so are longer. Isn't a problem, just the reason I didn't recognize
it.
So case #2 is resolved: a nil
was written at some point.
Am curious: what version of Carmine are you running?
No, not quite.
- I had a key that is definitely an integer string in Redis.
- A lua script accessed that key without doing anything else with it other than verifying that it exists ("if not ruid" branch that isn't taken) and returned it using "return {ruid, created}" along with a boolean (false) created value.
- What I got back from Carmine is a single character for ruid ('[' to be exact).
Okay, the clarification helps.
So the problem here isn't that carmine serialized the value on subsequent writes, but that the lua script returned a single character instead of a string, and something which looks like an integer instead of a boolean.
Okay, I'm with you. A couple possibilities: some error of logic in the script, arguments given to the script, destructuring of the reply, or parsing of the reply.
I'd start by looking at the reply destructuring. Note that returning
"{ruid, created}" from a Lua script will yield a 2-element vector in
Clojure, so you should be destructuring the reply as [ruid created]
(assuming there's nothing else in the pipeline).
If you're running a very old version of Carmine, it's also possible you're seeing a bug - I could give you a better guess once I know what version you're on.
I'd start by looking at the reply destructuring. Note that returning "{ruid, created}" from a Lua script will yield a 2-element vector in Clojure, so you should be destructuring the reply as
[ruid created]
(assuming there's nothing else in the pipeline).
Got it! Thanks to this I looked very closely at how I'm destructuring. And it turns out there was a leftover call to str-or-nil (basically does what the name says) in the processing, while it should have been (map str-or-nil ...) since my Lua scripts started returning multiple values instead of one.
So it was me who was breaking things: str-or-nil converted a proper Clojure vector from Carmine into a single string, while subsequent destructuring returned the first two characters of that string as, correspondingly, ruid and created?. And then I passed that single character back to Carmine, which nicely serialized it, which caused other problems down the line…
If you're running a very old version of Carmine, it's also possible you're seeing a bug - I could give you a better guess once I know what version you're on.
I am still on 1.12.0 and now I can safely think about upgrading because I understand what the issue was.
The only possible suggestion for Carmine would be an option to block serialization altogether: I do not use it in this application and its presence delays the appearance of errors. I'm thinking about assertions in my code as well: my code should never pass anything else than a string or integer to redis functions. I just don't have a good place to put such assertions in.
Many, many thanks for your help with this, sorry for taking your time with something that clearly isn't a problem with Carmine, and I definitely owe you one.
--Jan
Got it! Thanks to this I looked very closely at how I'm destructuring.
Great!
And then I passed that single character back to Carmine, which nicely serialized it, which caused other problems down the line…
It occurs to me now that it may make sense to add a knob for turning off the auto-serialization for folks that aren't using it. Will think about the best way of doing that...
If you're running a very old version of Carmine, it's also possible you're seeing a bug - I could give you a better guess once I know what version you're on.
I am still on 1.12.0 and now I can safely think about upgrading because I understand what the issue was.
Wow, okay - happy to hear it's holding up :-) That'd explain the Nippy format - you're still on Nippy 1.x.
The only possible suggestion for Carmine would be an option to block
serialization altogether: I do not use it in this application and its presence delays the appearance of errors. I'm thinking about assertions in my code as well: my code should never pass anything else than a string or integer to redis functions. I just don't have a good place to put such assertions in.
Great minds ;-) Yeah, I think that'd be sensible. Want to think about possible approaches a little.
Many, many thanks for your help with this, sorry for taking your time with
something that clearly isn't a problem with Carmine, and I definitely owe you one.
Absolutely no problem, happy I could help.
Cheers! :-)
This isn't really a bug report yet, I'm rather looking for hints and pointers. I have a system where some values transported to/from Redis seem to get corrupted. Sometimes.
Case #1: I have a lua script that does this:
1384505478.269537 [0 5.9.23.176:50292] "EVALSHA" "c9ff551721953469eb600c6a9057b5d6ace939ed" "2" "customer" "186kvcq0a-LOrFzrozoRBmVizb" 1384505478.269571 [0 lua] "get" "t2r:customer:186kvcq0a-LOrFzrozoRBmVizb" [...]
the script in question essentially does this:
local t2rkey = 't2r:' .. :customer .. ':' .. _:tid local ruid = redis.call('get', t2r_key) local created = false [...] return {ruid, created}
the elided code doesn't matter, it's an "if" that is not taken in this case. The lua script basically just performs a single GET here and returns the value along with a boolean (false) flag.
Problem is, sometimes the values returned sometimes make no sense. When the problem occurs, Clojure logging code prints the two received values as something similar to "[ 2". Investigating this closer shows that the "ruid" value is "\x00>\x03\b\n\x00[", while it should be:
redis 127.0.0.1:6379> get t2r:customer:186kvcq0a-LOrFzrozoRBmVizb
"165854"
To give you an idea of "sometimes", in this case it happens in about 1.5% of calls.
Case #2: There is also another problem, which looks related. It is in a completely different section of the code, does not involve lua scripting at all, and does not involve reading anything from redis. This is stats code that only stores values (strings, in this case), which arrive to redis in this form:
1384505479.011301 [0 5.9.23.176:50420] "ZINCRBY" "stats:customer:s:2013-11-15" "1" "\x00>\x01\x00\x03"
The "\x00>\x01\x00\x03" should normally be a human-readable string. My Clojure logs show no such corruption there, it's only visible in redis monitor.
Things of note:
Software involved: Redis 2.6.16, Clojure 1.5.1, Carmine 1.12.0.
Any ideas?