Allow UUIDs to be passed in as standard Hex-encoded strings.

iconara / cql-rb

Cassandra CQL 3 binary protocol driver for Ruby

106 stars 31 forks source link

Allow UUIDs to be passed in as standard Hex-encoded strings. #96

Closed jasonmk closed 10 years ago

jasonmk commented 10 years ago

This is a fairly simple change. It simply looks to see if a uuid was passed in as a string and converts it to an integer value if so (for situations when it doesn't make sense to stand up a TimeUuid object just to pull the value out).

iconara commented 10 years ago

Thank you for the pull request.

To me this just repeats the work done in Uuid#initialize/#from_s (that's the v2.0 implementation, the v1.2 is a bit convoluted), with fewer safety checks and worse performance (I haven't benchmarked it, but it generates a lot of unnecessary garbage by allocating a string and regex on each call).

jasonmk commented 10 years ago

Performance is actually faster because we don't have to deal with allocating the UUID object. The logic is the same. The 2.0 implementation still does a gsub followed by a cast. The error checking is definitely missing from what I sent you. My theory was that you can't get an invalid integer, so it won't choke anything. The database should just throw back a QueryError if it's not a valid UUID (I assume).

2.0.0-p247 :166 > puts Benchmark.measure {1000000.times { uuid.gsub(/-/,'').to_i(16) } } 5.320000 0.000000 5.320000 ( 5.322824) => nil 2.0.0-p247 :167 > puts Benchmark.measure {1000000.times { Uuid.new(uuid) } } 7.760000 0.000000 7.760000 ( 7.773411) => nil 2.0.0-p247 :168 > puts Benchmark.measure {1000000.times { Cql::Uuid.new(uuid) } } 23.340000 0.000000 23.340000 ( 23.379227) => nil 2.0.0-p247 :169 >

In the above, the plain Uuid is your 2.0 implementation and Cql::Uuid is the 1.2 version. Given that I'm still using 1.2, I'd rather not use that method of calculating it.

That said, with 2.0, I'd probably change the pull request to call Uuid.new instead of calculating it myself for consistency. The difference isn't that large. For 1.2 though, I'd say it's still valid. I can change the ==. is_a? isn't as fast as a straight class check and I figured subclasses of String weren't something I had to worry too much about, but I can definitely change it.

iconara commented 10 years ago

Your benchmark is flawed. Of course doing only the gsub will be faster since the Uuid.new will also do necessary validation, plus the same gsub. Your version will also allocate many more objects (the regex and string), which will not impact a benchmark that only runs for a short time, but will make a real application spend more time doing garbage collection.

n.to_i(16) is also not the same as Integer(n, 16). Passing in something that contains characters other than a-f, 0-9 will give you 0 with to_i.

I'm not sure about doing Uuid.new on strings in v2.0 either, could you explain a bit more about the motivation behind this? Why is it not good enough to do Uuid.new in the application code?

jasonmk commented 10 years ago

Ok, I see what you were saying now. I missed the fact that the constants were declared outside the block. In my head, I was automatically interpreting the net result without looking at the code.

As to the motivation, the reason was that I'm plugging cql-rb into a rails app and the rails standard is to allow keys to be specified as strings even if they're integers since that's how they come in via params. I couldn't find a good central place in my app to do the conversion prior to hitting cql-rb. Especially since some of my column families do actually use strings as keys.

That said, if you think this is a bad idea, I'll defer to your judgment since you've obviously spent a lot more time working in driver code than I have. I'll see if I can find a way to cleanly do the conversion in my app code.

Cheers, and thanks for the excellent library. I'm currently swapping out cassandra-cql and thrift for this so that I can get away from thrift. It's not trivial, but I like this library a lot better.