pingles / clj-hector

Simple Cassandra client for Clojure

Use sorted maps for column slicing. #17

Closed nickmbailey closed 12 years ago

nickmbailey commented 12 years ago

Column slices were being stuffed into (unsorted) hash maps, which discards the column order and makes paging through columns impossible.

This is just an initial pull request, and we may want to solve this with a different method. I used a sorted map to keep the map interface that already existed. Technically, though, I believe a vector would be more performant here: inserting into a sorted-map is O(log N) per element, whereas appending tuples to the end of a vector is effectively O(1).
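To make the trade-off concrete, here is a minimal sketch of the three candidate return shapes. It uses plain Clojure data only; the column names and values are made up for illustration and nothing here is clj-hector API.

```clojure
;; Hypothetical column data, in the order a slice query would return it.
(def columns
  (for [i (range 20)]
    [(format "col-%02d" i) (str "value-" i)]))

;; A hash map gives no ordering guarantee once it grows past a few entries.
(def as-hash-map (into {} columns))

;; A sorted map keeps keys in comparator order; each insert is O(log n).
(def as-sorted-map (into (sorted-map) columns))

;; A vector of [name value] tuples keeps arrival order; conj is effectively O(1).
(def as-vector (vec columns))

(keys as-sorted-map)   ; sorted by column name
(map first as-vector)  ; in the order received
```

One design point to keep in mind: a sorted map orders keys with the client-side comparator, which may not match the comparator configured on the column family, while a vector simply preserves whatever order the server returned.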

Generally with Cassandra you wouldn't want to pull a huge number of columns into memory at once anyway, so the performance difference is likely not a big deal. Also, if you are concerned about raw client speed, I'm not sure the choice to go with a Hector wrapper was a good idea. Thoughts?

Another note: whatever we decide should be applied to the row slices as well. If you are doing things correctly and using RandomPartitioner (RP), the actual ordering of the rows is somewhat meaningless, but the hash map currently prevents both paging through all rows with RP and meaningful range queries with OrderPreservingPartitioner (OPP).
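For context, paging depends on ordering because each page starts from the last key of the previous one. The sketch below simulates that against an in-memory sorted map; `row`, `fetch-slice`, and `all-pages` are hypothetical names for illustration and stand in for a real slice query, not for anything in clj-hector.

```clojure
;; Stand-in for one row's columns: NOT clj-hector API, just an in-memory sorted map.
(def row
  (into (sorted-map) (for [i (range 10)] [(format "col-%02d" i) i])))

(defn fetch-slice
  "Hypothetical slice query: up to `limit` columns whose name is >= `start`
  (from the beginning when `start` is nil), returned in column order."
  [start limit]
  (into (sorted-map)
        (take limit (if start (subseq row >= start) (seq row)))))

(defn all-pages
  "Pages through the row, using the last column name of each page as the
  inclusive start of the next and dropping that duplicated first column."
  [page-size]
  (loop [start nil, pages []]
    (let [slice (fetch-slice start (if start (inc page-size) page-size))
          page  (if start (dissoc slice start) slice)]
      (if (empty? page)
        pages
        (recur (key (last page)) (conj pages page))))))

(map count (all-pages 4))
```

If `fetch-slice` returned a hash map instead, `(last page)` would pick an arbitrary column, so the next page could skip or repeat columns.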

pingles commented 12 years ago

Hi Nick,

Thanks again for your work. I went with maps originally because I wanted something nice to use in Clojure, rather than something that was the most efficient. My usage was for storing relatively small amounts of data, so I've not hit a problem with using maps so far.

As you say, it's probably a good idea to switch to sorted maps for everything. I'll see if I can get around to doing that today if you don't beat me to it :)

pingles commented 12 years ago

I've changed all the other associative returns within the ToClojure protocol to use sorted maps. I've also pushed a 0.1.2 release to Clojars with your changes. Could you kick the tires and see how it goes?

Thanks, Paul