tjake / Solandra

Solandra = Solr + Cassandra

deleting elements breaks range queries #166

Closed kRyszard closed 12 years ago

kRyszard commented 12 years ago

Hi,

We have Solandra at cfbce36811c14f61bfec3a99167c0d83e766012e (tjake's, 23-01-2012). Steps to reproduce:

  1. Fill an index with some data; the schema has a long field.
  2. Delete 1000 (out of 10000) elements by id.
  3. Perform some range queries. They no longer work, although they worked before the deletion.

For example:

q=post_creation_date:[1331436339000 TO 1331868339000] returns 3191 objects
q=post_creation_date:[1331004339000 TO 1331868339000] returns 0 objects
q=post_creation_date:[* TO 1331868339000] returns 0 objects
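To spell out why these results are inconsistent: the second and third ranges both cover the first one entirely, so they should return at least 3191 objects. A minimal sketch of that check follows. The bounds are copied from the queries above; the helper names `contains` and `span_days` are made up for illustration and are not part of Solr or Solandra.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000

# (low, high) bounds in epoch milliseconds, copied from the report.
# None stands for the open bound "*".
narrow = (1331436339000, 1331868339000)    # reported: 3191 objects
wide = (1331004339000, 1331868339000)      # reported: 0 objects
open_low = (None, 1331868339000)           # reported: 0 objects


def contains(outer, inner):
    """True if the closed range `outer` fully covers the closed range `inner`."""
    outer_low, outer_high = outer
    inner_low, inner_high = inner
    low_ok = outer_low is None or (inner_low is not None and outer_low <= inner_low)
    high_ok = outer_high is None or (inner_high is not None and inner_high <= outer_high)
    return low_ok and high_ok


def span_days(bounds):
    """Width of a fully bounded range in days."""
    low, high = bounds
    return (high - low) / MS_PER_DAY


# Both wider ranges cover the narrow one, so a correct index
# cannot return fewer hits for them than for the narrow range.
print(contains(wide, narrow), contains(open_low, narrow))
print(span_days(narrow), span_days(wide))
```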

btw: I am aware that there is another bug in Solandra related to range queries: when Solandra returns data ordered by some long field, the longs are treated as numbers, but in range queries they are compared like strings, e.g. [10 TO 30] may return 10, 30, 200 (in that order).
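The string-comparison effect can be reproduced outside Solandra. The sketch below is plain Python, not Solandra code: it compares the values from the example above once as strings and once as numbers. The zero-padding helper `pad` and its width of 19 digits are assumptions added for illustration; fixed-width encoding is a common way to make string order agree with numeric order for non-negative values, but nothing in this thread says Solandra does this.

```python
def in_range_as_strings(value, low, high):
    """Range check on the decimal text of the numbers (lexicographic order)."""
    return str(low) <= str(value) <= str(high)


def in_range_as_numbers(value, low, high):
    """Range check on the numeric values."""
    return low <= value <= high


def pad(value, width=19):
    """Hypothetical fixed-width encoding: zero-pad a non-negative long."""
    return str(value).zfill(width)


values = [10, 30, 200]

# "200" sorts between "10" and "30" because '2' < '3', so it wrongly matches.
string_hits = [v for v in values if in_range_as_strings(v, 10, 30)]

# Numeric comparison excludes 200, as expected.
numeric_hits = [v for v in values if in_range_as_numbers(v, 10, 30)]

# With zero-padded strings, lexicographic order agrees with numeric order.
padded_hits = [v for v in values if pad(10) <= pad(v) <= pad(30)]

print(string_hits, numeric_hits, padded_hits)
```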

kRyszard commented 12 years ago

I've tested the process (add docs, perform range queries, delete some data, run the range queries again) against a few revisions of Solandra and discovered that some revisions work. The last working revision was 513eda7c82 from 9-09-2011 and the first non-working revision was a32ec231de from 27-09-2011. The difference between them is only one line. I've installed Cassandra 1.0.8 + tjake's cfbce36811 and fixed the line. I've tested deletes + range queries and now it works OK, but I do not understand why, or whether it is going to break something else :( Does anyone know what this line means?

tjake commented 12 years ago

That one-liner was to avoid pulling too much data at once. It seems that after a delete, the logic perhaps pulls only tombstoned columns and gives up.
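A toy model of the suspected failure mode, for readers unfamiliar with tombstones. This is not Solandra's or Cassandra's code: the column layout, the function names and the page sizes are all assumptions made for illustration. It only shows how a paged scan that stops at the first page without live columns can return nothing when a small page happens to contain only tombstones.

```python
from typing import List, Optional, Tuple

# (term, doc_id); doc_id None marks a deleted (tombstoned) column.
Column = Tuple[int, Optional[str]]


def fetch_page(columns: List[Column], start: int, page_size: int) -> List[Column]:
    """Hypothetical stand-in for a limited column slice read."""
    return columns[start:start + page_size]


def scan_giving_up(columns: List[Column], page_size: int) -> List[str]:
    """Stops as soon as a page yields no live columns (the suspected bug)."""
    hits: List[str] = []
    start = 0
    while start < len(columns):
        page = fetch_page(columns, start, page_size)
        live = [doc for _, doc in page if doc is not None]
        if not live:
            break  # treats "only tombstones" as "no more data"
        hits.extend(live)
        start += len(page)
    return hits


def scan_until_exhausted(columns: List[Column], page_size: int) -> List[str]:
    """Skips tombstones but keeps paging until the row is exhausted."""
    hits: List[str] = []
    start = 0
    while start < len(columns):
        page = fetch_page(columns, start, page_size)
        hits.extend(doc for _, doc in page if doc is not None)
        start += len(page)
    return hits


# The three lowest terms were deleted; two live documents remain.
columns = [(1, None), (2, None), (3, None), (4, "doc-4"), (5, "doc-5")]

print(scan_giving_up(columns, page_size=2))
print(scan_until_exhausted(columns, page_size=2))
print(scan_giving_up(columns, page_size=4))
```

In this model a larger page size only hides the problem: it works as long as every page contains at least one live column, which depends on how many consecutive columns were deleted.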

kRyszard commented 12 years ago
  1. Does this mean that if I have broken range queries (on a cluster without this fix), I can perform a cleanup to remove all tombstones and make the range queries work again?
  2. Are the old values (4/64) "safe"? I mean, is this a sufficient amount of data to pull to make all queries work?

btw: wow, tjake, you're alive ;P Since there's an opportunity to talk to you, can you please give us a quick comment on how you see the future of Solandra? I mean, are you still working on it, planning a release or something?

tjake commented 12 years ago

I think 2/3 will work.

I've been M.I.A. due to my time being spent on DataStax Enterprise Search which provides native Solr access to Cassandra column families. Also Cassandra 1.0 broke Solandra's partitioner. 1.1 will fix it so I will upgrade it then.