IndexesColumnFamilyIterator::get_buffer() returns duplicated rows

vladbalmos commented 10 years ago

I'm running the following query using phpcassa v1.0.a.5 on Cassandra 1.0.3:

$handle = new ColumnFamily($conn, 'transactions');
$handle->return_format = ColumnFamily::ARRAY_FORMAT;
$indexExpr = new IndexExpression('userID', 39);
$indexClause = new IndexClause(array($indexExpr), '', 5000);
$rows = $handle->get_indexed_slices($indexClause);

$rows should contain only about 250 rows for that userID. Instead it returns 5000 records, which is the count passed to the IndexClause: the ~250 valid rows are simply repeated until the 5000 limit is reached. I reached that conclusion by digging into IndexedColumnFamilyIterator and dumping $current_buffer to a file during a request. In a first pass I checked the primary keys of all those records, and in a second pass I compared the serialized contents of each row. As I said, only ~250 of the records are unique.
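The two-pass check described above (distinct row keys first, then distinct serialized rows) can be sketched roughly as follows. This is a minimal illustration, assuming the rows come from an iterator that yields `row key => columns array` pairs with string keys; the helper name `count_duplicate_rows` is hypothetical and not part of phpcassa.

```php
<?php
// Hypothetical helper (not a phpcassa API): walks any array or Traversable
// of row key => columns pairs and reports how many rows were yielded
// versus how many distinct keys and distinct row contents were seen.
function count_duplicate_rows($rows)
{
    $total = 0;
    $seenKeys = [];  // pass 1: distinct row keys (assumed to be strings)
    $seenRows = [];  // pass 2: distinct serialized row contents

    foreach ($rows as $key => $columns) {
        $total++;
        $seenKeys[(string) $key] = true;
        // Hash the serialized columns so identical rows collapse to one entry.
        $seenRows[md5(serialize($columns))] = true;
    }

    return [
        'total'       => $total,
        'unique_keys' => count($seenKeys),
        'unique_rows' => count($seenRows),
    ];
}
```

Because a plain PHP array cannot hold the same key twice, the duplicates only show up when the helper is fed an iterator (such as the one `get_indexed_slices()` returns) that yields a key more than once. A `total` far above `unique_keys` matches the 5000 vs ~250 symptom.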

Is there something wrong with my query, or is this a phpcassa bug or a Cassandra bug?

Thank you very much!

vladbalmos commented 10 years ago

Running the same query in cassandra-cli returns the correct number of rows:

get transactions where userID = 39

250 Rows Returned
Elapsed time: 2515 msec(s).

thobbs commented 10 years ago

I can't reproduce this on the latest master. I think what you're seeing is the problem fixed in this commit: https://github.com/thobbs/phpcassa/commit/fbdc231851a027a9511c87ccefe1aa9dd792f1fa.

Upgrading to 1.0.a.6 or 1.1.0 should resolve the problem.

thobbs/phpcassa issue #145