thobbs / phpcassa

PHP client library for Apache Cassandra
thobbs.github.com/phpcassa
MIT License

IndexedColumnFamilyIterator::get_buffer() returns duplicated rows #145

Closed vladbalmos closed 10 years ago

vladbalmos commented 10 years ago

I'm running the following query using phpcassa v1.0.a.5 on Cassandra 1.0.3:

$handle = new ColumnFamily($conn, 'transactions');
$handle->return_format = ColumnFamily::ARRAY_FORMAT;
$indexExpr = new IndexExpression('userID', 39);
$indexClause = new IndexClause(array($indexExpr), '', 5000);
$rows = $handle->get_indexed_slices($indexClause);

$rows should contain only ~250 results for that specific userID; instead it contains 5000 records (the count value of the index clause). It basically duplicates the valid 250 rows until it fills the 5000 limit. I came to that conclusion by digging into IndexedColumnFamilyIterator and writing $current_buffer to a file during a request, then checking the primary keys of all those records on a first pass and the serialized footprint of each row on a second pass. As I said, only ~250 records are unique.
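For anyone who wants to repeat the two-pass check described above, here is a minimal sketch. It assumes each row is a two-element array of the form array($key, $columns), which is my understanding of ARRAY_FORMAT and should be verified against your phpcassa version. The helper name count_unique_rows and the sample data are made up for illustration and are not part of phpcassa.

```php
<?php
// Hypothetical helper (not part of phpcassa): counts distinct rows two ways.
// Assumption: every row is a two-element array, array($key, $columns).
function count_unique_rows(array $rows)
{
    $keys = array();
    $footprints = array();

    foreach ($rows as $row) {
        list($key, $columns) = $row;

        // First pass equivalent: distinct primary keys.
        $keys[$key] = true;

        // Second pass equivalent: distinct serialized footprint of the whole row.
        $footprints[md5(serialize($row))] = true;
    }

    return array(
        'total'       => count($rows),
        'unique_keys' => count($keys),
        'unique_rows' => count($footprints),
    );
}

// Made-up sample: three rows returned, but only two are distinct.
$sample = array(
    array('tx1', array(array('userID', 39))),
    array('tx2', array(array('userID', 39))),
    array('tx1', array(array('userID', 39))),
);

print_r(count_unique_rows($sample));
```

If 'total' is much larger than 'unique_keys', as in the 5000 versus ~250 case here, the result set contains repeated rows rather than genuinely distinct matches.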

Is there something wrong with my query? Is this a phpcassa bug or a Cassandra bug?

Thank you very much!

vladbalmos commented 10 years ago

Running the same query in cassandra-cli returns the correct number of records:

get transactions where userID = 39

250 Rows Returned.
Elapsed time: 2515 msec(s).

thobbs commented 10 years ago

I can't reproduce this on the latest master, and I think what you're seeing is this: https://github.com/thobbs/phpcassa/commit/fbdc231851a027a9511c87ccefe1aa9dd792f1fa.

Upgrading to 1.0.a.6 or 1.1.0 should resolve the problem.