Closed nyh closed 2 months ago
@sylwiaszunejko - did we reproduce this? Do we have an existing test in other drivers (Python and friends) that we can 'translate' it?
@mykaul I haven't started working on this issue yet, but I see that there is a test for empty pages in python-driver added https://github.com/datastax/python-driver/commit/1d9077d3f4c937929acc14f45c7693e76dde39a9
Any update on the above issue?
Update:
Quick fix for user: update iteration logic to stop iterating only when len(iter.PageState()) == 0
On our side we can probably find a way to automatically retry on iterators with no results, or just mention the fact of posibility of having empty results.
I have quickly investigated posibility of automatically retry on empty iterators: In order to do that we will need to somehow pass iter.PageState() to a Query and reexecute it. Since it is manual pagination case, it should be done only by user and doing implicitly will confuse code and probably user, so I strongly believe only way to properly handly is to mention posibility of having iterators with no results in the documentation.
Is the iterator in the Go API for pages or for individual rows? If it's for rows, it can't return nothing, it needs to return a row. Even if jt needs to perform multiple queries.
Is the iterator in the Go API for pages or for individual rows? If it's for rows, it can't return nothing, it needs to return a row. Even if jt needs to perform multiple queries.
I have given it my best shot here, please take a look.
It still can produces empty iterator, but only for last page.
Otherwise it makes sure that Query.Iter()
does not give away empty result.
@nyh Did you see @dkropachev PR? Are you satisfied with the outcome?
@dkropachev please check if we want to add some documentation about previous behavior vs new. Then, we can close this one.
A user in stackoverflow, https://stackoverflow.com/questions/78544921/gocql-allow-filtering-returns-empty-records-with-page-size reported a bug when using Scylla and gocql, and scanning an entire partition with filtering (ALLOW FILTERING): If an entire page of results did not match the filter, then Scylla returns an empty page and gocql incorrectly stops the iteration immediately instead of continuing for the next page. His report has example code he used. I have not verified this report myself (I never used gocql myself).
If this report is true, I suspect it is a gocql bug, not a Scylla bug. We used to have this bug in the Python driver but it was fixed years ago (see https://github.com/scylladb/scylladb/issues/8203, and the fix https://github.com/datastax/python-driver/commit/1d9077d3f4c937929acc14f45c7693e76dde39a9), and in the Java driver (https://github.com/apache/cassandra-java-driver/pull/1544) and now I suspect we have the same bug in gocql.
We have a test in Scylla's test suite, test/cql-pytest/
test_filtering.py
::test_filtering_contiguous_nonmatching_partition_range
andtest_filtering_contiguous_nonmatching_single_partition
, using the fixed Python driver, which checks exactly this situation: a filtered scan of a table or partition with a large number of rows, only the very last of them actually matching the filter, so the first pages of results are empty. This is why I'm guessing the bug is in gocql, not in Scylla itself. By the way, if you want you can translate one of these tests into Go and see if it fails with gocql and later use it as a gocql unit test.By the way, after we solved this bug in the Python and Java drivers, we started to rely more and more on sending back empty pages. For example, we now send back an empty page also if we find ourselves scanning a very long list of tombstones, to avoid timeouts. Our drivers need to handle this case correctly.