scylladb / gocql

Package gocql implements a fast and robust ScyllaDB client for the Go programming language.
https://docs.scylladb.com/stable/using-scylla/drivers/cql-drivers/scylla-go-driver.html
BSD 3-Clause "New" or "Revised" License
181 stars 57 forks source link

Scan with filtering stopping on an empty page #180

Closed nyh closed 2 months ago

nyh commented 4 months ago

A user in stackoverflow, https://stackoverflow.com/questions/78544921/gocql-allow-filtering-returns-empty-records-with-page-size reported a bug when using Scylla and gocql, and scanning an entire partition with filtering (ALLOW FILTERING): If an entire page of results did not match the filter, then Scylla returns an empty page and gocql incorrectly stops the iteration immediately instead of continuing for the next page. His report has example code he used. I have not verified this report myself (I never used gocql myself).

If this report is true, I suspect it is a gocql bug, not a Scylla bug. We used to have this bug in the Python driver but it was fixed years ago (see https://github.com/scylladb/scylladb/issues/8203, and the fix https://github.com/datastax/python-driver/commit/1d9077d3f4c937929acc14f45c7693e76dde39a9), and in the Java driver (https://github.com/apache/cassandra-java-driver/pull/1544) and now I suspect we have the same bug in gocql.

We have a test in Scylla's test suite, test/cql-pytest/test_filtering.py::test_filtering_contiguous_nonmatching_partition_range and test_filtering_contiguous_nonmatching_single_partition, using the fixed Python driver, which checks exactly this situation: a filtered scan of a table or partition with a large number of rows, only the very last of them actually matching the filter, so the first pages of results are empty. This is why I'm guessing the bug is in gocql, not in Scylla itself. By the way, if you want you can translate one of these tests into Go and see if it fails with gocql and later use it as a gocql unit test.

By the way, after we solved this bug in the Python and Java drivers, we started to rely more and more on sending back empty pages. For example, we now send back an empty page also if we find ourselves scanning a very long list of tombstones, to avoid timeouts. Our drivers need to handle this case correctly.

mykaul commented 3 months ago

@sylwiaszunejko - did we reproduce this? Do we have an existing test in other drivers (Python and friends) that we can 'translate' it?

sylwiaszunejko commented 3 months ago

@mykaul I haven't started working on this issue yet, but I see that there is a test for empty pages in python-driver added https://github.com/datastax/python-driver/commit/1d9077d3f4c937929acc14f45c7693e76dde39a9

moksh-samespace commented 3 months ago

Any update on the above issue?

dkropachev commented 3 months ago

Update:

  1. Implemented test for it in https://github.com/scylladb/gocql/pull/212
  2. Behavior that is mentioned on stackoverflow is not observed
  3. But I have found what could have probably confused the user, which is iterator can produce empty result, which is expected behavior in a sence that since it is ALLOW FILTERING query, every shard executes it and if there is no such record it returns empty result, which driver pass down to the user.

Quick fix for user: update iteration logic to stop iterating only when len(iter.PageState()) == 0

On our side we can probably find a way to automatically retry on iterators with no results, or just mention the fact of posibility of having empty results.

I have quickly investigated posibility of automatically retry on empty iterators: In order to do that we will need to somehow pass iter.PageState() to a Query and reexecute it. Since it is manual pagination case, it should be done only by user and doing implicitly will confuse code and probably user, so I strongly believe only way to properly handly is to mention posibility of having iterators with no results in the documentation.

nyh commented 3 months ago

Is the iterator in the Go API for pages or for individual rows? If it's for rows, it can't return nothing, it needs to return a row. Even if jt needs to perform multiple queries.

dkropachev commented 3 months ago

Is the iterator in the Go API for pages or for individual rows? If it's for rows, it can't return nothing, it needs to return a row. Even if jt needs to perform multiple queries.

I have given it my best shot here, please take a look. It still can produces empty iterator, but only for last page. Otherwise it makes sure that Query.Iter() does not give away empty result.

dkropachev commented 2 months ago

Fixed in https://github.com/scylladb/gocql/pull/212

sylwiaszunejko commented 2 months ago

@nyh Did you see @dkropachev PR? Are you satisfied with the outcome?

roydahan commented 2 months ago

@dkropachev please check if we want to add some documentation about previous behavior vs new. Then, we can close this one.