DentaCool closed this 1 month ago
When a query is read in parallel, data can stop matching the filter criteria between the initial count query and the partition queries. For example, suppose the initial count query returns 100 records with id values ranging from 1 to 101. Before the partition [90..100] is fetched, the age of the record with id 99 changes from 25 to 26. If the filter condition is WHERE age < 26, this record no longer matches the filter, and the resulting DataFrame contains a zeroed row such as (0, 0, 0, 0).
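The scenario above can be simulated without a PostgreSQL instance. The following is only a sketch of the suspected mechanism, not the library's actual implementation: it assumes the reader sizes a zero-initialised buffer from the count query and then fills it partition by partition. The table name `people`, the function `simulate_parallel_read`, and the use of an in-memory SQLite database are all illustrative assumptions, and the partitions run sequentially here to keep the example deterministic.

```python
import sqlite3


def simulate_parallel_read(mutate_before_last_partition: bool):
    """Fill a buffer sized by COUNT(*) from per-partition queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, age INTEGER)")
    # 100 rows with ids 1..100, all aged 25, so every row matches the filter.
    conn.executemany(
        "INSERT INTO people VALUES (?, ?)", [(i, 25) for i in range(1, 101)]
    )

    where = "age < 26"

    # Step 1: the count query decides how many rows to allocate.
    (count,) = conn.execute(
        f"SELECT COUNT(*) FROM people WHERE {where}"
    ).fetchone()
    buffer = [(0, 0)] * count  # zero-initialised destination

    # Step 2: split the id range into half-open partitions [lo, hi).
    partitions = [(lo, lo + 10) for lo in range(1, 101, 10)]

    offset = 0
    for index, (lo, hi) in enumerate(partitions):
        if mutate_before_last_partition and index == len(partitions) - 1:
            # A concurrent writer changes id 99 before its partition is read.
            conn.execute("UPDATE people SET age = 26 WHERE id = 99")
        rows = conn.execute(
            f"SELECT id, age FROM people WHERE {where} AND id >= ? AND id < ?",
            (lo, hi),
        ).fetchall()
        buffer[offset:offset + len(rows)] = rows
        offset += len(rows)

    conn.close()
    return buffer


if __name__ == "__main__":
    result = simulate_parallel_read(mutate_before_last_partition=True)
    print(len(result), result.count((0, 0)))
```

In this simulation the buffer keeps its planned length of 100, but only 99 rows are written, so one slot keeps its zero-initialised value.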
It also seems that zeros are possible when the amount of matching data increases during a parallel read (I could not verify this).
Filter: WHERE age > 23 AND age < 26
Initial count: 20, id range: [1..101]
The age of the record with id 56 changes from 23 to 24 before the query for its partition runs
actual_count = 21
In general, my problem is that I get zeros when reading database data in parallel while the data changes frequently. In most cases, the amount of data matching the specified filter can only increase.
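One possible way to tolerate such changes on the reader side is to size the result from the rows actually fetched instead of from the initial count. The sketch below is an assumption about a workaround, not the fix that was released: it replays the age > 23 AND age < 26 example against an in-memory SQLite table (the names `people` and `read_partitions_without_preallocation` are hypothetical) and collects each partition into a growing list.

```python
import sqlite3


def read_partitions_without_preallocation():
    """Collect partition results in a list instead of a pre-sized buffer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, age INTEGER)")
    # ids 1..20 are aged 24 (match the filter); ids 21..100 are aged 23 (no match).
    conn.executemany(
        "INSERT INTO people VALUES (?, ?)",
        [(i, 24 if i <= 20 else 23) for i in range(1, 101)],
    )

    where = "age > 23 AND age < 26"

    # The planned count is kept only for comparison; it no longer sizes anything.
    (planned,) = conn.execute(
        f"SELECT COUNT(*) FROM people WHERE {where}"
    ).fetchone()

    collected = []
    for lo in range(1, 101, 10):  # half-open partitions [lo, lo + 10)
        if lo == 51:
            # id 56 starts matching the filter just before its partition is read.
            conn.execute("UPDATE people SET age = 24 WHERE id = 56")
        rows = conn.execute(
            f"SELECT id, age FROM people WHERE {where} AND id >= ? AND id < ?",
            (lo, lo + 10),
        ).fetchall()
        collected.extend(rows)

    conn.close()
    return planned, collected


if __name__ == "__main__":
    planned, rows = read_partitions_without_preallocation()
    print(planned, len(rows))
```

Here the planned count and the number of fetched rows differ, yet the result contains no zeroed rows and no row is dropped. The trade-off is that the result is still not a consistent snapshot; reading all partitions from a single database snapshot would be needed for that.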
Same issue
Hi @DentaCool, thanks for reporting the issue and the reproducible example! This fix will be included in our next release.
What language are you using?
Python.
What version are you using?
0.3.3/0.3.2
What database are you using?
PostgreSQL
What dataframe are you using?
Pandas
Can you describe your bug?
When performing parallel data reading with ordering in queries, the results are incorrect. Specifically, using DESC ordering on the id column causes the output to contain zeroed data.
What are the steps to reproduce the behavior?
Execute a query with ordering by any column, such as DESC ordering on id.
Database setup if the error only happens on specific data or data type
Table schema and example data
Postgres logs: