Open exalate-issue-sync[bot] opened 1 year ago
Michal Kurka commented: Thank you for reporting!
JIRA Issue Migration Info
Jira Issue: PUBDEV-5821
Assignee: New H2O Bugs
Reporter: Gregory Kanevsky
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A
As it stands today, H2O uses SQL SELECT with OFFSET/LIMIT logic (or an equivalent, depending on the database), which doesn't guarantee consistency between consecutive calls: without an ORDER BY, the database is free to return rows in a different order each time. The problem is even worse in a distributed environment, where multiple processes/nodes connect to the database in parallel to ingest (import) data from a table by dividing it into chunks. This applies equally to PostgreSQL, Teradata, and other databases.
Outside of implementing a SQL CURSOR (not feasible), there is the option of adding a new parameter (key or order by?) that orders the rows and so guarantees such consistency when dividing table rows into chunks. A SQL ORDER BY clause would have to be applied together with SELECT and OFFSET/LIMIT, in accordance with the logic implemented for each database.
The new parameter could simply be a character string containing one or more column names separated by commas to use with ORDER BY. For backward compatibility, make it optional and fall back to the current implementation when it is missing. It will be the user's responsibility to use a key (one or more columns) that uniquely orders the table rows. Using such a parameter (correctly) will likely affect performance but guarantee correctness.
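To make the proposal concrete, here is a minimal Python sketch of chunked reads with an optional ordering key. It is not H2O's code: the names chunk_queries, read_in_chunks, and order_by are hypothetical, SQLite stands in for the real database, and the LIMIT/OFFSET syntax shown would need to be adapted per database, as noted above.

```python
import sqlite3
from typing import Iterator, List, Optional, Tuple


def chunk_queries(table: str, total_rows: int, chunk_size: int,
                  order_by: Optional[str] = None) -> Iterator[str]:
    """Yield one SELECT per chunk (hypothetical helper).

    order_by is a comma-separated string of column names. When it is
    missing, no ORDER BY is emitted, which mirrors the current behaviour.
    Names are interpolated unchecked here; real code should validate them.
    """
    order_clause = f" ORDER BY {order_by}" if order_by else ""
    for offset in range(0, total_rows, chunk_size):
        yield (f"SELECT * FROM {table}{order_clause} "
               f"LIMIT {chunk_size} OFFSET {offset}")


def read_in_chunks(conn: sqlite3.Connection, table: str, chunk_size: int,
                   order_by: Optional[str] = None) -> List[Tuple]:
    """Read the whole table chunk by chunk and concatenate the rows."""
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    rows: List[Tuple] = []
    for sql in chunk_queries(table, total, chunk_size, order_by):
        rows.extend(conn.execute(sql).fetchall())
    return rows


# Demo table: ids inserted in descending order, two groups "a" and "b".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, grp TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, "ab"[i % 2]) for i in range(10, 0, -1)])

# (grp, id) uniquely orders the rows, so every chunk boundary is stable.
rows = read_in_chunks(conn, "t", chunk_size=3, order_by="grp, id")
```

With a key that uniquely orders the rows, every chunk query sees the same total order, so the chunks neither overlap nor skip rows as long as the table is not modified during the import. A non-unique key (for example grp alone) would still leave ties that the database may break differently on each call.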