Closed tonofshell closed 2 years ago
Dev.socrata.com says to provide an order clause to ensure stable results. https://dev.socrata.com/docs/paging.html
Implementing their "at a minimum" recommendation on get_all might be the best fix.
Heads Up! The order of the results of a query are not implicitly ordered, so if you're paging, make sure you provide an $order clause or at a minimum $order=:id. That will guarantee that the order of your results will be stable as you page through the dataset.
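As a rough illustration of that recommendation, here is a minimal sketch of a paging loop that always sends an order clause, defaulting to `:id`. The names `get_all_ordered` and `fetch_page` are hypothetical and not part of any library. `fetch_page` stands in for whatever single-page call the client exposes, and it is assumed to accept `limit`, `offset`, and `order` keyword arguments and return a list of rows.

```python
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]


def get_all_ordered(
    fetch_page: Callable[..., List[Row]],  # hypothetical single-page fetcher
    page_size: int = 1000,
    order: str = ":id",  # the "at a minimum" ordering from the Socrata docs
) -> Iterator[Row]:
    """Yield every row, requesting each page with an explicit order clause."""
    offset = 0
    while True:
        # Every request carries the same order, so pages do not overlap.
        page = fetch_page(limit=page_size, offset=offset, order=order)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            # A short page means the end of the dataset was reached.
            return
        offset += page_size
```

A caller who already passes their own order clause could supply it through `order` instead of relying on the `:id` default.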
I would be happy to review and merge a pull request to address this :)
When using `get_all()` on large datasets, the results are not paginated correctly. The returned response has the correct total number of rows, but approximately 10% of the rows are duplicates of other rows. If the API call does not explicitly order rows, there is no guarantee that each page of results is a unique chunk of the total rows in the dataset. This could be resolved by creating an API call with `limit` greater than or equal to the total number of rows in the dataset.
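To check whether a given result set is affected, something like the following sketch could count repeated rows. The helper name `count_duplicate_rows` is hypothetical, and it assumes each row is a JSON-serializable dict. Note that it would also count rows that are genuinely identical in the source dataset, so a nonzero result is only a hint.

```python
import json
from collections import Counter
from typing import Any, Dict, Iterable


def count_duplicate_rows(rows: Iterable[Dict[str, Any]]) -> int:
    """Return how many rows are repeats of an earlier, identical row."""
    # Serialize with sorted keys so key order does not affect equality.
    counts = Counter(json.dumps(row, sort_keys=True) for row in rows)
    return sum(n - 1 for n in counts.values() if n > 1)
```

For example, `count_duplicate_rows(list(rows))` on the collected output of `get_all()` should be 0 if every page was a distinct chunk of a dataset with no identical rows.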