xmunoz / sodapy

Python client for the Socrata Open Data API
MIT License

get_all() has inconsistent pagination #79

Closed patrickleclair-GORDONFN closed 3 years ago

patrickleclair-GORDONFN commented 3 years ago

After running:

results = client.get_all("y8as-bmzj")
with open("coc test.csv", "a", newline="", encoding="utf-8") as line_file:
    csv_writer = DictWriter(line_file, fieldnames=["id","sample_site","sample_date",
                                                   "parameter","numeric_result","result_qualifier",
                                                   "formatted_result","result_units","latitude_degrees","longitude_degrees","site_key"])
    for row in data:
        csv_writer.writerow(row)      

I was getting a different output file on each run: some runs included duplicate rows, others were missing rows. When I reran the export using pagination over the OData API instead, the problem went away. From my debugging, the only explanation I could find was that get_all() wasn't returning the data correctly.
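A common way for limit/offset pagination to yield both duplicate and missing rows is a server-side row order that is not stable between page requests. Socrata's paging documentation advises specifying an explicit `$order` (for example `:id`) when paging. The sketch below assumes get_all() pages with limit/offset in a similar way; `paginate`, `fetch_sorted`, `make_unstable_fetch` and the ten-row `DATASET` are all hypothetical stand-ins, not sodapy code. It shows how the same paging loop stays correct with a stable order and breaks when the order changes between requests.

```python
import itertools
from typing import Callable, Dict, Iterator, List

Row = Dict[str, str]


def paginate(fetch_page: Callable[[int, int], List[Row]], page_size: int) -> Iterator[Row]:
    """Yield rows page by page using limit/offset, stopping at the first short page."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        if len(page) < page_size:  # short or empty page: no more data
            return
        offset += page_size


# Hypothetical stand-in for the remote dataset: ten rows with a unique "id".
DATASET: List[Row] = [{"id": str(i)} for i in range(10)]


def fetch_sorted(limit: int, offset: int) -> List[Row]:
    # Same order on every request, analogous to adding an explicit $order=:id.
    ordered = sorted(DATASET, key=lambda r: int(r["id"]))
    return ordered[offset:offset + limit]


def make_unstable_fetch() -> Callable[[int, int], List[Row]]:
    # Simulates a server whose row order changes between requests
    # (here it simply alternates between ascending and descending).
    calls = itertools.count()

    def fetch(limit: int, offset: int) -> List[Row]:
        descending = next(calls) % 2 == 1
        ordered = sorted(DATASET, key=lambda r: int(r["id"]), reverse=descending)
        return ordered[offset:offset + limit]

    return fetch


stable = [r["id"] for r in paginate(fetch_sorted, page_size=3)]
unstable = [r["id"] for r in paginate(make_unstable_fetch(), page_size=3)]
print(stable)    # ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
print(unstable)  # ['0', '1', '2', '6', '5', '4', '6', '7', '8', '0']
```

Both runs return ten rows, but the unstable one repeats ids 0 and 6 and never returns 3 or 9, which matches the symptom described. With sodapy, assuming get_all() forwards keyword arguments to the query the way get() does, something like client.get_all("y8as-bmzj", order=":id") would be the natural thing to try; this thread does not establish that ordering was the cause here.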

xmunoz commented 3 years ago

Thanks for the bug report! Taking a look now...

xmunoz commented 3 years ago

Which domain are you connecting to? Can you provide the client setup code so that I can try to reproduce the issue?

xmunoz commented 3 years ago

I googled the dataset identifier and it led me to this site (which currently appears to be down): https://data.calgary.ca/d/y8as-bmzj

[Screenshot from 2021-02-20 17-24-29]

patrickleclair-GORDONFN commented 3 years ago

The code I used was:

client = Socrata("data.calgary.ca",app_token="")
results = client.get_all("y8as-bmzj")

xmunoz commented 3 years ago

Alright, I'm able to query this domain now. Let me try to reproduce. One other thing I noticed: in your snippet above, you store the response in results but then iterate over an undefined data variable. Could this be causing the bug?

xmunoz commented 3 years ago

I'm sorry, I wasn't able to reproduce the problem that you're having. Here is the code that I used.

from sodapy import Socrata
from csv import DictWriter

client = Socrata("data.calgary.ca",app_token="")
results = client.get_all("y8as-bmzj")
with open("coc_test.csv", "w", newline="", encoding="utf-8") as line_file:
    csv_writer = DictWriter(line_file, fieldnames=["id","sample_site","sample_date",
                                                   "parameter","numeric_result","result_qualifier",
                                                   "formatted_result","result_units","latitude_degrees","longitude_degrees","site_key"])
    for row in results:
        csv_writer.writerow(row)

After this finished running, it correctly produced a file with 293,039 unique rows.

$ uniq -u coc_test.csv | wc -l
293039
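One caveat about this check: uniq only compares adjacent lines, so on an unsorted file `uniq -u | wc -l` counts every line that lacks an identical neighbour and can miss duplicates that sit far apart (on a three-line file 1,a / 2,b / 1,a it prints all three lines). Sorting first, e.g. `sort coc_test.csv | uniq -d`, would list the repeated lines. Separately, the original report opened the output file with "a" (append) while this reproduction used "w"; in append mode, rerunning the script adds rows after whatever an earlier run already wrote. Below is a sketch of a hypothetical helper, find_duplicate_rows, that counts repeated rows with Python's csv module and collections.Counter regardless of line order.

```python
import csv
import os
import tempfile
from collections import Counter
from typing import Dict, Tuple


def find_duplicate_rows(path: str) -> Dict[Tuple[str, ...], int]:
    """Return each CSV row that occurs more than once, mapped to its count.

    Counter sees every row in the file, so unlike uniq the input does not
    have to be sorted for non-adjacent duplicates to be found.
    """
    with open(path, newline="", encoding="utf-8") as f:
        counts = Counter(tuple(row) for row in csv.reader(f))
    return {row: n for row, n in counts.items() if n > 1}


# Demo on a small temporary file whose duplicate rows are not adjacent.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 newline="", encoding="utf-8") as tmp:
    csv.writer(tmp).writerows([["1", "a"], ["2", "b"], ["1", "a"]])
print(find_duplicate_rows(tmp.name))  # {('1', 'a'): 2}
os.remove(tmp.name)
```

Run against coc_test.csv, an empty result would mean no row appears twice; it cannot detect missing rows, which would need a comparison against the dataset's own row count.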

I'm going to close this issue. Please feel free to re-open if I missed something.