xmunoz / sodapy

Python client for the Socrata Open Data API
MIT License
402 stars 114 forks source link

SoQL query support for get_all() #82

Closed kuirolo closed 2 years ago

kuirolo commented 2 years ago

While I was trying to submit a SoQL query using the get_all() method, it kept throwing an error:

HTTPError: 400 Client Error: Bad Request.
        If $query is used, all options - [$offset] should not be specified in $query.

The official documentation isn't super detailed It looks like the problem is that all options must be in the SoQL query if you use a SoQL query.

My code using get_all is below. But you can reproduce this error with the get() method by including a query parameter with any other SoQL clause.

I came up with a functional solution, but wasn't sure how to update the test or documentation.

However, I thought I would open an issue before making a pull request. For most cases, filtering with the other API parameters is ok. Perhaps the simplest solution is changing the get_all() docstring to note SoQL queries aren't supported.

import json
from sodapy import Socrata

client_params = {'domain': 'health.data.ny.gov',
                'app_token': None,
                'timeout': 10} 

request_params = {'dataset_identifier': 'vn5v-hh5r',
                  'content_type': 'json',
                  'query': """SELECT facility_name, fac_desc_short
                              WHERE fac_desc_short like '%HOSP%'"""}

with Socrata(**client_params) as client:
    response = client.get_all(**request_params)

# Iterating over the generator throws HTTPError error
resp_list = [r for r in response]

# If it actually works, put data here
with open('query_test.json', 'w') as f:
    json.dump(resp_list, f, indent=4)
xmunoz commented 2 years ago

Thanks for opening this issue! You are absolutely correct, get_all does not support SoQL queries, but if you look at the get implementation and tests, you can see how an example of how to get that working.

kuirolo commented 2 years ago

Of course! Thanks for maintaining this package, it's the foundation of some important code for my work.

Ok, I'll look at get again.

In the meantime, should I submit a pull request for my version of get_all that supports SoQL queries? Sorry, I should have been more clear in my initial message.

My "functional solution" is an expanded version of get_all that can paginate using either $limit and $offset or a $query string. The trade off is its significantly more complex than the current implementation, so I understand if its not something you want to support in the main branch.

xmunoz commented 2 years ago

Right now, for all of the endpoints, pagination is up to the user. I think that I would like to preserve this consistency. If you want to add some docs about how to do pagination using $limit and $offset, I wouldn't be opposed to that.