xmunoz / sodapy

Python client for the Socrata Open Data API
MIT License
402 stars 114 forks

Query to return all dataset identifiers from an endpoint? #46

Closed axfelix closed 6 years ago

axfelix commented 6 years ago

Hi,

I'm not sure if I'm missing something, but can this library perform a query to return all dataset identifiers from a given endpoint, so that I can then loop through them and perform get requests?

I could obviously just make a request to https://data.edmonton.ca/api/catalog/v1?domains=data.edmonton.ca and parse the JSON manually to get the resource ID from every top-level object, but it seems like there should be an easier way...
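For reference, the manual approach described above could look roughly like the sketch below. It uses only the standard library and assumes the catalog response is a JSON object with a top-level `results` list whose entries each carry a `resource` object with an `id`. The `limit` and `offset` query parameters and both function names are assumptions made for illustration, not something confirmed in this thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def extract_resource_ids(catalog_payload):
    """Return the resource id of every entry in a parsed catalog response.

    Assumes a top-level "results" list whose items hold a "resource"
    object with an "id" field; entries without an id are skipped.
    """
    ids = []
    for entry in catalog_payload.get("results", []):
        resource_id = entry.get("resource", {}).get("id")
        if resource_id:
            ids.append(resource_id)
    return ids


def fetch_catalog(domain, limit=100, offset=0):
    """Fetch one page of the catalog for a domain (performs a network call).

    The "limit" and "offset" parameters are assumed to be supported.
    """
    query = urlencode({"domains": domain, "limit": limit, "offset": offset})
    url = f"https://{domain}/api/catalog/v1?{query}"
    with urlopen(url) as response:
        return json.load(response)


# Usage (needs network access):
# ids = extract_resource_ids(fetch_catalog("data.edmonton.ca"))
```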

xmunoz commented 6 years ago

Ask and you shall receive :)
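For readers following along, the use case from the question (list the datasets, then loop and fetch rows) might be sketched as follows, using the `datasets` method that appears later in this thread. The helper name and its parameters are hypothetical, and `client` is only assumed to behave like a sodapy `Socrata` instance, with `datasets(limit=...)` returning catalog entries and `get(dataset_id, limit=...)` returning rows. Entries whose `resource.type` is not `dataset` (maps, for example) are skipped, on the assumption that they may not support row queries.

```python
def first_rows_per_dataset(client, catalog_limit=10, rows_per_dataset=5):
    """Map each tabular dataset id to its first few rows.

    `client` is duck-typed: anything with sodapy-like `datasets` and
    `get` methods works, which also keeps this sketch easy to test.
    """
    rows_by_id = {}
    for entry in client.datasets(limit=catalog_limit):
        resource = entry["resource"]
        # Skip non-tabular catalog entries such as maps.
        if resource.get("type") != "dataset":
            continue
        dataset_id = resource["id"]
        rows_by_id[dataset_id] = client.get(dataset_id, limit=rows_per_dataset)
    return rows_by_id


# Usage (needs sodapy installed and network access):
# from sodapy import Socrata
# client = Socrata("data.edmonton.ca", None)
# print(first_rows_per_dataset(client))
```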

axfelix commented 6 years ago

Thanks!!

axfelix commented 6 years ago

So, I don't think this is an issue with your client so much as my not understanding how Socrata does federation between endpoints, but I'm noticing some pretty weird output; if I request the 800th dataset from https://data.edmonton.ca like so, I get a bunch of New York open data results:

>>> client = Socrata("data.edmonton.ca", None)
>>> client.datasets(limit=800)[799]

{'permalink': 'https://data.ny.gov/d/5bb2-yb85', 'owner': {'display_name': 'NY Open Data', 'id': 'xzik-pf59'}, 'link': 'https://data.ny.gov/Government-Finance/Income-Tax-Components-by-Size-of-Income-by-Place-o/5bb2-yb85', 'metadata': {'domain': 'data.ny.gov'}, 'resource': {'page_views': {'page_views_total_log': 15.499285338999742, 'page_views_last_week_log': 2.584962500721156, 'page_views_last_week': 5, 'page_views_last_month_log': 5.554588851677638, 'page_views_total': 46317, 'page_views_last_month': 46}, 'columns_format': [{'align': 'left'}, {'precisionStyle': 'standard', 'precision': '0', 'align': 'right', 'noCommas': 'false'}, {'precisionStyle': 'standard', 'align': 'right', 'noCommas': 'false'}, {'precisionStyle': 'standard', 'precision': '0', 'align': 'right', 'noCommas': 'false'}, {'precisionStyle': 'standard', 'precision': '0', 'align': 'right', 'noCommas': 'false'}, {'precisionStyle': 'standard', 'align': 'right', 'noCommas': 'false'}, {'precisionStyle': 'standard', 'align': 'right', 'noCommas': 'false'}, {'precisionStyle': 'standard', 'precision': '0', 'align': 'right', 'noCommas': 'false'}, {'align': 'left'}, {'align': 'left'}, {'precisionStyle': 'standard', 'precision': '0', 'align': 'right', 'noCommas': 'false'}, {'precisionStyle': 'standard', 'precision': '0', 'align': 'right', 'noCommas': 'false'}, {'align': 'left'}, {'align': 'left'}, {'align': 'left'}, {'precisionStyle': 'standard', 'align': 'center', 'noCommas': 'true'}, {'align': 'left'}], 'columns_datatype': ['Text', 'Number', 'Number', 'Number', 'Number', 'Number', 'Number', 'Number', 'Text', 'Text', 'Number', 'Number', 'Text', 'Text', 'Text', 'Number', 'Text'], 'name': 'Income Tax Components by Size of Income by Place of Residence: Beginning Tax Year 1999', 'provenance': 'official', 'parent_fxf': None, 'columns_field_name': ['country', 'taxable_income_of_all_returns_in_thousands', 'income_class_sort_order', 'deductions_of_all_returns_in_thousands', 
'dependent_exemptions_of_all_returns_in_thousands', 'place_of_residence_sort_order', 'number_of_all_returns', 'ny_agi_of_all_returns_in_thousands', 'county', 'disclosure', 'tax_liability_of_all_returns_in_thousands', 'tax_before_credits_of_all_returns_in_thousands', 'state', 'income_class', 'resident_type', 'tax_year', 'place_of_residence'], 'type': 'dataset', 'updatedAt': '2018-08-07T21:54:25.000Z', 'description': 'The Department of Taxation and Finance annually produces a data (study) file and provides a report of statistical information on New York State personal income tax returns that were timely filed. Timely filing means that the tax return was delivered to the Department on or before the due date of the tax return. The data are from full-year resident, full-year nonresident, and part-year resident returns. This dataset defines individuals filing a resident tax return as full-year residents and individuals filing a nonresident tax return are defined as either a full- year nonresident or a part-year resident.Data presented in this dataset provide the major income tax structure components by size of income. The components include income, deductions, dependent exemptions, and tax liability. The data also provides this information by size of income and by the filer’s permanent place of residence (county, state or country). For a more detailed explanation on the determinationof residency and components of income see the attachment: NYSTF_PlaceOfResidence_Introduction.Researchers agree to: Use the data for statistical reporting an analysis only. 
The author will include a disclaimer that states any analyses, interpretations or conclusions were reached by the author and not the New York State Department of Taxation and Finance.', 'columns_name': ['Country', 'Taxable Income of All Returns (in thousands)', 'Income Class Sort Order', 'Deductions of All Returns (in thousands)', 'Dependent Exemptions of All Returns (in thousands)','Place of Residence Sort Order', 'Number of All Returns', 'NY AGI of All Returns (in thousands) *', 'County', 'Disclosure', 'Tax Liability of All Returns (in thousands) *', 'Tax Before Credits of All Returns (in thousands)', 'State', 'Income Class', 'Resident Type', 'Tax Year', 'Place of Residence'], 'attribution': 'New York State Department of Taxation and Finance', 'download_count': 1576, 'columns_description': ['Name of Country. \n+++ Includes other foreign countries; \n++++ Includes unclassified and individuals filing a nonresident tax return but containing a New York address', 'Value of subtracting allowable deductions and exemptions from New York Adjusted Gross Income from return filings, and multiplying the remainder by the appropriate New York State tax rate schedule', 'Sort Order on Income Class', 'Deductions from return filings', 'Value of New York exemption of $1,000 for each dependent claimed on the taxpayer’s federal income tax return', 'Sort Order on Place of Residence', 'Count of the number of return filings (note: married filing joint returns count as one)', 'New York Adjusted GrossIncome from return filings. \n New York Adjusted Gross Income on resident taxforms and Federal source New York Adjusted Gross Income (includes non-New York income) on non-resident tax forms', 'Name of State. 
\n+ Resident returns that could not be classified by county;\n++ Includes resident tax returns with an out-of-state address', 'Identifies whether the data in the next 7 columns have a value, but is not reported.\nd/ - Tax Law secrecy provisions prohibit the disclosure of the data', 'Tax Liability from return filings\n** Includes refundable tax credits', 'Tax Liability from return filings based on New York taxable income before subtraction of allowable credits.', 'Name of State', 'New York Adjusted Gross Income ranges. Note: the income ranges changed between tax years 2006 and 2007. Note: If income ranges do not appear, the values in that range are all zeros.', 'Type of resident: Full-Year Resident, Full-Year Nonresident, Part-Year Nonresident', 'Tax Year', 'Name of the New York State County, State or\nCountry of Residence.\n\nNote: for Full-Year Nonresidents, the data may not be consistently available.\n\n+ Resident returns that could not be classified by county;\n\n++ Includes resident tax returns with an out- of-state address;\n\n+++ Includes other foreign countries;\n\n++++ Includes unclassified and individuals filing a nonresident tax return but\ncontaining a New York address\n'], 'id': '5bb2-yb85', 'createdAt': '2014-04-17T15:08:38.000Z'}, 'classification': {'domain_category': 'Government & Finance', 'domain_tags': ['tax', 'income', 'liability', 'county'], 'domain_metadata': [{'value': 'opendata@its.ny.gov', 'key': 'Common-Core_Contact-Email'}, {'value': 'Open Data NY', 'key': 'Common-Core_Contact-Name'}, {'value': 'State of New York', 'key': 'Common-Core_Publisher'}, {'value': 'http://www.tax.ny.gov/forms/income_cur_forms.htm', 'key': 'Additional-Resources_See-Also-'}, {'value': 'http://www.tax.ny.gov/research/stats/statistics/collect_policy_stat_reports.htm', 'key': 'Additional-Resources_See-Also'}, {'value': 'New York State Department of Taxation and Finance', 'key': 'Dataset-Summary_Dataset-Owner'}, {'value': 'OTPA.OpenNYData@tax.ny.gov', 'key': 
'Dataset-Summary_Contact-Information'}, {'value': 'Place of Residence', 'key': 'Dataset-Summary_Granularity'}, {'value': 'Statewide', 'key': 'Dataset-Summary_Coverage'}, {'value': 'Annually', 'key': 'Dataset-Summary_Data-Frequency'}, {'value': 'Static - Not Updated', 'key': 'Dataset-Summary_Posting-Frequency'}, {'value': 'Office of Tax Policy Analysis','key': 'Dataset-Summary_Organization'}, {'value': 'Beginning Tax Year 1999 and forward', 'key': 'Dataset-Summary_Time-Period'}, {'value': 'county', 'key': 'Local-Data_County_Column'}, {'value': 'Yes', 'key': 'Local-Data_County-Filter'}, {'value': 'Taxation and Finance, Department of', 'key': 'Dataset-Information_Agency'}, {'value': 'This dataset includes timely filed taxpayers.', 'key': 'Disclaimers_Limitations'}], 'tags': [], 'categories': []}}

I'm assuming it's a misconfiguration on their end, and I'll bring it up there. I just wanted to run it by you because I can't reproduce that same output with any of the filters on https://data.edmonton.ca/browse.
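As a client-side workaround for the cross-domain results shown above, one option is to filter on the `metadata.domain` field that each returned entry carries. The sketch below is only an illustration: the function name is hypothetical, and it assumes every entry has the same shape as the output pasted in this comment.

```python
def only_from_domain(entries, domain):
    """Keep only catalog entries whose metadata.domain matches `domain`."""
    return [
        entry
        for entry in entries
        if entry.get("metadata", {}).get("domain") == domain
    ]


# Usage with a sodapy client (needs network access):
# client = Socrata("data.edmonton.ca", None)
# local = only_from_domain(client.datasets(limit=800), "data.edmonton.ca")
```

This does not explain why federated results are returned in the first place; it only discards them after the fact.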

axfelix commented 6 years ago

I also get an HTTP 400 bad request if I don't set any limit, but again, I'm happy to report this to the instance if you can't duplicate.

xmunoz commented 6 years ago

Yeah, it was hard for me to find documentation about this endpoint. Do you know of any? Maybe @chrismetcalf can help us out here.

xmunoz commented 6 years ago

Ok, I think I've figured out how this endpoint works. I'll update the code accordingly.

axfelix commented 6 years ago

Worked! Thanks.

axfelix commented 6 years ago

Is there any chance that Socrata really didn't like this method? It worked for about a month and now it's throwing 400 on every Socrata site I test against. Should I try to contact them upstream rather than bothering you?

xmunoz commented 5 years ago

That may be prudent. Since this endpoint is undocumented, they probably don't expect that people are calling it.

xmunoz commented 5 years ago

Also, just to follow up, the endpoint is working for me. Does it work for you now? Was it perhaps a transient availability issue?

axfelix commented 5 years ago

>>> from sodapy import Socrata
>>> socratarepo = Socrata("data.calgary.ca", "")
WARNING:root:Requests made without an app_token will be subject to strict throttling limits.
>>> socratarepo.datasets()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\ProgramData\chocolatey\lib\python3\tools\lib\site-packages\sodapy-1.4.6-py3.5.egg\sodapy\__init__.py", line 131, in datasets
  File "C:\ProgramData\chocolatey\lib\python3\tools\lib\site-packages\sodapy-1.4.6-py3.5.egg\sodapy\__init__.py", line 406, in _perform_request
  File "C:\ProgramData\chocolatey\lib\python3\tools\lib\site-packages\sodapy-1.4.6-py3.5.egg\sodapy\__init__.py", line 460, in _raise_for_status
requests.exceptions.HTTPError: 400 Client Error: Bad Request

Still failing on all the sites I've tested it on for the past couple weeks, when it was working before. I'll try to contact their API support.

xmunoz commented 5 years ago

$ python
Python 3.6.4 (default, Mar  1 2018, 18:36:50)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from sodapy import Socrata
>>> socratarepo = Socrata("data.calgary.ca", "")
WARNING:root:Requests made without an app_token will be subject to strict throttling limits.
>>> a = socratarepo.datasets()
>>> len(a)
607
>>> a[0]
{'resource': {'name': 'Community Crime Map', 'id': 'hhjd-wzc2', 'parent_fxf': ['kudt-f99k', '848s-4m4z'], 'description': 'Disorder events included are: Drunk, Disturbance, Indecent Act, Juvenile Complaint, Landlord/tenant, Mental health concern, Neighbor dispute, Party complaint, Prowler, Suspicious person, Threats, Drugs, Noise complaint, Possible gunshots, Unwanted guest/patron, Prostitution, Speeder, Suspicious Auto (grouped as Social Disorder),\xa0Fire, Property damage and Abandoned auto (grouped as Physical Disorder.  Crime count is based on the most serious violation (MSV) per incident.  Violence:  These figures include all violent crime offences as defined by the Centre for Canadian Justice Statistics Universal Crime Reporting (UCR) rules.  Domestic violence is excluded. Break and Enter:   Residential B&E includes both House and ‘Other’ structure break and enters due to the predominantly residential nature of this type of break in (e.g. detached garages, sheds).  B&Es incidents include attempts.\r\n**Resident counts for 2018 will be made available once 2018 census data is complete.**', 'attribution': 'The City of Calgary', 'type': 'map', 'updatedAt': '2018-10-15T17:41:48.000Z', 'createdAt': '2018-03-01T22:56:27.000Z', 'page_views': {'page_views_last_week': 1377, 'page_views_last_month': 5933, 'page_views_total': 33043, 'page_views_last_week_log': 10.428360172704291, 'page_views_last_month_log': 12.534789211480268, 'page_views_total_log': 15.01210071615157}, 'columns_name': [], 'columns_field_name': [], 'columns_datatype': [], 'columns_description': [], 'columns_format': [], 'download_count': 0, 'provenance': 'official'}, 'classification': {'categories': ['health', 'public safety', 'environment'], 'tags': [], 'domain_category': 'Health and Safety', 'domain_tags': [], 'domain_metadata': []}, 'metadata': {'domain': 'data.calgary.ca', 'license': 'See Terms of Use'}, 'permalink': 'https://data.calgary.ca/d/hhjd-wzc2', 'link': 
'https://data.calgary.ca/Health-and-Safety/Community-Crime-Map/hhjd-wzc2', 'owner': {'id': '7gje-thpf', 'display_name': 'Gawor, Janusz'}}

xmunoz commented 5 years ago

Try upgrading your version of sodapy. I see that you're using 1.4.6.
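A quick way to confirm which version is actually installed, before and after upgrading, is sketched below. It relies on `importlib.metadata` from the standard library, so it assumes Python 3.8 or newer (newer than the 3.5 interpreter in the traceback above); the helper name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Usage:
# print(installed_version("sodapy"))
# Upgrading is typically done from a shell with:
#   pip install --upgrade sodapy
```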

axfelix commented 5 years ago

That did it! I'd installed from this repo manually when you initially got the functionality working, but maybe something got messed up for me locally. Sorry for the confusion.

alxfed commented 5 years ago

Calling .datasets(limit=...) on a client for a single endpoint (without a limit, the call never finishes) returns ALL datasets for ALL endpoints, exactly as in the initial report in this issue.

I've just tried it on both the data.cityofchicago.org and datacatalog.cookcountyil.gov endpoints; the response is the same, and it matches what is described at the top of this issue.

P.S. This is a serious bug for everybody who works with multiple endpoints; don't pretend that it's unimportant.
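Until the underlying behaviour is resolved, a bounded pager is one way to avoid a call that never finishes. The sketch below assumes the client's `datasets` method accepts both `limit` and `offset` keywords, which may depend on the sodapy version installed; the function name, `page_size`, and `max_pages` are hypothetical.

```python
def iter_datasets_bounded(client, page_size=100, max_pages=50):
    """Yield catalog entries page by page, stopping after `max_pages`.

    Stops early when a page comes back empty or shorter than `page_size`,
    which is taken to mean the catalog has been exhausted.
    """
    for page in range(max_pages):
        batch = client.datasets(limit=page_size, offset=page * page_size)
        if not batch:
            break
        yield from batch
        if len(batch) < page_size:
            break


# Usage (needs sodapy installed and network access):
# from sodapy import Socrata
# client = Socrata("data.cityofchicago.org", None)
# for entry in iter_datasets_bounded(client, page_size=100, max_pages=5):
#     print(entry["metadata"]["domain"], entry["resource"]["id"])
```

Printing `metadata.domain` alongside each id makes it easy to see how many of the returned entries belong to other endpoints.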

xmunoz commented 5 years ago

I'm sorry that you're having trouble. Feel free to send along a pull request!