ropensci / patentsview

An R client to the PatentsView API
https://docs.ropensci.org/patentsview

Handling the api's 400 and 500 errors for the locations endpoint #11

Closed mustberuss closed 6 years ago

mustberuss commented 6 years ago

Possible ways to work around the underlying API issue: https://github.com/CSSIP-AIR/PatentsView-API/issues/24

1) Field validation is done in the API's executeQuery before the database is queried. It throws the 400 error since cpc_sequence is not present in entitySpecs for the locations endpoint. It's present for 3 other endpoints (patents, assignees, inventors), and other cpc fields are present for the locations endpoint. Until this is resolved, cpc_sequence could be temporarily removed from fieldsdf here for the locations endpoint, or the package could return a custom error message if the field is specified in a locations query. get_fields("locations") should not return it: no locations query containing it can do anything but receive a 400 error when the API is called. The rest of the fieldsdf fields are present in entitySpecs, so no other 400 errors would be thrown by the API. Perhaps a PR in PatentsView-API is in order to correct this, though it's the lesser of the two issues.

2) As suggested by @crew102 in the above issue, react to a 500 error thrown by the API and return a helpful error message if one or more troublesome fields are present. I had initially thought that unmapping the fields would be the fastest/easiest fix, but that was before I figured out how many troublesome fields there are (identified in the above issue).

* As a side or potentially separate issue, cpc_sequence is not listed on the assignees or inventors endpoint web pages but can be returned by a query to those endpoints. I'll try scraping the API's entitySpecs and comparing it to fieldsdf.csv to see if there are more undocumented fields on other endpoints. get_fields() may be under-reporting!
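A minimal sketch of the custom error message in option 1, written in Python for illustration rather than as patentsview R code. The names `UNSUPPORTED_FIELDS` and `check_fields` are hypothetical; the only fact taken from this thread is that cpc_sequence triggers a 400 on the locations endpoint.

```python
# Hypothetical lookup of fields known to be rejected, keyed by endpoint.
# Only the cpc_sequence/locations pair comes from this issue.
UNSUPPORTED_FIELDS = {"locations": {"cpc_sequence"}}


def check_fields(endpoint, fields):
    """Fail fast, before any HTTP call, if a known-bad field is requested."""
    bad = sorted(set(fields) & UNSUPPORTED_FIELDS.get(endpoint, set()))
    if bad:
        raise ValueError(
            f"The {endpoint} endpoint returns a 400 error for: {', '.join(bad)}"
        )
    return list(fields)
```

Calling `check_fields("locations", ["cpc_sequence"])` raises a `ValueError`, while the same field on the patents endpoint passes through unchanged.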

mustberuss commented 6 years ago

As promised, I compared the API's entitySpecs to fieldsdf.csv and did find some undocumented fields, but I was even more surprised to find an undocumented endpoint! I'll add it to my Swagger definition shortly, probably as an undocumented endpoint. I'm not sure how best to handle any of this in the R package. Perhaps as a separate issue?!

GETs and POSTs can be made to /api/cpc_groups/query, for example:

http://www.patentsview.org/api/cpc_groups/query?q={%22patent_type%22:%22Utility%22}

There are 168 available fields: the 165 fields that cpc_subsections has, plus year_num_patents_for_cpc_group, inventor_num_patents_for_cpc_group and assignee_num_patents_for_cpc_group.
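For anyone who wants to reproduce the GET form of that request, here is a rough Python sketch of how the URL can be assembled. The helper name `build_get_url` is hypothetical; the base URL and the query come from the example in this comment, and the function only builds the string without sending anything.

```python
import json
from urllib.parse import quote

# Endpoint path as reported in this comment (undocumented, may change).
BASE_URL = "http://www.patentsview.org/api/cpc_groups/query"


def build_get_url(query):
    """Serialize the query as compact JSON and percent-encode the quotes."""
    q = json.dumps(query, separators=(",", ":"))
    # Keep JSON punctuation readable; double quotes become %22.
    return BASE_URL + "?q=" + quote(q, safe="{}:,[]")
```

With `{"patent_type": "Utility"}` as the query, this yields the same URL as the example in this comment.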

Attached are the fields my script identified as being in the api's entitySpecs but not in fieldsdf.csv and presumably not on the api's endpoint web pages.

addme.txt
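The comparison described above boils down to a per-endpoint set difference. The sketch below is not the script that produced addme.txt; it assumes entitySpecs has already been parsed into a mapping of endpoint to field names, and that fieldsdf.csv has `endpoint` and `field` columns (both assumptions).

```python
import csv
import io


def undocumented_fields(entity_specs, fieldsdf_csv_text):
    """Return fields present in entity_specs but absent from fieldsdf.csv."""
    documented = {}
    # Assumed CSV layout: one row per (endpoint, field) pair.
    for row in csv.DictReader(io.StringIO(fieldsdf_csv_text)):
        documented.setdefault(row["endpoint"], set()).add(row["field"])

    missing = {}
    for endpoint, fields in entity_specs.items():
        extra = sorted(set(fields) - documented.get(endpoint, set()))
        if extra:
            missing[endpoint] = extra
    return missing
```

An endpoint that does not appear in the CSV at all, such as cpc_groups, shows up with every one of its fields listed as missing.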

crew102 commented 6 years ago

> It throws the 400 error since cpc_sequence is not present in entitySpecs for the locations endpoint.

Can you open up an issue at https://github.com/CSSIP-AIR that tells them about this? I didn't know that cpc_sequence was the issue behind the 400 errors for the locations endpoint when I opened up the original issue.

> Until this is resolved, cpc_sequence could be temporarily removed from fieldsdf here for the locations endpoint, or the package could return a custom error message if the field is specified in a locations query. get_fields("locations") should not return it.

OK, we can remove the line in fieldsdf that states that cpc_sequence is a queryable field for the locations endpoint. Once this change is made, get_fields("locations") will not return cpc_sequence. Do you want to update fieldsdf.r so that it excludes this line, re-run fieldsdf.r so it saves a new fieldsdf, then open up a PR with these changes? If not, I will do so.

> The rest of the fieldsdf fields are present in entitySpecs, so no other 400 errors would be thrown by the API.

Right, but we are still getting 500 errors from the locations endpoint, correct?

> I figured out how many troublesome fields there are (identified in the above issue).

What groups are the troublesome fields in? We should tell the user in the custom error message that certain groups are troublesome (i.e., forget about specifying all the actual fields, let's just note the groups).

> As a side or potentially separate issue, cpc_sequence is not listed on the assignees or inventors endpoint web pages but can be returned by a query to those endpoints.

Can you open up a separate issue for this at https://github.com/CSSIP-AIR?

> but I was even more surprised to find an undocumented endpoint!

So you think that there is a cpc_groups endpoint? Perhaps this is a dev endpoint that they are working on. The patentsview R package shouldn't allow users to hit this endpoint until it is stable.

> Attached are the fields my script identified as being in the api's entitySpecs but not in fieldsdf.csv and presumably not on the api's endpoint web pages.

OK, so these are fields that the API supposedly supports but that are not documented in the various field lists given in the API's online documentation (e.g., the patents field list). Can you open up a separate issue for this at https://github.com/CSSIP-AIR stating that some fields are not in the field list tables?

mustberuss commented 6 years ago

> Can you open up an issue at https://github.com/CSSIP-AIR that tells them about this? I didn't know that cpc_sequence was the issue behind the 400 errors for the locations endpoint when I opened up the original issue.

I could, but they don't seem very responsive. I'll look at their entity object and try a PR. It would be trivial to allow cpc_sequence on the locations endpoint. It might also be possible to tell what is wrong with the other troublesome fields, since at least some of them work on other endpoints. I posted to the API's forum asking what the preferred method is for reporting bugs, and I linked to the underlying 400/500 issue. Anyone can start a thread, but replies are moderated, so there is no reply yet: http://www.patentsview.org/community/forum/7/topic/85

> Right, but we are still getting 500 errors from the locations endpoint, correct?

Correct, but I'm hoping they're just misconfigurations in the entity object.

> So you think that there is a cpc_groups endpoint?

-> POST /api/cpc_groups/query HTTP/1.1
-> Host: www.patentsview.org
-> User-Agent: https://github.com/ropensci/patentsview
-> Accept-Encoding: gzip, deflate
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Length: 2860
-> 
>> {"q":{"patent_number":"5116621"},"f":["assignee_county","assignee_county_fips","assignee_state_fips","examiner_first_name","examiner_id","examiner_key_id","examiner_last_name","examiner_role","examiner_group","inventor_county","inventor_county_fips","inventor_state_fips","lawyer_first_name","lawyer_first_seen_date","lawyer_id","lawyer_key_id","lawyer_last_name","lawyer_last_seen_date","lawyer_organization","lawyer_sequence","lawyer_total_num_patents","lawyer_total_num_assignees","lawyer_total_num_inventors","pct_doctype","pct_date","pct_102_date","pct_371_date","pct_kind","pct_docnumber","detail_desc_length","app_country","app_date","app_id","app_number","app_type","assignee_city","assignee_country","assignee_key_id","assignee_latitude","assignee_location_id","assignee_longitude","assignee_num_patents_for_cpc_group","assignee_state","govint_raw_statement","govint_org_id","govint_org_name","govint_org_level_one","govint_org_level_two","govint_org_level_three","govint_contract_award_number","inventor_city","inventor_country","inventor_first_name","inventor_first_seen_date","inventor_id","inventor_key_id","inventor_last_name","inventor_lastknown_city","inventor_lastknown_country","inventor_lastknown_latitude","inventor_lastknown_location_id","inventor_lastknown_longitude","inventor_lastknown_state","inventor_last_seen_date","inventor_latitude","inventor_location_id","inventor_longitude","inventor_num_patents_for_cpc_group","inventor_state","inventor_total_num_patents","inventor_total_num_assignees","ipc_action_date","ipc_class","ipc_classification_data_source","ipc_classification_value","ipc_first_seen_date","ipc_last_seen_date","ipc_main_group","ipc_section","ipc_subclass","ipc_subgroup","ipc_symbol_position","ipc_total_num_assignees","ipc_total_num_inventors","ipc_version_indicator","patent_abstract","patent_date","patent_firstnamed_assignee_id","patent_firstnamed_assignee_city","patent_firstnamed_assignee_country","patent_firstnamed_assignee_latitude","patent_firstnamed_assignee_location_id","patent_firstnamed_assignee_longitude","patent_firstnamed_assignee_state","patent_firstnamed_inventor_id","patent_firstnamed_inventor_city","patent_firstnamed_inventor_country","patent_firstnamed_inventor_latitude","patent_firstnamed_inventor_location_id","patent_firstnamed_inventor_longitude","patent_firstnamed_inventor_state","patent_id","patent_kind","patent_num_citations","patent_num_cited_by_us_patents","patent_num_combined_citations","patent_num_foreign_citations","patent_num_us_application_citations","patent_num_us_patent_citations","patent_num_claims","patent_number","patent_title","patent_type","year_id","year_num_patents_for_cpc_group","wipo_field_id","wipo_field_title","wipo_sector_title","wipo_sequence"],"o":{"include_subentity_total_counts":false,"matched_subentities_only":true,"page":1,"per_page":25},"s":{}}

<- HTTP/1.1 200 OK

> What groups are the troublesome fields in?

Here are the counts of troublesome fields by group:

* assignees: 16
* cpcs: 15
* nbers: 10
* rawinventors: 2
* uspcs: 11
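As a rough illustration of the group-level error message suggested earlier in the thread, here is a Python sketch (not patentsview R code). The function name `explain_500` and the message wording are hypothetical; the group names are the ones listed in this comment.

```python
# Groups reported in this thread as containing troublesome fields
# for the locations endpoint.
TROUBLESOME_GROUPS = ("assignees", "cpcs", "nbers", "rawinventors", "uspcs")


def explain_500(endpoint, status_code, requested_groups):
    """Return a hint naming suspect groups, or None if no hint applies."""
    if endpoint != "locations" or status_code != 500:
        return None
    requested = set(requested_groups)
    suspects = [g for g in TROUBLESOME_GROUPS if g in requested]
    if not suspects:
        return None
    return (
        "The locations endpoint returned a 500 error. Fields in these groups "
        "are known to cause problems: " + ", ".join(suspects)
    )
```

Reporting groups rather than individual fields keeps the message short even though more than fifty fields are affected.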

crew102 commented 6 years ago

Closed by #12