wpinvestigative / arcos

https://wpinvestigative.github.io/arcos/

Data format changed and API is pulling delimited file and not a json #17

Open mkiang opened 2 years ago

mkiang commented 2 years ago

This is functionally the same issue as #11 and #15; however, I wanted to go into more depth. I haven't tested this thoroughly, but hopefully it'll be enough for the devs to address.

If I use the default county_raw() function, I get:

> county_raw(county = "Mingo", state = "WV", key = "WaPo")
Sorry, our server must be overwhelmed. Try again in a little bit. If it persists, let us know at https://github.com/wpinvestigative/arcos/issues
Sorry, our server must be overwhelmed. Try again in a little bit. If it persists, let us know at https://github.com/wpinvestigative/arcos/issues
Sorry, our server must be overwhelmed. Try again in a little bit. If it persists, let us know at https://github.com/wpinvestigative/arcos/issues

<snipped>

The error suggests this is a server response issue, but when I create the URL by hand, I get a proper tab-separated file. For example, the URL equivalent of the call above is: https://arcos-api.ext.nile.works/v1/county_data?county=MINGO&state=WV&key=WaPo. If I open it and read the first 5 lines (as a tibble), I get:

> read_delim("https://arcos-api.ext.nile.works/v1/county_data?county=MINGO&state=WV&key=WaPo") %>% slice(1:5)
Rows: 37154 Columns: 42                                                                                           
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (33): REPORTER_DEA_NO, REPORTER_BUS_ACT, REPORTER_NAME, REPORTER_ADDL_CO_INFO, REPORTER_ADDRESS1, REPORTER_A...
dbl  (9): REPORTER_ZIP, BUYER_ZIP, DRUG_CODE, QUANTITY, CALC_BASE_WT_IN_GM, DOSAGE_UNIT, TRANSACTION_ID, MME_Con...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 5 × 42
  REPORTER_DEA_NO REPORTER_BUS_ACT REPORTER_NAME   REPORTER_ADDL_CO… REPORTER_ADDRES… REPORTER_ADDRES… REPORTER_CITY
  <chr>           <chr>            <chr>           <chr>             <chr>            <chr>            <chr>        
1 PH0035964       DISTRIBUTOR      H D SMITH WHOL… null              4650 INDUSTRIAL… null             SPRINGFIELD  
2 PH0035964       DISTRIBUTOR      H D SMITH WHOL… null              4650 INDUSTRIAL… null             SPRINGFIELD  
3 PH0035964       DISTRIBUTOR      H D SMITH WHOL… null              4650 INDUSTRIAL… null             SPRINGFIELD  
4 PH0035964       DISTRIBUTOR      H D SMITH WHOL… null              4650 INDUSTRIAL… null             SPRINGFIELD  
5 PH0035964       DISTRIBUTOR      H D SMITH WHOL… null              4650 INDUSTRIAL… null             SPRINGFIELD  
# … with 35 more variables: REPORTER_STATE <chr>, REPORTER_ZIP <dbl>, REPORTER_COUNTY <chr>, BUYER_DEA_NO <chr>,
#   BUYER_BUS_ACT <chr>, BUYER_NAME <chr>, BUYER_ADDL_CO_INFO <chr>, BUYER_ADDRESS1 <chr>, BUYER_ADDRESS2 <chr>,
#   BUYER_CITY <chr>, BUYER_STATE <chr>, BUYER_ZIP <dbl>, BUYER_COUNTY <chr>, TRANSACTION_CODE <chr>,
#   DRUG_CODE <dbl>, NDC_NO <chr>, DRUG_NAME <chr>, QUANTITY <dbl>, UNIT <chr>, ACTION_INDICATOR <chr>,
#   ORDER_FORM_NO <chr>, CORRECTION_NO <chr>, STRENGTH <chr>, TRANSACTION_DATE <chr>, CALC_BASE_WT_IN_GM <dbl>,
#   DOSAGE_UNIT <dbl>, TRANSACTION_ID <dbl>, Product_Name <chr>, Ingredient_Name <chr>, Measure <chr>,
#   MME_Conversion_Factor <dbl>, Combined_Labeler_Name <chr>, Revised_Company_Name <chr>, Reporter_family <chr>, …
Warning message:
One or more parsing issues, see `problems()` for details 

This clearly suggests the server is working, responding, and providing data in a parsable format. I think the issue is that the function expects JSON rather than TSV. For example, https://github.com/wpinvestigative/arcos/blob/master/R/raw.R#L45 looks like it parses the response as JSON using fromJSON from the jsonlite package, but when I call fromJSON directly on the same URL, I get a parsing error:

> jsonlite::fromJSON("https://arcos-api.ext.nile.works/v1/county_data?county=MINGO&state=WV&key=WaPo")
Error in parse_con(txt, bigint_as_char) : 
  lexical error: invalid char in json text.
                                       REPORTER_DEA_NO  REPORTER_BUS_AC
                     (right here) ------^
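A minimal sketch of a client-side workaround, assuming the response body has already been fetched as a single string: try it as JSON first, and fall back to tab-separated parsing if it is not valid JSON. The helper name `parse_arcos_body` and the tiny sample bodies are hypothetical and not part of the arcos package; treating the literal string "null" as missing is also an assumption based on the output above.

```r
# Hypothetical helper (not part of arcos): parse a response body that may be
# either JSON or tab-separated text.
parse_arcos_body <- function(txt) {
  stopifnot(is.character(txt), length(txt) == 1)

  if (jsonlite::validate(txt)) {
    # The body is valid JSON, so parse it with fromJSON as raw.R appears to do.
    return(jsonlite::fromJSON(txt))
  }

  # Otherwise assume a tab-delimited table with a header row.
  utils::read.delim(
    text = txt,
    sep = "\t",
    quote = "",
    na.strings = "null",       # assumption: "null" marks a missing value
    stringsAsFactors = FALSE,
    check.names = FALSE        # keep the column names exactly as sent
  )
}

# Made-up bodies for illustration only; the real response has 42 columns.
tsv_body  <- "REPORTER_DEA_NO\tBUYER_STATE\tQUANTITY\nPH0035964\tWV\t2\n"
json_body <- '[{"REPORTER_DEA_NO":"PH0035964","BUYER_STATE":"WV","QUANTITY":2}]'

from_tsv  <- parse_arcos_body(tsv_body)
from_json <- parse_arcos_body(json_body)
```

Both calls return a one-row data frame with the same column names, so downstream code would not need to know which format the server sent. For a live request, the body could be read as text first, for example with `paste(readLines(url), collapse = "\n")`, though that has not been tested against the API here.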

mkiang commented 2 years ago

Also, the fix may be as simple as changing the API call to request JSON explicitly, rather than leaving the format unspecified and letting it default to TSV, but I have no idea...