hrecht / censusapi

R package to retrieve U.S. Census data and metadata via API
https://www.hrecht.com/censusapi/

International Trade: Error in apiParse(req) #47

Closed MBlackmanEIA closed 5 years ago

MBlackmanEIA commented 5 years ago

Just an FYI that I'm not a professional programmer/researcher and this is my first time using R to pull Census data.

I'm trying to work around the fact that the Census API isn't designed for pulling full datasets. One of the macro teams at the DOE is interested in collecting and consolidating international trade data by HS and NAICS from the Census.

Linked is one kind of dataset I was trying to pull from: https://api.census.gov/data/timeseries/intltrade/exports/hs

I talked briefly to a Census supervisor for macro analysis and read through the API guide. Then, I tried to set up an API call in R using the censusapi pkg. Got the following:

Error in apiParse(req) : The Census Bureau returned the following error message: There was an error while running your query. We've logged the error and we'll correct it ASAP. Sorry for the inconvenience.

What I was trying to do was limit the API call to only the export parameters by string and a handful of int type parameters that fall under a certain category like air, shipping, etc. The Census supervisor advised that I merge calls and filter out the summary lines by including the following: SUMMARY_LVL2=HSCYCSDTRP, COMM_LVL=HS10, and SUMMARY_LVL=DET

I'm not sure if the issue is I'm calling too many variables at once or something else.

Below is what I'm doing based off the tutorial:

x <- c("censusapi", "data.table")
# Load each package by name (a bare require(x) would look for a package called "x")
lapply(x, require, character.only = TRUE)
apis <- listCensusApis()
View(apis)

example <- listCensusMetadata(name = "timeseries/intltrade/exports/hs", type = "variables")
head(example)

HS.Mon.Exports <- getCensus(name = "timeseries/intltrade/exports/hs",
    vars = c("AIR_VAL_MO", "AIR_VAL_YR", "AIR_WGT_MO", "AIR_WGT_YR", "SUMMARY_LVL=DET",
        "QTY_1_YR_FLAG", "DIST_NAME", "YEAR", "CTY_NAME", "COMM_LVL=HS10",
        "E_COMMODITY_SDESC", "DF", "MONTH", "CTY_CODE", "LAST_UPDATE", "DISTRICT",
        "E_COMMODITY_LDESC", "QTY_1_MO_FLAG", "SUMMARY_LVL2=HSCYCSDTRP", "E_COMMODITY",
        "UNIT_QY2", "UNIT_QY1", "QTY_2_MO_FLAG", "QTY_2_YR_FLAG",
        "E_COMMODITY", "UNIT_QY1"),
    region = "us:*")
head(HS.Mon.Exports)

Let me know if what I'm saying is confusing or unclear. Thank you for your time!

hrecht commented 5 years ago

That error message is coming from the Census Bureau and is just being passed through to R. Unfortunately, they're not giving more details about what's going wrong. (I get that same error when their servers are down, but also for random bugs.) I'm getting that error even on single-variable calls, e.g. https://api.census.gov/data/timeseries/intltrade/exports/hs?get=DIST_NAME&for=us:*

I'd recommend sending an email to cnmp.developers.list@census.gov with that raw API call.

MBlackmanEIA commented 5 years ago

Good morning,

Is there a way to pull up the actual url that the library configures for users?

I got in contact with someone today and they need that url to track down the root of the error. I tried poking around with the debug tool in R, but didn't have any luck finding it myself.

Any advice for next steps would be greatly appreciated.

MBlackmanEIA commented 5 years ago

I think I found it.

Browse[1]> req[["url"]]
[1] "https://api.census.gov/data/timeseries/intltrade/exports/hs?key=mykey&get=AIR_VAL_MO%2CAIR_VAL_YR%2CAIR_WGT_MO%2CAIR_WGT_YR%2CSUMMARY_LVL%3DDET%2CQTY_1_YR_FLAG%2CDIST_NAME%2CYEAR%2CCTY_NAME%2CCOMM_LVL%3DHS10%2CE_COMMODITY_SDESC%2CDF%2CMONTH%2CCTY_CODE%2CLAST_UPDATE%2CDISTRICT%2CE_COMMODITY_LDESC%2CQTY_1_MO_FLAG%2CSUMMARY_LVL2%3DHSCYCSDTRP%2CE_COMMODITY%2CUNIT_QY2%2CUNIT_QY1%2CQTY_2_MO_FLAG%2CQTY_2_YR_FLAG%2CE_COMMODITY%2CUNIT_QY1&for=us%3A%2A"

hrecht commented 5 years ago

Yep, that will be what Census needs to debug.
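Before sending a URL like this on, it can help to decode it and look at what actually ended up in the `get` parameter. The sketch below uses only base R and a shortened copy of the URL above (key and most variables omitted); the variable names `embedded_predicates` and `duplicated_vars` are illustrative. It flags `get` entries that contain `=` (filters such as `SUMMARY_LVL=DET`, which the raw example URLs in this thread pass as separate query parameters) and entries that appear twice. Whether either of these contributes to the error is an assumption, not something the Census Bureau confirmed here.

```r
# Shortened copy of the URL from the debugger output (API key and most variables omitted)
url <- paste0(
  "https://api.census.gov/data/timeseries/intltrade/exports/hs",
  "?get=AIR_VAL_MO%2CSUMMARY_LVL%3DDET%2CE_COMMODITY%2CUNIT_QY1",
  "%2CE_COMMODITY%2CUNIT_QY1&for=us%3A%2A"
)

# Turn %2C, %3D, %3A, %2A back into ",", "=", ":", "*"
decoded <- utils::URLdecode(url)

# Keep everything after the first "?" and split it into individual parameters
query <- sub("^[^?]*\\?", "", decoded)
params <- strsplit(query, "&", fixed = TRUE)[[1]]

# Pull out the comma-separated list that was sent as get=...
get_param <- sub("^get=", "", params[startsWith(params, "get=")])
get_vars <- strsplit(get_param, ",", fixed = TRUE)[[1]]

# Entries that look like filters rather than variable names
embedded_predicates <- grep("=", get_vars, fixed = TRUE, value = TRUE)

# Variables requested more than once
duplicated_vars <- unique(get_vars[duplicated(get_vars)])

print(embedded_predicates)
print(duplicated_vars)
```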

MBlackmanEIA commented 5 years ago

Good morning,

So, I've been spinning my wheels a bit. I was told there were several possible issues with the request and I haven't made much progress.

For #1, region = "us:*" configures the URL to "for=us:*", which is not applicable for the endpoint. But since region seems to be required by censusapi, I'm not sure how to work around this, because I was interested in national-level data for imports and exports.

For #2, I was told "time" is a required predicate. I played around a bit with it and didn't have any luck making a valid API call. This is the example I was given which I tried to replicate via censusapi:

https://api.census.gov/data/timeseries/intltrade/exports/hs?get=DIST_NAME,CTY_NAME,E_COMMODITY_SDESC,ALL_VAL_YR&time=2013-01&CTY_CODE=1220&key=YOUR_KEY_GOES_HERE

For #3, I was told the request is large. Is there a way to split up the query with a wildcard like "*" or an alternative method? I tried narrowing it down by reducing the variables and specifying CTY_CODE for countries other than Canada (which corresponds to 1220 in the example call), with no luck.

Also, is there a way to see the configured URL that censusapi produces for the example timeseries calls, so I can compare how my call differs and where things might be incorrect? Those examples work great.
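One way to compare calls without relying on the package is to assemble the raw URL by hand and set it next to the example the Census Bureau provided. The sketch below is a minimal base R illustration: `build_census_url()` is a hypothetical helper, not part of censusapi, and it does no URL encoding or key handling. It puts variable names in `get` and passes everything else (such as `time` and `CTY_CODE`) as separate query parameters, which is the shape of the example URL above.

```r
# Hypothetical helper: builds a raw Census API URL from an endpoint name,
# a vector of variables, and any number of named predicates (filters).
build_census_url <- function(name, vars, ...) {
  predicates <- list(...)
  query <- c(
    # Variables are sent as one comma-separated get= parameter
    paste0("get=", paste(vars, collapse = ",")),
    # Each predicate becomes its own name=value parameter
    if (length(predicates) > 0) paste0(names(predicates), "=", unlist(predicates))
  )
  paste0("https://api.census.gov/data/", name, "?", paste(query, collapse = "&"))
}

example_url <- build_census_url(
  name = "timeseries/intltrade/exports/hs",
  vars = c("DIST_NAME", "CTY_NAME", "E_COMMODITY_SDESC", "ALL_VAL_YR"),
  time = "2013-01",
  CTY_CODE = 1220
)
print(example_url)
```

Apart from the missing `key` parameter, the result should match the example URL, so any difference from the URL found in the debugger points at where the two calls diverge.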

hrecht commented 5 years ago

Ah, okay - for the region issue, that's a known problem and is a priority for the next version of censusapi - see https://github.com/hrecht/censusapi/issues/38

time is already available as an argument, and is used across timeseries API endpoints. As for the request being too large, that is happening on the Census Bureau end and not in the censusapi package. I don't have particular advice besides requesting fewer variables at once.
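As a rough illustration of requesting fewer variables at once, the variable list can be split into small groups, with each group sent as its own call. This is a sketch under assumptions: the choice of `id_vars` as the columns that identify a row, and the chunk size of 3, are both made up for the example, and the actual limit on the Census Bureau side is not known from this thread. Repeating the identifier columns in every group is what would allow the results to be merged afterwards.

```r
# Columns assumed to identify a row; they are repeated in every request
id_vars <- c("E_COMMODITY", "CTY_CODE", "DISTRICT")

# Remaining variables of interest, taken from the original call
value_vars <- c("AIR_VAL_MO", "AIR_VAL_YR", "AIR_WGT_MO", "AIR_WGT_YR",
                "UNIT_QY1", "UNIT_QY2", "QTY_1_MO_FLAG")

# Arbitrary group size chosen for illustration
chunk_size <- 3

# Assign each variable to group 1, 2, 3, ... and split accordingly
chunks <- split(value_vars, ceiling(seq_along(value_vars) / chunk_size))

# One variable vector per request: identifiers plus that group's variables
requests <- lapply(chunks, function(chunk) c(id_vars, chunk))

print(requests)
```

Each element of `requests` could then be passed as the `vars` argument of a separate call, and the resulting data frames joined on the identifier columns.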

MBlackmanEIA commented 5 years ago

Thank you for the explanation!

hrecht commented 5 years ago

Hi @MBlackmanEIA - thanks for your patience. This should all be possible in the new version of censusapi, v0.6. It isn't available on CRAN yet, but can be installed from GitHub with:

# install.packages("devtools")
devtools::install_github("hrecht/censusapi")

Here's an example from the discussion above:

getCensus(
    name = "timeseries/intltrade/exports/hs",
    vars = c("DIST_NAME", "CTY_NAME", "E_COMMODITY_SDESC", "ALL_VAL_YR"),
    time = 2013,
    CTY_CODE = 1220)

I don't have much experience with these APIs, but anything that works in the APIs should now work with this R package (hopefully!). It would be great to have more people testing this before the CRAN release - if you get a chance, let me know if you run into any issues with this new version or if it's working out.
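For the earlier question about splitting up a large request, one option with the new version might be to fetch one month at a time and stack the results. The sketch below is hypothetical: `fetch_by_month()` and `fake_fetch()` are names invented here, and passing a month such as "2013-01" through the `time` argument is an assumption based on the raw example URL earlier in this thread. A stand-in fetcher is used so the sketch runs offline.

```r
# Hypothetical helper: calls `fetch` once per month of `year` and stacks the results.
# `fetch` is any function that takes a "YYYY-MM" string and returns a data frame.
fetch_by_month <- function(year, fetch) {
  months <- sprintf("%d-%02d", year, 1:12)
  pieces <- lapply(months, fetch)
  do.call(rbind, pieces)
}

# Stand-in fetcher so the example runs without network access or an API key
fake_fetch <- function(month) {
  data.frame(time = month, CTY_CODE = "1220", stringsAsFactors = FALSE)
}

result <- fetch_by_month(2013, fake_fetch)
print(head(result, 3))

# With censusapi v0.6 installed and a key configured, the fetcher might look like this
# (assumption: the endpoint accepts monthly values for time):
# real_fetch <- function(month) {
#   censusapi::getCensus(
#     name = "timeseries/intltrade/exports/hs",
#     vars = c("DIST_NAME", "CTY_NAME", "E_COMMODITY_SDESC", "ALL_VAL_YR"),
#     time = month,
#     CTY_CODE = 1220)
# }
```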