marcelo-g-simas / CTPPr

R package for loading and working with the US Census CTPP survey data.

Can't get tract to tract flows #1

Open aelissa opened 5 years ago

aelissa commented 5 years ago

Hi, I am trying to download Tract to Tract flows by means of transportation by calling the API with your download_ctpp function.

download_ctpp( id = "A302103", geography = "Tract->Tract", state = "New York, New Jersey", dataset = "2016", output = "FIPS Code" )

It throws the following error:

Error in .checkTypos(e, names(x)) :
  Object 'Output' not found amongst <table border="0" cellpadding="0" cellspacing="0" style="width:100%;height:100%">, V2, V3, Estimate
In addition: Warning messages:
1: In data.table::fread(raw_text, skip = 2, nrows = (line_count - 6), :
  Detected 1 column names but the data has 4 columns (i.e. invalid file). Added 3 extra default column names at the end.
2: In data.table::fread(raw_text, skip = 2, nrows = (line_count - 6), :
  Stopped early on line 60. Expected 4 fields but found 0. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<>>

I went through your function and found that the error is due to a bad response to my request. In the response I can read: ErrorMessage.aspx?ErrMsg=DownloadFailed&PerspectiveLanguage=en&PerspectiveUserId=&PerspectivePassword=&ErrorDetail=Exception+of+type+%27System.OutOfMemoryException%27+was+thrown.
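For anyone who hits the same thing, here is a minimal sketch of how the error detail could be pulled out of the response text before it reaches fread. The helper name `extract_ctpp_error` is hypothetical and is not part of CTPPr; it only assumes that a failed request contains an `ErrorDetail=` query parameter like the one quoted above.

```r
# Hypothetical helper (not part of CTPPr): return the decoded ErrorDetail
# from a response body, or NULL if the body does not look like an error page.
# Assumes `raw_text` is a single character string.
extract_ctpp_error <- function(raw_text) {
  # Grab "ErrorDetail=..." up to the next delimiter
  m <- regmatches(raw_text, regexpr("ErrorDetail=[^&\"'<> ]+", raw_text))
  if (length(m) == 0) return(NULL)
  detail <- sub("^ErrorDetail=", "", m)
  # URLdecode() does not turn "+" into spaces, so do that first
  utils::URLdecode(gsub("+", " ", detail, fixed = TRUE))
}

response <- paste0(
  "ErrorMessage.aspx?ErrMsg=DownloadFailed&PerspectiveLanguage=en",
  "&PerspectiveUserId=&PerspectivePassword=",
  "&ErrorDetail=Exception+of+type+%27System.OutOfMemoryException%27+was+thrown."
)

msg <- extract_ctpp_error(response)
if (!is.null(msg)) message("CTPP request failed: ", msg)
```

A check like this would surface the server-side failure directly instead of the confusing `.checkTypos` error from parsing an HTML page as a table.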

I also had a look at my request body. Might the error be related to these limits (which, by the way, are the same as in the web interface)?

BackgroundDownloadSupported:True
BackgroundDownloadColumnLimit:10000
BackgroundDownloadCellLimit:25000000

Would it be OK (and would it be enough) if I increased these limits?

Many thanks

AnthonyFucci commented 5 years ago

Hi aelissa, can you confirm that the output you expect is all available Tract->Tract estimates for A302103 within New York and within New Jersey?

aelissa commented 5 years ago

Hi, yes, that is exactly what I am after. Actually, I would even be interested in getting more coverage. I am currently working with the data downloaded from the FTP service, but accessing the API would surely be more convenient and faster.

AnthonyFucci commented 5 years ago

I looked into this and, unfortunately, I am not sure requests like this are a good idea, given how large these Flow tables can get and how the web requests work. As an example, Wyoming has the fewest Census Tracts of any state, 132, and table A302103 has 19 unique categories per tract->tract relationship. With 4 columns per row:

132*132*19 = 331,056 rows
331,056*4  = 1,324,224 cells

That is about 16 megabytes using FIPS Code output, and that's just for the state with the fewest tracts. This number gets far larger for a state like New York, with close to 5,000 tracts.
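To make the scaling concrete, the same arithmetic can be sketched in R. This is a worst-case estimate that assumes every tract pair appears with every category; `flow_table_size` is a made-up helper name, 5,000 is a rounded stand-in for New York's tract count, and whether the service counts cells exactly this way is an assumption.

```r
# Upper-bound size of a tract-to-tract flow table, assuming every
# origin/destination pair appears with every category (worst case).
flow_table_size <- function(n_tracts, n_categories = 19, n_columns = 4) {
  rows <- n_tracts^2 * n_categories
  c(rows = rows, cells = rows * n_columns)
}

# BackgroundDownloadCellLimit quoted earlier in the thread
cell_limit <- 25000000

wyoming  <- flow_table_size(132)
new_york <- flow_table_size(5000)  # "close to 5,000" tracts, rounded

print(format(wyoming, big.mark = ","))
print(format(new_york, big.mark = ","))

# How many times the worst-case New York table exceeds the cell limit
print(new_york[["cells"]] / cell_limit)
```

Under these assumptions New York alone comes to roughly 1.9 billion cells, about 76 times the 25,000,000 cell limit, so raising the limits in the request body is unlikely to be enough on its own.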

My first thought is to add a new parameter to download_ctpp that lets us subset requests by county, which makes a lot of sense for these small-geography Flow tables, but that would require a bit of work. For now, I think we have to subset the FTP file, or make smaller files with the web tool and join them together.
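As a rough sketch of the workaround, assuming the flows are already loaded into a data frame with 11-digit tract FIPS codes stored as character (2-digit state + 3-digit county + 6-digit tract), county subsets can be cut and re-joined in base R. The column names `residence`, `workplace`, and `estimate`, the helper `subset_flows_by_county`, and the toy rows are all made up for illustration; the actual FTP and web tool files may use different layouts and identifiers.

```r
# Hypothetical helper: keep only flows whose tract (in `tract_col`)
# lies in one of the given 5-digit county FIPS codes.
subset_flows_by_county <- function(flows, county_fips, tract_col = "residence") {
  county_of_tract <- substr(flows[[tract_col]], 1, 5)  # state (2) + county (3)
  flows[county_of_tract %in% county_fips, , drop = FALSE]
}

# Toy data for illustration only (not real CTPP estimates).
flows <- data.frame(
  residence = c("36001000100", "36001000200", "36061000100", "34003001000"),
  workplace = c("36061000100", "36001000100", "36061000200", "36061000100"),
  estimate  = c(10, 25, 40, 5),
  stringsAsFactors = FALSE
)

albany   <- subset_flows_by_county(flows, "36001")  # Albany County, NY
new_york <- subset_flows_by_county(flows, "36061")  # New York County, NY

# Smaller pieces with identical columns can be stacked back together.
combined <- rbind(albany, new_york)
print(combined)
```

Keeping the FIPS codes as character matters here, since reading them as numbers would drop leading zeros for states such as Alabama (01).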