walkerke / tigris

Download and use Census TIGER/Line shapefiles in R

Downloads failing, but work with browser user agent #118

Closed. rushgeo closed this issue 2 years ago.

rushgeo commented 3 years ago

I'm having intermittent problems downloading through tigris. Sometimes all three download attempts fail; other times they succeed. When they fail, the output file in the cache directory is either zero bytes or a very short HTML error page:

<HTML><HEAD><TITLE>Error</TITLE></HEAD><BODY>
An error occurred while processing your request.<p>
Reference&#32;&#35;97&#46;3f4a0760&#46;1620408093&#46;140630ae
</BODY></HTML>
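Both failure modes described here (an empty file, or a short HTML error page saved where a ZIP archive should be) can be detected before the file is handed to `unzip()`. Below is a minimal base-R sketch. `is_failed_download()` is a hypothetical helper, not part of tigris, and it relies on the fact that a normal ZIP archive begins with the local-file-header signature `PK\003\004`.

```r
# Hypothetical helper (not part of tigris): TRUE when a downloaded file looks
# like a failed download, i.e. it is missing, empty, or does not start with
# the ZIP local-file-header signature "PK\003\004" (e.g. an HTML error page).
is_failed_download <- function(path) {
  if (!file.exists(path)) return(TRUE)
  if (file.size(path) == 0) return(TRUE)
  con <- file(path, "rb")
  on.exit(close(con))
  magic <- readBin(con, what = "raw", n = 4)  # fewer than 4 bytes if file is tiny
  !identical(magic, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}
```

A file flagged this way can be deleted and re-requested instead of being passed on to `sf`.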

Inspired by the discussion here, I added a browser user agent to the downloads. Specifically, I added user_agent("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0") to every GET() call in tigris:::load_tiger.
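The patch amounts to passing one extra `httr` config object to each request. A sketch of a single such request follows, assuming `httr` is installed. `get_with_browser_ua()` is a hypothetical wrapper for illustration, not the actual code in `tigris:::load_tiger`.

```r
library(httr)

# Browser user-agent string quoted in this issue.
browser_ua <- "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0"

# Hypothetical wrapper: download `url` to `dest`, sending the browser user
# agent instead of httr's default "libcurl/... r-curl/... httr/..." string.
get_with_browser_ua <- function(url, dest) {
  resp <- GET(url, user_agent(browser_ua), write_disk(dest, overwrite = TRUE))
  stop_for_status(resp)  # turn HTTP 4xx/5xx responses into R errors
  invisible(dest)
}
```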

This seems to work every time, but I can't be 100% certain the user agent is doing the trick, since downloads still succeed intermittently without the patch.

Still, I wonder if it's worth either:

  1. following up with someone at the Census to ask whether their CDN or policies could be affecting downloads, or
  2. using a user agent in tigris either all of the time, or after the first failed download attempt.
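Option 2 could look roughly like the base-R sketch below: the first attempt keeps the client's default user agent, and later attempts switch to a browser-style one. `download_with_ua_fallback()` and its `fetch` argument are hypothetical. `fetch(url, dest, ua)` stands in for whatever HTTP call does the actual download, with `ua = NULL` meaning "keep the default".

```r
# Hypothetical helper: try the default user agent first, then fall back to a
# browser-style user agent on later attempts. `fetch(url, dest, ua)` must
# return TRUE on success; ua = NULL means "use the client's default".
download_with_ua_fallback <- function(
    url, dest, fetch, attempts = 3,
    fallback_ua = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0") {
  for (i in seq_len(attempts)) {
    ua <- if (i == 1) NULL else fallback_ua
    ok <- tryCatch(isTRUE(fetch(url, dest, ua)), error = function(e) FALSE)
    if (ok) return(invisible(list(attempt = i, user_agent = ua)))
  }
  stop("Download failed after ", attempts, " attempts: ", url)
}
```

In real use, `fetch` would wrap an HTTP request that sets a user agent only when `ua` is non-NULL, for example via `httr::user_agent()`.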
walkerke commented 3 years ago

We've gotten a number of error reports about this in the past week; my best guess is that the Census website is undergoing maintenance or having some issues. @loganpowell, do you have any thoughts on @rushgeo's suggestion?

loganpowell commented 3 years ago

Hi friends. If you're making a lot of calls to any Census address, there's a default policy that will block your IP. If your requests were succeeding and then suddenly started failing, and they keep failing after that first error, this is probably what's happening to you. I sometimes have to do heavy pulls using wget. To do so, I usually work from a "throw-away" IP address (via VPN) and do everything in one sitting. Our Akamai caching layer institutes the block after some unknown amount of time (within hours).
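If heavy request volume is what triggers the block, two cheap client-side mitigations are to skip files already on disk and to space out the remaining requests. The thread does not say what rate the Akamai layer tolerates, so the `pause` value below is an arbitrary placeholder. `fetch_all_paced()` and its `fetch(url, dest)` argument are hypothetical names in this base-R sketch.

```r
# Hypothetical helper: download each URL into `dest_dir`, skipping files that
# already exist and pausing `pause` seconds between the requests actually made.
fetch_all_paced <- function(urls, dest_dir, fetch, pause = 5) {
  dests <- file.path(dest_dir, basename(urls))
  first <- TRUE
  for (i in seq_along(urls)) {
    if (file.exists(dests[i]) && file.size(dests[i]) > 0) next  # no request needed
    if (!first) Sys.sleep(pause)  # space out consecutive requests
    fetch(urls[i], dests[i])
    first <- FALSE
  }
  invisible(dests)
}
```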

rushgeo commented 3 years ago

This doesn't sound like the scenario I'm experiencing. It happens on my very first attempt on a new machine, and I also get intermittent successes after earlier errors on another machine.

loganpowell commented 3 years ago

In that case, it's unrelated to the blocking issue I described. What addresses does tigris access?

rushgeo commented 3 years ago

I've mostly been downloading tracts, which for 2010 come from https://www2.census.gov/geo/tiger/TIGER2010/TRACT/2010/ if the cartographic boundary files aren't requested instead. The code that builds the URL is here.
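For reference, here is a sketch of how a state-level 2010 tract URL under that directory might be assembled. The file-name pattern `tl_2010_<state FIPS>_tract10.zip` is an assumption based on the TIGER2010 naming convention, and `tract_url_2010()` is hypothetical. Check the pattern against the directory listing rather than treating it as the URL tigris builds.

```r
# Hypothetical helper: URL of the 2010 TIGER/Line tract file for one state,
# assuming the naming pattern tl_2010_<2-digit state FIPS>_tract10.zip.
tract_url_2010 <- function(state_fips) {
  fips <- sprintf("%02d", as.integer(state_fips))  # e.g. 6 -> "06"
  paste0("https://www2.census.gov/geo/tiger/TIGER2010/TRACT/2010/",
         "tl_2010_", fips, "_tract10.zip")
}
```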

loganpowell commented 3 years ago

Sorry for the delayed response. Are you still experiencing this issue?

profLuna commented 3 years ago

Not sure if this is the same problem, but I have recently had trouble downloading county subdivisions. The following fails:

ma_towns_sf <- county_subdivisions(state = "MA", cb = TRUE)

I get the following message:

Using FIPS code '25' for state 'MA'
error 1 in extracting from zip file
Cannot open layer cb_2019_25_cousub_500k
Error in CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  :
  Opening layer failed.

States download without a problem; county subdivisions and smaller geographies fail, though sometimes they work. I'm using tigris version 1.4.
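The "error 1 in extracting from zip file" message suggests the downloaded archive was damaged or incomplete, so the layer `cb_2019_25_cousub_500k` never made it out of the ZIP. A ZIP's central directory sits at the end of the file, so a truncated download usually cannot even be listed. Below is a base-R sketch of that check. `zip_has_layer()` is a hypothetical helper.

```r
# Hypothetical helper: TRUE if `zip_path` can be listed as a ZIP archive and
# contains `<layer>.shp`; FALSE for missing, truncated, or non-ZIP files.
zip_has_layer <- function(zip_path, layer) {
  listing <- tryCatch(
    utils::unzip(zip_path, list = TRUE),
    error = function(e) NULL,
    warning = function(w) NULL
  )
  !is.null(listing) && paste0(layer, ".shp") %in% basename(listing$Name)
}
```

If the check returns FALSE, deleting the file and downloading it again is the natural next step.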

walkerke commented 3 years ago

@profLuna I just tested - it is working for me on my local version of R. I've also tested on my server version of R which took a little while to connect to the Census website but is working too. Are you running a server version of R? Downloads seem to fail more frequently there. I'd also always recommend using options(tigris_use_cache = TRUE) to build a local cache rather than relying on data downloads.
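A minimal sketch of the caching setup recommended here, using the `tigris_use_cache` option named in this thread. The `county_subdivisions()` call from earlier is left commented out so the snippet runs without a network connection.

```r
# Ask tigris to keep downloaded shapefiles in a local cache for this session.
# Putting this line in ~/.Rprofile applies it to every new session.
options(tigris_use_cache = TRUE)

# The first call downloads the file; later calls for the same file should
# read it from the cache instead of contacting www2.census.gov.
# ma_towns_sf <- tigris::county_subdivisions(state = "MA", cb = TRUE)
```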

profLuna commented 3 years ago

@walkerke Thanks for the quick response. I am running a local version of R. I tried with and without a VPN and got the same error. I'll definitely set tigris_use_cache = TRUE, although I'm stuck at the moment. It's still strange, because states and tracts work without a problem; it just seems to be county_subdivisions().

ricobert1 commented 3 years ago

Hi, I can confirm this same behavior and the issue is ongoing.

Specifically, the URL that block_groups() requests, for instance, downloads fine when pasted into a browser, but the same download fails from the R environment.
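One way to test whether the browser/R difference comes down to the user agent is to repeat the download from base R with the `HTTPUserAgent` option temporarily set to a browser string. `download.file()` sends that option as its User-Agent header. `download_like_browser()` is a hypothetical helper, and the default string is the one quoted at the top of this issue.

```r
# Hypothetical helper: download with base R while temporarily sending a
# browser-style User-Agent header, then restore the previous option value.
download_like_browser <- function(
    url, dest,
    ua = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0") {
  old <- options(HTTPUserAgent = ua)
  on.exit(options(old), add = TRUE)
  status <- utils::download.file(url, dest, mode = "wb", method = "libcurl", quiet = TRUE)
  status == 0  # download.file() returns 0 on success
}
```

Running this and a plain `download.file()` call against the same failing URL leaves the user agent as the only difference between the two requests.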

walkerke commented 3 years ago

This one's a little tricky to test as I can't reproduce the error; however I'm wondering if heavy use of tigris temporarily clogs certain datasets on the Census website. For example, if I run:

> httr:::default_ua()
[1] "libcurl/7.58.0 r-curl/4.3.1 httr/1.4.2"

It's possible, then, that many R users are sending the same user agent to the Census website and that the site intermittently blocks it, since this user agent is identical for all tigris users with those package versions. I'll do some more research on this.
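The default string identifies only the HTTP stack, not the package or the user. Purely as an illustration of what a more specific string could contain, here is a base-R sketch. `make_user_agent()` is hypothetical, and nothing in this thread establishes that a distinct user agent avoids the block.

```r
# Hypothetical helper: build a user-agent string that names the calling
# package and R version, not just libcurl/r-curl/httr.
make_user_agent <- function(pkg = "tigris", contact = NULL) {
  pkg_version <- tryCatch(as.character(utils::packageVersion(pkg)),
                          error = function(e) "unknown")  # package not installed
  ua <- sprintf("%s/%s R/%s (%s)", pkg, pkg_version,
                as.character(getRversion()), R.version$platform)
  if (!is.null(contact)) ua <- paste0(ua, " +", contact)
  ua
}
```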

loganpowell commented 3 years ago

Can you email our admin, @.***, about this?

Give her as much detail as possible.

(Replying by email to Kyle Walker's comment of May 27, 2021.)

jzadra commented 3 years ago

I am having the same issue. I can get states and block groups, but zctas() fails:

zctas()
Previous download failed.  Re-download attempt 1 of 3...
Previous download failed.  Re-download attempt 2 of 3...
Previous download failed.  Re-download attempt 3 of 3...
Error: Download failed; check your internet connection or the status of the Census Bureau website
                 at http://www2.census.gov/geo/tiger/.

It's been several months since I used tigris. At first I got the following:

ZCTAs can take several minutes to download.  To cache the data and avoid re-downloading in future R sessions, set `options(tigris_use_cache = TRUE)`
Error: Cannot open "/private/var/folders/5_/l71sk6kn29z17n011g8kld5m0000gp/T/Rtmp8guaZD"; The source could be corrupt or not supported. See `st_drivers()` for a list of supported formats.
In addition: Warning message:
In unzip(file_loc, exdir = tmp) : error 1 in extracting from zip file

I then removed tigris and reinstalled it from GitHub, and now I get the download error.

EDIT:

I tried to get zctas again just a minute after posting this, and it worked.
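A call that succeeds a minute later fits the pattern of a transient failure, where waiting longer between attempts helps more than retrying right away. Below is a base-R sketch of retries with exponential backoff. `retry_with_backoff()` is hypothetical and the wait times are arbitrary; this is not a description of how tigris spaces its own three re-download attempts.

```r
# Hypothetical helper: call `f()` up to `attempts` times, waiting
# base_wait, 2 * base_wait, 4 * base_wait, ... seconds after each failure.
retry_with_backoff <- function(f, attempts = 3, base_wait = 2, sleep = Sys.sleep) {
  last_error <- NULL
  for (i in seq_len(attempts)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    last_error <- result
    if (i < attempts) {
      wait <- base_wait * 2^(i - 1)  # exponential backoff
      message("Attempt ", i, " failed; retrying in ", wait, " seconds...")
      sleep(wait)
    }
  }
  stop("All ", attempts, " attempts failed: ", conditionMessage(last_error))
}

# Example: zcta_sf <- retry_with_backoff(function() tigris::zctas(), base_wait = 30)
```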

pdeshlab commented 3 years ago

I am dealing with the same zctas error mentioned above:

zctas <- tigris::zctas()  

# error: Download failed; check your internet connection or the status of the Census Bureau website
Previous download failed.  Re-download attempt 1 of 3...
Previous download failed.  Re-download attempt 2 of 3...
Previous download failed.  Re-download attempt 3 of 3...
Error: Download failed; check your internet connection or the status of the Census Bureau website
                 at http://www2.census.gov/geo/tiger/.

I've been experiencing it for about 24 hours, but I'm not sure whether it takes longer to be unblocked after making multiple requests. As jzadra pointed out, ZCTAs seem to be the only geography affected by this error, but again, I'm not sure if that's because it is the geography I've been querying most frequently.

walkerke commented 3 years ago

I just ran zctas() successfully. I would strongly recommend using shapefile caching with options(tigris_use_cache = TRUE) if you are frequently requesting ZCTAs. This will store the shapefile on your computer and use the local cache instead of downloading from the Census website each time and risking this issue.
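For a script written without the cache option, a stopgap is to save the result of the expensive call once and reload it on later runs. Below is a base-R sketch. `cached()` is a hypothetical helper that stores the R object with `saveRDS()`, separate from tigris's own shapefile cache.

```r
# Hypothetical helper: return the object stored at `path` if it exists;
# otherwise call `fetch()`, save its result to `path`, and return it.
cached <- function(path, fetch) {
  if (file.exists(path)) return(readRDS(path))
  obj <- fetch()
  saveRDS(obj, path)
  obj
}

# Example: zcta_sf <- cached("zctas.rds", function() tigris::zctas())
```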

pdeshlab commented 3 years ago

I'm definitely going to use options(tigris_use_cache = TRUE) in the future, but unfortunately, I didn't use that option when I was first scripting. Do you happen to know how long it usually takes for the issue to resolve itself?