healthyregions / oepsData

An R package for easy access to the Opioid Environment Policy Scan (OEPS) datasets.
Creative Commons Attribution 4.0 International

Timing out when accessing tract data #16

Open Makosak opened 1 month ago

Makosak commented 1 month ago
> tracts19 <- load_oeps(scale="tract", 
+                          year=2018,
+                          geometry=TRUE)
trying URL 'https://raw.githubusercontent.com/GeoDaCenter/opioid-policy-scan/main/data_final/full_tables//T_Latest.csv'
Error in download.file(file, tmpFile, method = method, mode = "wb", quiet = !showProgress) : 
  cannot open URL 'https://raw.githubusercontent.com/GeoDaCenter/opioid-policy-scan/main/data_final/full_tables//T_Latest.csv'
In addition: Warning message:
In download.file(file, tmpFile, method = method, mode = "wb", quiet = !showProgress) :
  URL 'https://raw.githubusercontent.com/GeoDaCenter/opioid-policy-scan/main/data_final/full_tables/T_Latest.csv': Timeout of 60 seconds was reached

> tracts19 <- load_oeps(scale="tract", 
+                          year=2018,
+                          geometry=FALSE)
trying URL 'https://raw.githubusercontent.com/GeoDaCenter/opioid-policy-scan/main/data_final/full_tables//T_Latest.csv'
Error in download.file(file, tmpFile, method = method, mode = "wb", quiet = !showProgress) : 
  cannot open URL 'https://raw.githubusercontent.com/GeoDaCenter/opioid-policy-scan/main/data_final/full_tables//T_Latest.csv'
In addition: Warning message:
In download.file(file, tmpFile, method = method, mode = "wb", quiet = !showProgress) :
  URL 'https://raw.githubusercontent.com/GeoDaCenter/opioid-policy-scan/main/data_final/full_tables/T_Latest.csv': Timeout of 60 seconds was reached
> 
Makosak commented 1 month ago

Both of these options work on their own:

tracts19.read <- read.csv("https://raw.githubusercontent.com/GeoDaCenter/opioid-policy-scan/main/data_final/full_tables/T_Latest.csv")

library(data.table)
tracts19.fread <- fread("https://raw.githubusercontent.com/GeoDaCenter/opioid-policy-scan/main/data_final/full_tables/T_Latest.csv")
bucketteOfIvy commented 1 month ago

On the back end, the package makes the same fread call as above, which makes this behavior surprising. That load_oeps call also works on my system (albeit very slowly), so my gut instinct is that this is a networking hiccup of some sort. Does the load_oeps call still fail on your end if you retry now, or if you try on a different connection?

Either way, the back end currently has no logic to catch timeouts and other networking hiccups, or to give useful feedback when they happen, so that may end up being a welcome addition to the package (especially if this does turn out to be a networking issue).
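
For illustration, here is a minimal sketch of what that kind of handling could look like. The helper name `with_retry` and its arguments are hypothetical and not part of oepsData. It assumes the download goes through `download.file()` (as the traceback above suggests), which respects R's `timeout` option (60 seconds by default, matching the message in the error).

```r
# Hypothetical helper, NOT part of oepsData: run a loader function,
# retrying on error and temporarily raising R's download timeout.
with_retry <- function(loader, attempts = 3, timeout = 300, wait = 0) {
  # download.file() honours getOption("timeout"); raise it for this call only.
  old <- options(timeout = max(timeout, getOption("timeout")))
  on.exit(options(old), add = TRUE)

  last_error <- NULL
  for (i in seq_len(attempts)) {
    result <- tryCatch(loader(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    last_error <- result
    message(sprintf("Attempt %d of %d failed: %s",
                    i, attempts, conditionMessage(result)))
    if (i < attempts) Sys.sleep(wait)
  }

  # Give the user something more actionable than the raw download error.
  stop("Download failed after ", attempts, " attempts. ",
       "Check your connection or try a larger timeout. Last error: ",
       conditionMessage(last_error), call. = FALSE)
}

# Possible usage (needs network access, so left commented out):
# url <- "https://raw.githubusercontent.com/GeoDaCenter/opioid-policy-scan/main/data_final/full_tables/T_Latest.csv"
# tracts <- with_retry(function() data.table::fread(url))
```

As a quicker workaround on the user's side, calling `options(timeout = 300)` before `load_oeps` may be enough if the connection is simply slow.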

mradamcox commented 1 month ago

One strange thing is the double slash near the end of the URL (full_tables//T_Latest.csv). Not sure it's related, but we should get that fixed.
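
One way to avoid this class of bug is to join URL parts with a small helper instead of pasting strings together. This is only a sketch: `join_url` is a hypothetical name, and how the package actually builds its URLs is not shown in this thread.

```r
# Hypothetical helper, NOT part of oepsData: join URL pieces with exactly
# one "/" between them, whatever slashes the pieces already carry.
join_url <- function(...) {
  parts <- c(...)
  # Strip leading and trailing slashes from every piece; slashes inside a
  # piece (such as the "//" after "https:") are left alone.
  parts <- gsub("^/+|/+$", "", parts)
  paste(parts[nzchar(parts)], collapse = "/")
}

base_url <- "https://raw.githubusercontent.com/GeoDaCenter/opioid-policy-scan/main/data_final/full_tables/"
tract_url <- join_url(base_url, "/T_Latest.csv")
print(tract_url)
```

Note that base R's `file.path()` does not collapse repeated slashes, so it would not fix this on its own.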

mradamcox commented 1 month ago

Looks like fread can also read gzipped files (docs), so compressing each CSV is something else we could do to reduce the chance of this timeout happening in the future.
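
To get a rough feel for the savings, here is a small base-R sketch that writes a toy table both ways and compares file sizes. The table contents are made up and only stand in for a real OEPS CSV, so the actual compression ratio will differ.

```r
# Toy table standing in for an OEPS CSV (values are invented for illustration).
toy <- data.frame(
  GEOID = sprintf("%011d", 1:5000),
  value = rep(c(0.1, 0.2), 2500)
)

csv_path <- tempfile(fileext = ".csv")
gz_path <- paste0(csv_path, ".gz")

# Plain CSV.
write.csv(toy, csv_path, row.names = FALSE)

# gzfile() opens a gzip-compressed connection, so write.csv compresses on the fly.
con <- gzfile(gz_path, "w")
write.csv(toy, con, row.names = FALSE)
close(con)

sizes <- file.size(c(csv_path, gz_path))
message(sprintf("csv: %d bytes, csv.gz: %d bytes", sizes[1], sizes[2]))

# Base R detects the compression when reading a local file back in.
roundtrip <- read.csv(gz_path, colClasses = c(GEOID = "character"))
```

On the reading side, `data.table::fread()` should accept the `.csv.gz` file directly, though as far as I know it needs the R.utils package installed to decompress it.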

mradamcox commented 1 month ago

Interestingly, this is happening in the docs build as well:

Error in `download.file()`:
! download from 'https://raw.githubusercontent.com/GeoDaCenter/opioid-policy-scan/main/data_final/full_tables//T_2000.csv' failed

https://github.com/healthyregions/oepsData/actions/runs/11466984292/job/31909157660

mradamcox commented 1 month ago

The docs build job completes successfully after I removed the extra slash, for what it's worth. @Makosak can you try this again when you get a chance?