healthyregions / oepsData

An R package for easy access to the Opioid Environment Policy Scan (OEPS) datasets.
Creative Commons Attribution 4.0 International

Explore remote load of geometry files during `load_data` function #4

Closed mradamcox closed 2 months ago

mradamcox commented 4 months ago

A good way to avoid storing large geometry files directly in this package (tract boundaries, for example, can be >100 MB) would be to pull them from remote sources when users run a load data command. This command would get the geometries, join the OEPS data to them, and return a dataframe that is ready for spatial analysis.
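To make the join step concrete, here is a minimal sketch in R. The function name `attach_geometry`, the `GEOID` key column, and the `example_var` column are all hypothetical and used only for illustration; plain data frames with made-up values stand in for the remote geometries so that the snippet runs without a network connection.

```r
# Sketch only: `attach_geometry`, the "GEOID" key, and `example_var` are
# hypothetical names, not part of oepsData.
attach_geometry <- function(geometry, oeps_data, key = "GEOID") {
  stopifnot(key %in% names(geometry), key %in% names(oeps_data))
  # Keep the geometry table as the first argument: when it is an sf object,
  # merge() dispatches to sf's method and the result stays spatial.
  merge(geometry, oeps_data, by = key, all.x = TRUE)
}

# Toy stand-ins with made-up values; in practice `geometry` would be read
# from a remote source with sf::st_read().
geometry <- data.frame(GEOID = c("17031", "17043"), name = c("Cook", "DuPage"))
oeps_data <- data.frame(GEOID = c("17031", "17043"), example_var = c(100, 50))

joined <- attach_geometry(geometry, oeps_data)
```

Using `all.x = TRUE` keeps every geometry even when it has no matching data row, which is one possible design choice; an inner join would drop those rows instead.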

We have a data pipeline in place that merges census boundaries into single files in many different formats, and that pipeline could be extended to deliver whichever spatial format works best for R. If we went this route, what should our datasets look like?

bucketteOfIvy commented 4 months ago

Remote load is now working! It turns out that sf supports remote load through GDAL's virtual file systems. As such, code like

counties2010 <- st_read('/vsicurl/https://raw.githubusercontent.com/GeoDaCenter/opioid-policy-scan/main/data_final/geometryFiles/county/counties2010.shp')

successfully runs.

We currently only have public-facing links on S3 for cartographic boundaries, so I've set up the load_oeps function to pull from the opioid-policy-scan GitHub repository for the time being. Once we have non-cartographic boundaries set up on S3, we can change the source by editing the links used by the retrieve_geometry function in load_oeps.R.
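As a rough illustration of how the source could be kept easy to swap, here is a sketch that builds the GDAL path from a single base URL. This is not the actual contents of load_oeps.R: `geometry_base_url`, `build_geometry_path`, and `retrieve_geometry_sketch` are hypothetical names, and the `scale`/`file` layout is assumed from the one county URL above.

```r
# Hypothetical sketch; the real retrieve_geometry in load_oeps.R may differ.
# Changing this one value would repoint every geometry download, e.g. to S3.
geometry_base_url <- paste0(
  "https://raw.githubusercontent.com/GeoDaCenter/opioid-policy-scan/",
  "main/data_final/geometryFiles"
)

build_geometry_path <- function(scale, file, base_url = geometry_base_url) {
  # The /vsicurl/ prefix tells GDAL to read the file over HTTP(S).
  paste0("/vsicurl/", base_url, "/", scale, "/", file)
}

retrieve_geometry_sketch <- function(scale, file, base_url = geometry_base_url) {
  # Requires the sf package and network access when actually called.
  sf::st_read(build_geometry_path(scale, file, base_url), quiet = TRUE)
}

path <- build_geometry_path("county", "counties2010.shp")
```

Calling `retrieve_geometry_sketch("county", "counties2010.shp")` would then read the same file as the `st_read` call above, and passing a different `base_url` would switch sources without touching the rest of the code.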