mdsumner closed this 6 years ago.
Half done (https://github.com/ropensci/bowerbird/commit/b81f726fd48f93db0d1bffef52fd1eeca81c99f1). This is slightly problematic because the data files come from a different server than the index page (the data source's `source_url`), which means that the downloaded zip files don't get automatically unzipped. WIP.
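One way to see the problem: when wget mirrors recursively it saves each file under a directory named after the file's host, so files fetched from a second server land outside the directory tree that mirrors `source_url`. The minimal sketch below assumes that the unzip step only scans that tree; the helper `url_to_local_path` and the example.com/example.org URLs are hypothetical and not part of bowerbird.

```r
## Hypothetical helper (not part of bowerbird): map a URL to root/host/path,
## the layout wget uses by default when mirroring recursively
url_to_local_path <- function(url, root) {
  no_scheme <- sub("^[a-zA-Z][a-zA-Z0-9+.-]*://", "", url)  # drop "https://"
  file.path(root, sub("[?#].*$", "", no_scheme))             # drop query/fragment
}

root <- file.path(tempdir(), "mirror")
## made-up URLs: an index page on one host, a data file on another
index_dir <- url_to_local_path("https://index.example.com/buildings", root)
zip_path  <- url_to_local_path("https://data.example.org/files/DistrictofColumbia.zip", root)

## the zip lands outside the index page's tree, so a post-processing step
## that only scans index_dir would never see it
startsWith(zip_path, index_dir)  # FALSE
```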
OK, I have now implemented a minimal wget-like crawler in R, so the sync process itself knows exactly which files it is downloading and where they are being saved (rather than leaving that to wget). So this small example should now work:
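```r
remotes::install_github("ropensci/bowerbird", ref = "usbuildings")
library(bowerbird)
library(magrittr)  # provides %>%
datadir <- tempdir()
## configure a local repository and add the US buildings source for one state
cf <- bb_config(datadir) %>% bb_add(bb_source_us_buildings(states = "District of Columbia"))
## sync, skipping the confirmation prompt for large downloads
res <- bb_sync(cf, verbose = TRUE, confirm_downloads_larger_than = NULL)
```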
And the downloaded files are unpacked locally:
```r
dir(file.path(datadir, dirname(res$files[[1]]$file)))
[1] "DistrictofColumbia.json" "DistrictofColumbia.zip"
```
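A rough sketch of the idea behind an in-R crawler, assuming a single index page whose links point at the data files: extract the links, resolve them against the page URL, record the local path for each one, and then unzip whatever was recorded, regardless of which host it came from. All function names here (`url_to_local_path`, `extract_links`, `resolve_url`, `plan_downloads`, `unzip_recorded`, `fetch_all`) are hypothetical, the regex-based link extraction stands in for a proper HTML parser, and this is not bowerbird's actual implementation.

```r
## Hypothetical sketch, not bowerbird's implementation.

## Map a URL to root/host/path, like wget's default recursive layout
url_to_local_path <- function(url, root) {
  no_scheme <- sub("^[a-zA-Z][a-zA-Z0-9+.-]*://", "", url)
  file.path(root, sub("[?#].*$", "", no_scheme))
}

## Pull href="..." values out of raw HTML (a regex stand-in for a real parser)
extract_links <- function(html) {
  m <- regmatches(html, gregexpr('href="[^"]+"', html))[[1]]
  sub('^href="([^"]+)"$', "\\1", m)
}

## Resolve links against the index page URL (assumed to include a path, e.g. .../index.html)
resolve_url <- function(href, base) {
  origin   <- sub("^([a-zA-Z][a-zA-Z0-9+.-]*://[^/]+).*$", "\\1", base)
  base_dir <- sub("/[^/]*$", "/", base)
  ifelse(grepl("^[a-zA-Z][a-zA-Z0-9+.-]*://", href), href,   # absolute: keep
         ifelse(startsWith(href, "/"), paste0(origin, href), # root-relative
                paste0(base_dir, href)))                     # relative to page
}

## Decide what to download and record where each file will be saved
plan_downloads <- function(html, base, pattern, root) {
  urls <- unique(resolve_url(extract_links(html), base))
  urls <- urls[grepl(pattern, urls)]
  data.frame(url = urls, file = url_to_local_path(urls, root),
             stringsAsFactors = FALSE)
}

## Unzip every recorded .zip that exists, next to where it was saved
unzip_recorded <- function(files) {
  zips <- files[grepl("\\.zip$", files, ignore.case = TRUE) & file.exists(files)]
  for (z in zips) utils::unzip(z, exdir = dirname(z))
  invisible(zips)
}

## Download everything in the plan, then post-process from the recorded paths
fetch_all <- function(plan) {
  for (i in seq_len(nrow(plan))) {
    dir.create(dirname(plan$file[i]), recursive = TRUE, showWarnings = FALSE)
    utils::download.file(plan$url[i], plan$file[i], mode = "wb", quiet = TRUE)
  }
  unzip_recorded(plan$file)
}

## Offline example with made-up URLs (nothing is downloaded here)
html <- paste('<a href="/files/DistrictofColumbia.zip">DC</a>',
              '<a href="https://data.example.org/v1/Alabama.zip">AL</a>',
              '<a href="about.html">about</a>')
plan <- plan_downloads(html, "https://index.example.com/buildings/index.html",
                       "\\.zip$", file.path(tempdir(), "mirror"))
```

Calling `fetch_all(plan)` would actually fetch the files. Because each destination path is recorded in `plan$file`, the unzip step works whether or not a file came from the same server as the index page.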
Awesome!
This looks like it will be an important benchmark data set, so it would be a good one for an example config:
https://github.com/Microsoft/USBuildingFootprints