ropensci / bowerbird

Keep a collection of sparkly data resources
https://docs.ropensci.org/bowerbird
47 stars, 6 forks

US building footprints #21

Closed: mdsumner closed this issue 6 years ago

mdsumner commented 6 years ago

This looks like it will become an important benchmark dataset, so it would make a good example config:

https://github.com/Microsoft/USBuildingFootprints

raymondben commented 6 years ago

Half done (https://github.com/ropensci/bowerbird/commit/b81f726fd48f93db0d1bffef52fd1eeca81c99f1). This is slightly problematic: the data files are served from a different server than the index page (the data source's source_url), and as a result the downloaded zip files are not automatically unzipped. WIP.

raymondben commented 6 years ago

OK, I've implemented a minimal wget-like crawler in R, so the sync process now knows exactly which files it is downloading and where they are being saved (rather than leaving that to wget). This should now work (small example):

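remotes::install_github("ropensci/bowerbird", ref = "usbuildings")  # install the development branch
library(bowerbird)
datadir <- tempdir()  # temporary local root for the data collection
cf <- bb_config(datadir) %>% bb_add(bb_source_us_buildings(states = "District of Columbia"))  # one small state only
res <- bb_sync(cf, verbose = TRUE, confirm_downloads_larger_than = NULL)  # NULL: no confirmation prompt for large downloads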

With the downloaded files unpacked locally:

dir(file.path(datadir, dirname(res$files[[1]]$file)))
[1] "DistrictofColumbia.json" "DistrictofColumbia.zip"
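
The idea behind the crawler change described above (the sync step itself recording which files it fetches and where it saves them, instead of delegating that to wget) can be sketched in plain R. This is a hypothetical illustration, not bowerbird's actual implementation: the index_html snippet, the data.example.com URLs, and the helpers find_zip_links(), local_path_for(), build_manifest() and unzip_manifest() are all made up for the sketch, and it assumes a wget-style <datadir>/<host>/<path> layout. Because the manifest records each file's local path explicitly, post-processing no longer cares which host served the file.

```r
# Hypothetical sketch, NOT bowerbird's code: build a manifest of download URLs
# and their wget-style local paths, so post-processing knows what to unzip.

# Hypothetical index page; the real page and its URLs will differ
index_html <- paste0(
  '<a href="https://data.example.com/usbuildings/DistrictofColumbia.zip">DC</a>\n',
  '<a href="https://data.example.com/usbuildings/Delaware.zip">DE</a>'
)

# Pull out every href that ends in ".zip"
find_zip_links <- function(html) {
  hrefs <- regmatches(html, gregexpr('href="[^"]+\\.zip"', html))[[1]]
  sub('^href="(.*)"$', "\\1", hrefs)
}

# Assumed wget-like layout: <datadir>/<host>/<path>
local_path_for <- function(url, datadir) {
  file.path(datadir, sub("^[A-Za-z]+://", "", url))
}

# Manifest of what will be downloaded and where it will be saved
build_manifest <- function(html, datadir) {
  urls <- find_zip_links(html)
  data.frame(
    url = urls,
    file = vapply(urls, local_path_for, character(1),
                  datadir = datadir, USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# After downloading, unzip each existing file next to its zip (not run here)
unzip_manifest <- function(manifest) {
  for (f in manifest$file[file.exists(manifest$file)]) {
    utils::unzip(f, exdir = dirname(f))
  }
  invisible(manifest)
}

manifest <- build_manifest(index_html, datadir = "/tmp/bb")
print(manifest)
```

Keeping the URL-to-path mapping as data, rather than inferring it afterwards from whatever wget wrote to disk, is what lets the unzip step run even when the files come from a server other than the source_url.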

mdsumner commented 6 years ago

Awesome!