Closed tiernanmartin closed 6 years ago
I'm changing my approach (slightly).
I will keep the code inside make_acs_data() but will add a step that uploads the data to osf.io and then downloads it, thereby ensuring that the steps are captured within the drake plan.
This pattern makes sense for external datasets that are likely to change during the course of a project. For instance, I will probably want to add a new ACS table to the project at some point, so this pattern is helpful; however, it is unlikely that I would want to update the water bodies spatial data, so that script can stay in extdata/R, outside the drake plan.
There is a problem with this implementation: users who do not have access to the OSF.io project will not be able to run a command that uploads data to the project.
The solution that I'd like to try includes the following steps:
make_target <- function(has_osf_permission) {
  # Skip the target for users without write access to the OSF project.
  if (!has_osf_permission) {
    return(NULL)
  }
}
This check for OSF access could also be incorporated as a trigger.
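A minimal sketch of the permission check, in base R only. The helper names `has_osf_permission()` and `make_upload_target()` are hypothetical, and treating a non-empty `OSF_PAT` environment variable as evidence of write access is an assumption: a token only shows that the user is authenticated, not that they can write to this particular project.

```r
# Hypothetical helper: treats a non-empty OSF_PAT environment variable as a
# rough proxy for "this user may upload to the OSF project".
has_osf_permission <- function() {
  nzchar(Sys.getenv("OSF_PAT"))
}

# Hypothetical wrapper: calls `upload_fun(path)` only when permission is
# present; otherwise returns NULL so the target becomes a no-op.
make_upload_target <- function(path, upload_fun,
                               permitted = has_osf_permission()) {
  if (!permitted) {
    return(NULL)
  }
  upload_fun(path)
}
```

For the trigger variant, drake's `trigger()` accepts a `condition` argument, so something like `trigger(condition = has_osf_permission(), mode = "blacklist")` on the upload target may be enough to skip it for users without access. That combination is untested here.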
The external data plan will be split into four separate plans:
ext_data_prep_plan
ext_data_upload_plan
ext_data_download_plan
ext_data_ready_plan
While all four plans will be available to any user for inspection, the first two plans are only intended to be run and/or modified by the project creator. The third and fourth plans will be the starting point for other users who are reproducing the project.
After implementing the drake plan structure I realized that there is a problem: the 1st and 2nd plans are isolated from the 3rd and 4th plans (by design), so drake doesn't know that the 1st and 2nd should be run before the 3rd and 4th.
I want the 1st and 2nd plans to be independent from the rest of the project's drake plan because these plans should only be run/modified by the project manager.
I think it is worthwhile to split the external data plan into two:
data_source_plan: obj_prep_status and obj_upload_status targets
data_cache_plan: obj_filepath and obj targets
The first, data_source_plan, would only be run when a data source change occurs, and it would be run separately from the rest of the project's drake plan.
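In code, the two plans might look roughly like the sketch below, which assumes the drake package is installed. The four helper functions and the file path are hypothetical stand-ins; only the plan names and the target naming pattern (`*_prep_status`, `*_upload_status`, `*_filepath`, and the object itself) come from the description above.

```r
library(drake)

# Hypothetical stand-ins for the real prep/upload/download/read functions.
prep_acs_data     <- function() TRUE
upload_acs_data   <- function(prep_status) TRUE
download_acs_data <- function() "extdata/acs-data.rds"  # placeholder path
read_acs_data     <- function(path) path

# Run only by the project manager, and only when a data source changes.
data_source_plan <- drake_plan(
  acs_prep_status   = prep_acs_data(),
  acs_upload_status = upload_acs_data(acs_prep_status)
)

# Starting point for anyone reproducing the project.
data_cache_plan <- drake_plan(
  acs_filepath = download_acs_data(),
  acs          = read_acs_data(acs_filepath)
)

# The plans are deliberately not combined, so they are built separately:
# make(data_source_plan)   # project manager only
# make(data_cache_plan)    # everyone
```

Because nothing in `data_cache_plan` refers to a target in `data_source_plan`, drake sees no dependency between them. The ordering has to be enforced by whoever runs the two `make()` calls.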
Here is a diagram of the idea behind this different architecture for the data plans:
It might be helpful to put the functions that make this work into their own package (at some point).
All data should be stored in the project's osf.io project page.
Shift the make_acs_data() script to extdata/R/ and create a new version that downloads the data from osf.io.
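A rough sketch of what the download-only version could look like, assuming the osfr package (a recent release whose `osf_download()` has a `conflicts` argument) and assuming the data was uploaded as a single .rds file. The arguments `project_id`, `file_name`, and `dest_dir` are hypothetical placeholders, not the project's real values.

```r
# Hypothetical download-only version of make_acs_data().
# Assumes exactly one file in the OSF project matches `file_name`
# and that it was saved with saveRDS().
make_acs_data <- function(project_id, file_name, dest_dir = tempdir()) {
  project <- osfr::osf_retrieve_node(project_id)
  remote  <- osfr::osf_ls_files(project, pattern = file_name)

  if (nrow(remote) != 1) {
    stop("Expected exactly one file matching '", file_name,
         "' in OSF project ", project_id, ", found ", nrow(remote))
  }

  # osf_download() returns its input with an added `local_path` column.
  local <- osfr::osf_download(remote, path = dest_dir, conflicts = "overwrite")
  readRDS(local$local_path)
}
```

If the OSF project is public, this should run without a personal access token, which would let other users skip the upload steps entirely.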