Open nmtarr opened 7 years ago
I'm inclined to suggest that either 1) only the data required to reproduce this analysis be subset from the large datasets and provided locally in a GitHub data folder, referenced directly, and/or 2) code be added to remotely download all required data when it is absent from the local data folder. In the latter case, the relevant data would be downloaded only once and then referenced locally. Remote in-memory processing may be possible, but the data files appear to be rather large.
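For option 2, something like the following minimal sketch could work; the data folder, file names, and URL here are placeholders for illustration, not the package's actual data or hosting location:

```python
import urllib.request
from pathlib import Path

# Hypothetical values -- the real file names and hosting URL would come
# from the analysis package itself.
DATA_DIR = Path("data")
BASE_URL = "https://example.usgs.gov/gap-data"       # placeholder endpoint
REQUIRED_FILES = ["species_ranges.csv", "land_cover_summary.csv"]  # placeholders

def fetch_if_missing():
    """Download each required file only if it is not already in the local data folder."""
    DATA_DIR.mkdir(exist_ok=True)
    for name in REQUIRED_FILES:
        target = DATA_DIR / name
        if not target.exists():
            urllib.request.urlretrieve(f"{BASE_URL}/{name}", target)
    # After this call, downstream scripts can reference DATA_DIR / <name> locally.
```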
Any manual operations, such as spreadsheet manipulation, should be eliminated if possible.
You might also wish to automate the script operations (i.e., start the code suite once and it runs start to finish and produces the correct result).
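A simple run-all driver could call the individual scripts in order; the script names below are hypothetical stand-ins for whatever the package actually contains:

```python
import subprocess
import sys

# Hypothetical script names -- substitute the package's actual processing scripts.
PIPELINE = ["01_fetch_data.py", "02_process_rasters.py", "03_summarize_results.py"]

def run_pipeline():
    """Run each stage in order, stopping at the first failure."""
    for script in PIPELINE:
        print(f"Running {script} ...")
        subprocess.run([sys.executable, script], check=True)

if __name__ == "__main__":
    run_pipeline()
```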
@TWellman I agree that these are all issues that limit reproducibility. I discussed this analysis a few months ago with Alexa, Sky, and Steve A.; these issues came up, and the consensus was that we'll handle this analysis in two phases. The first phase is the one presented here, with processing performed locally against local copies of the data, because it involves nearly all of the GAP data sets. The next phase will be to move all of the data and processing elsewhere. That will require someone to develop ways to conduct the analyses without ArcGIS and despite the enormous file sizes, as well as new instances of the GAP databases.
All spreadsheet manipulations are automated, except the step where experts (Alexa and Anne) identify which systems fit the criteria; that step can't be automated.
In the current phase, automating the script operations to run start to finish isn't feasible because of the long runtimes.
Does this analysis package meet goals for reproducibility?