davidpeckham closed this 3 years ago
Any feedback on this?
Need additional research on the approach. A static list of files is not helpful. If a dynamic list cannot be generated without elevated privileges, then we should better document how to acquire the files.
Chris, I think we should merge this PR now to make it easier for developers to get started with the project. We can always improve it later.
We need to think about making it easier for developers to get started with the project and to run the pipeline reliably. The Jupyter notebooks are a good way to prototype the pipeline, but they're long, complex, poorly documented, and prone to mistakes. If people are going to trust the results, we've got to make the pipeline more repeatable and reliable.
I added a new pipeline step to download the dataset and hard-coded the list of raw files. We could easily move the file list to a separate config file and roll this into the existing import step.
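A minimal sketch of what moving the file list into a config file might look like, assuming a hypothetical JSON file (`raw_files.json`) with hypothetical `base_url` and `raw_files` keys; the real URL and file names are not shown in this thread. The function names are also illustrative. Files are written to a `.part` path and renamed only after a complete download, and files that already exist are skipped, so reruns are cheap and an interrupted download doesn't leave a half-written raw file behind.

```python
import json
import urllib.request
from pathlib import Path
from typing import Callable, List, Tuple


def load_file_list(config_path: Path) -> Tuple[str, List[str]]:
    """Read the base URL and raw file names from a JSON config.

    The config layout is a hypothetical example, e.g.:
    {"base_url": "https://example.org/data/", "raw_files": ["a.csv", "b.csv"]}
    """
    config = json.loads(Path(config_path).read_text())
    return config["base_url"], list(config["raw_files"])


def download_raw_files(
    base_url: str,
    files: List[str],
    dest_dir: Path,
    fetch: Callable[[str, str], object] = urllib.request.urlretrieve,
) -> List[Path]:
    """Download each raw file into dest_dir and return the newly downloaded paths.

    Files already present in dest_dir are skipped. `fetch` defaults to
    urllib.request.urlretrieve(url, filename) and can be swapped out in tests.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    downloaded: List[Path] = []
    for name in files:
        target = dest_dir / name
        if target.exists():
            continue  # fetched on an earlier run; nothing to do
        url = f"{base_url.rstrip('/')}/{name}"
        tmp = target.with_name(target.name + ".part")
        fetch(url, str(tmp))
        tmp.replace(target)  # rename only once the download has finished
        downloaded.append(target)
    return downloaded
```

The existing import step could then call `download_raw_files(*load_file_list(config_path), raw_dir)` before parsing, so adding or removing a raw file becomes a config change rather than a code change.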