JorickPepin closed this issue 10 months ago
See here for fix -> https://github.com/kevkanaan/PureSphere/tree/ingestion_pipeline
Briefly, I moved the data from the root of the repository into the dags folder so that it is already within Airflow's volume. I created the zone architecture (landing/staging/production) and adapted the ingestion scripts accordingly (apart from water-api.py for the moment).
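To illustrate the zone layout, here is a minimal sketch of how an ingestion script could resolve a file path inside one of the three zones. Only the zone names come from the comment above; the `zone_path` helper, the `data_root` argument and the file name are hypothetical and are not taken from the branch.

```python
from pathlib import Path

# Zone names as listed in the comment; everything else here is an assumption.
ZONES = ("landing", "staging", "production")


def zone_path(data_root: Path, zone: str, filename: str) -> Path:
    """Return the path of `filename` inside `zone`, creating the zone folder if needed."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    zone_dir = data_root / zone
    # Create the zone folder (and any missing parents) on first use.
    zone_dir.mkdir(parents=True, exist_ok=True)
    return zone_dir / filename
```

An ingestion script would then write raw downloads with something like `zone_path(data_root, "landing", "raw.csv")`, where `data_root` points at whichever folder ends up mounted in the Airflow container.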
Thanks for the work. Wouldn't it be better to keep the data in a separate folder and add this folder to the container volumes?
For the moment, we store the data directly in the `/data` folder, but it should be stored in the correct zone. The zones are defined in this section of the README. Ref: https://www.riccardotommasini.com/courses/2023-10-02-dataeng-insa-ot/#projects-2023