kevkanaan / PureSphere

Data engineering project
Apache License 2.0

Store data in the right zone #7

Closed by JorickPepin 10 months ago

JorickPepin commented 10 months ago

For the moment, we store the data directly in the /data folder, but it should be stored in the correct zone. The zones are defined in this section of the README.

Ref: https://www.riccardotommasini.com/courses/2023-10-02-dataeng-insa-ot/#projects-2023

Tom-DELAPORTE commented 10 months ago

See here for fix -> https://github.com/kevkanaan/PureSphere/tree/ingestion_pipeline

Briefly, I moved the data from the root of the repository into the dags folder so that it is already within Airflow's volume. I created the zone architecture (landing/staging/production) and adapted the ingestion scripts accordingly (apart from water-api.py for the moment).
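
For illustration only, the zone layout described above could be wrapped in a small helper so that ingestion scripts never build paths by hand. This is a sketch, not the code on the branch: the function name `zone_path`, the `ZONES` tuple, and the per-dataset subfolder are all assumptions.

```python
from pathlib import Path

# Zone names follow the landing/staging/production layout mentioned above.
# The tuple itself and the helper below are hypothetical.
ZONES = ("landing", "staging", "production")


def zone_path(data_root: Path, zone: str, dataset: str) -> Path:
    """Return the directory for `dataset` inside `zone`, creating it if needed."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    path = data_root / zone / dataset
    # Create intermediate folders so a script can write to the path right away.
    path.mkdir(parents=True, exist_ok=True)
    return path
```

An ingestion script could then call something like `zone_path(root, "landing", "water")` for raw downloads and write cleaned output under `"staging"`.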

JorickPepin commented 10 months ago

Thanks for the work. Wouldn't it be better to keep the data in a separate folder and add this folder to the container volumes?
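
One possible way to support a separate data folder, sketched under assumptions: the scripts read the data root from an environment variable, so the folder can be mounted anywhere in the container. The variable name `PURESPHERE_DATA_DIR` and the default path are hypothetical and do not come from the repository.

```python
import os
from pathlib import Path

# Hypothetical environment variable; not defined anywhere in the project.
DATA_DIR_ENV = "PURESPHERE_DATA_DIR"


def get_data_root(default: str = "/opt/airflow/data") -> Path:
    """Return the data root, preferring the environment variable over the default.

    The default path is an assumption about where the volume might be mounted.
    """
    return Path(os.environ.get(DATA_DIR_ENV, default))
```

With this approach the DAG code stays the same whether the data lives next to the dags or in its own mounted volume; only the container configuration changes.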