mara / mara-pipelines

A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
MIT License
2.08k stars, 102 forks

Fix duplicated system stats if you run multiple ETLs in parallel #38

Closed jankatins closed 4 years ago

jankatins commented 4 years ago

If two or more ETLs run at the same time and both produce a system statistic at the exact same millisecond, one of the runs fails because its statistic cannot be added to the DB.

Now:

Replaces https://github.com/mara/data-integration/pull/29
Closes: https://github.com/mara/data-integration/issues/22
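
For illustration, the collision described above can be reproduced in a small, self-contained sketch. It assumes the failure comes from a uniqueness constraint on the timestamp alone, and that keying rows by timestamp plus run id avoids it. The table names, columns, and the `collides` helper are hypothetical and use an in-memory SQLite database; they are not the project's actual schema or code.

```python
import sqlite3

# Two parallel runs sampling system stats in the exact same millisecond.
SAME_MILLISECOND = "2020-03-01 12:00:00.123"


def collides(conn, table, rows):
    """Return True if inserting `rows` into `table` violates a constraint."""
    try:
        with conn:  # commits on success, rolls back on error
            for row in rows:
                placeholders = ", ".join("?" for _ in row)
                conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", row)
    except sqlite3.IntegrityError:
        return True
    return False


conn = sqlite3.connect(":memory:")

# Hypothetical "before" schema: the timestamp alone identifies a row.
conn.execute(
    "CREATE TABLE stats_by_time (timestamp TEXT PRIMARY KEY, cpu_usage REAL)"
)

# Hypothetical "after" schema: timestamp and run_id together identify a row.
conn.execute(
    """CREATE TABLE stats_by_time_and_run (
           timestamp TEXT NOT NULL,
           run_id INTEGER NOT NULL DEFAULT -1,
           cpu_usage REAL,
           PRIMARY KEY (timestamp, run_id))"""
)

old_schema_fails = collides(
    conn, "stats_by_time",
    [(SAME_MILLISECOND, 10.0), (SAME_MILLISECOND, 20.0)],
)
new_schema_fails = collides(
    conn, "stats_by_time_and_run",
    [(SAME_MILLISECOND, 1, 10.0), (SAME_MILLISECOND, 2, 20.0)],
)

print(old_schema_fails, new_schema_fails)  # True False
```

With the run id as part of the key, two runs can record a statistic for the same millisecond without one of them failing.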

martin-loetzsch commented 4 years ago

It works:

make migrate-mara-db 
migrate-mara-db: FLASK_APP=app/app.py .venv/bin/flask mara_db.migrate
migrate-mara-db: ALTER TABLE data_integration_system_statistics ADD COLUMN run_id INTEGER DEFAULT -1 NOT NULL;
migrate-mara-db: 
migrate-mara-db: 2 seconds
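
A rough sketch of what the migration above does to rows that already exist: because the new column is `NOT NULL` with a default of `-1`, statistics recorded before the change end up with `run_id = -1`. The sketch uses an in-memory SQLite database as a stand-in; the `timestamp` and `cpu_usage` columns are assumptions made for the example, and only the table name and the `ALTER TABLE` statement come from the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed pre-migration shape of the table (columns are hypothetical).
conn.execute(
    "CREATE TABLE data_integration_system_statistics "
    "(timestamp TEXT, cpu_usage REAL)"
)
conn.execute(
    "INSERT INTO data_integration_system_statistics VALUES (?, ?)",
    ("2020-03-01 12:00:00.123", 10.0),
)

# The statement printed by the migration.
conn.execute(
    "ALTER TABLE data_integration_system_statistics "
    "ADD COLUMN run_id INTEGER DEFAULT -1 NOT NULL"
)

# Rows written before the migration now report the default run id.
existing_run_ids = [
    row[0]
    for row in conn.execute(
        "SELECT run_id FROM data_integration_system_statistics"
    )
]
print(existing_run_ids)  # [-1]
```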