thegraphnetwork / EpiGraphHub

Software platform to gather, transform, harmonize and store epidemiological data for analytical purposes.
https://epigraphhub.org
GNU General Public License v3.0

[BUG][DAG] SINAN upload finished without uploading any data #167

Closed by luabida 1 year ago

luabida commented 1 year ago

Describe the bug

[2023-02-02, 15:00:28 UTC] {{taskinstance.py:1415}} INFO - Marking task as SUCCESS. dag_id=brasil_sinan, task_id=upload, execution_date=20230202T165235, start_date=20230202T200028, end_date=20230202T200028
[2023-02-02, 15:00:28 UTC] {{taskinstance.py:2356}} DEBUG - Task Duration set to 0.514046
[2023-02-02, 15:00:28 UTC] {{cli_action_loggers.py:84}} DEBUG - Calling callbacks: []
[2023-02-02, 15:00:28 UTC] {{local_task_job.py:156}} INFO - Task exited with return code 0

According to the logs, the data was downloaded successfully, but the upload task took less than one second and the data wasn't uploaded into Postgres.

To Reproduce

https://airflow.epigraphhub.org/log?dag_id=brasil_sinan&task_id=upload&execution_date=2023-02-02T16%3A52%3A35.527228%2B00%3A00

Expected behavior

The upload task should upsert all .parquet directories in /tmp/pysus into the EGH Postgres DB.


fccoelho commented 1 year ago

One suggestion: add a step to the DAG, right after the upload, that runs a simple SELECT COUNT on the database table and compares the result to the previous size of the table plus the number of rows in the parquet.
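
A minimal sketch of such a check, not the project's actual code. The helper names `count_rows` and `verify_row_count` are hypothetical, and the connection is assumed to be any DB-API connection (for example one from psycopg2). Because the upload is an upsert, rows that already exist are updated rather than added, so the strict equality only holds when every parquet row is new; the `strict=False` mode only checks that the table did not shrink and did not grow by more than the parquet length.

```python
def count_rows(conn, table):
    """Run a simple SELECT COUNT(*) on `table` using a DB-API connection.

    `table` is interpolated directly into the SQL, so it must be a
    trusted, hard-coded name and never user input.
    """
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    return cur.fetchone()[0]


def verify_row_count(rows_before, rows_after, parquet_rows, strict=True):
    """Compare the table size after the upload with the expected size.

    strict=True  : rows_after must equal rows_before + parquet_rows
                   (assumes every parquet row is new).
    strict=False : rows_after must lie between rows_before and
                   rows_before + parquet_rows (allows upsert updates).

    Returns the number of rows added; raises ValueError on a mismatch,
    which would mark the check as failed if called from a DAG task.
    """
    expected = rows_before + parquet_rows
    if strict:
        ok = rows_after == expected
    else:
        ok = rows_before <= rows_after <= expected
    if not ok:
        raise ValueError(
            f"row count check failed: before={rows_before}, "
            f"after={rows_after}, parquet rows={parquet_rows}"
        )
    return rows_after - rows_before
```

In the DAG this would mean calling `count_rows` once before the upload and once after it, then passing both values and the parquet length to `verify_row_count`. A run like the one in this issue, where nothing was written, would then fail the strict check instead of being marked as SUCCESS.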