thegraphnetwork / EpiGraphHub

Software platform to gather, transform, harmonize and store epidemiological data for analytical purposes.
https://epigraphhub.org
GNU General Public License v3.0
8 stars, 10 forks

fix(sinan-dag): change pysus data path #169

Closed luabida closed 1 year ago

luabida commented 1 year ago

Although the problem of PySUS data not being uploaded into Postgres is not related to the data path, the reason for changing from /tmp/pysus to $HOME/pysus is the time the DAG takes to finish. The workflow is exactly the same for each disease (hence the use of the .expand method), but the number of years differs per disease and some dataframes are large, so accessing each table to extract its total row count would create a huge memory overhead and increase the DAG's run time. Another issue is that the pangres.upsert method is set to if_row_exists="update": Sandro has told me that some rows are updated later on the SINAN FTP server, so a table could have the same number of rows but different values. That is why all the data is downloaded and then deleted monthly.
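
To illustrate why comparing row counts is not enough to detect changes, here is a minimal sketch of the "update if the row exists" behaviour. It uses the standard-library sqlite3 module rather than pangres and Postgres, and the table name `sinan_demo`, its columns, and the helper functions are hypothetical, chosen only for the example.

```python
import sqlite3


def upsert_rows(conn, rows):
    """Insert rows; if notification_id already exists, overwrite its cases value."""
    conn.executemany(
        """
        INSERT INTO sinan_demo (notification_id, cases)
        VALUES (?, ?)
        ON CONFLICT(notification_id) DO UPDATE SET cases = excluded.cases
        """,
        rows,
    )
    conn.commit()


def row_count(conn):
    """Return the total number of rows in the demo table."""
    return conn.execute("SELECT COUNT(*) FROM sinan_demo").fetchone()[0]


def snapshot(conn):
    """Return all rows ordered by primary key."""
    query = "SELECT notification_id, cases FROM sinan_demo ORDER BY notification_id"
    return conn.execute(query).fetchall()


# Hypothetical table standing in for one SINAN disease table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sinan_demo (notification_id INTEGER PRIMARY KEY, cases INTEGER)"
)

# First monthly load.
upsert_rows(conn, [(1, 10), (2, 20)])
count_before = row_count(conn)

# Second load: same keys, but row 2 was revised upstream.
upsert_rows(conn, [(1, 10), (2, 25)])
count_after = row_count(conn)

print(count_before, count_after)
print(snapshot(conn))
```

The row count is identical after both loads even though one value changed, so a count comparison alone would skip a needed update; re-downloading and upserting everything picks the revision up.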

Requires the fix in https://github.com/thegraphnetwork/epigraphhub_py/pull/209 to be released.

luabida commented 1 year ago

A friendly reminder that this PR is waiting to be merged.