thegraphnetwork / EpiGraphHub

Software platform to gather, transform, harmonize and store epidemiological data for analytical purposes.
https://epigraphhub.org
GNU General Public License v3.0

fix(sinan): refactor sinan dag #178

Closed luabida closed 1 year ago

luabida commented 1 year ago

@fccoelho do you know how we could work around this error?

[2023-03-24, 21:44:02 UTC] {{sinan.py:192}} INFO - /tmp/pysus/MALABR04.parquet inserted into db
[2023-03-24, 21:44:03 UTC] {{sinan.py:192}} INFO - /tmp/pysus/MALABR05.parquet inserted into db
[2023-03-24, 21:44:04 UTC] {{sinan.py:192}} INFO - /tmp/pysus/MALABR06.parquet inserted into db
[2023-03-24, 21:44:05 UTC] {{taskinstance.py:1775}} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1880, in _execute_context
    self.dialect.do_executemany(
  File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 982, in do_executemany
    context._psycopg2_fetched_rows = xtras.execute_values(
  File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/psycopg2/extras.py", line 1299, in execute_values
    cur.execute(b''.join(parts))
psycopg2.errors.UndefinedColumn: column "tp_not" of relation "sinan_malaria_m" does not exist
LINE 1: INSERT INTO brasil.sinan_malaria_m (index, tp_not, id_agravo...
                                                   ^

fccoelho commented 1 year ago

This appears to result from a column that does not exist in the table; that is, in some specific year a new field was added to the forms. I am not sure if this is what you are looking for, but I would do something like this:

from psycopg2.errors import UndefinedColumn

try:
    ...  # do the inserts
except UndefinedColumn as e:
    # extract the names of the missing column and table from the exception message using regular expressions
    # run ALTER TABLE <table name> ADD COLUMN <col name>
    # retry the inserts
    ...

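
The "extract the names with regular expressions" step could look roughly like the sketch below. It assumes the error text always follows the wording seen in the traceback above; the helper names `parse_missing_column` and `build_add_column_sql`, and the default `TEXT` column type, are made up for illustration and are not part of the DAG.

```python
import re

# Matches the PostgreSQL message seen in the traceback, e.g.
#   column "tp_not" of relation "sinan_malaria_m" does not exist
MISSING_COLUMN_RE = re.compile(
    r'column "(?P<column>[^"]+)" of relation "(?P<table>[^"]+)" does not exist'
)


def parse_missing_column(message):
    """Return (column, table) from an UndefinedColumn message, or None if it does not match."""
    match = MISSING_COLUMN_RE.search(message)
    if match is None:
        return None
    return match.group("column"), match.group("table")


def build_add_column_sql(schema, table, column, col_type="TEXT"):
    """Build the ALTER TABLE statement; TEXT is an assumed default type."""
    return f'ALTER TABLE "{schema}"."{table}" ADD COLUMN "{column}" {col_type}'
```

In the `except` branch one would call `parse_missing_column(str(e))`, execute the resulting statement, and retry the insert. Since the failed INSERT aborts the current transaction, a rollback is probably needed before the ALTER TABLE. For production code, `psycopg2.sql.Identifier` is likely a safer way to quote the names than an f-string.
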
luabida commented 1 year ago

Running final tests

luabida commented 1 year ago

How it will work:

If any task gets an empty list, it will skip itself.
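
A minimal sketch of that skip behaviour, assuming it relies on Airflow's `AirflowSkipException` (raising it inside a task marks the task as skipped instead of failed). The helper `require_files` is hypothetical and is not taken from the DAG; the fallback class only exists so the snippet runs without Airflow installed.

```python
try:
    from airflow.exceptions import AirflowSkipException
except ImportError:
    # Stand-in so the sketch runs without Airflow installed (illustration only).
    class AirflowSkipException(Exception):
        pass


def require_files(files):
    """Hypothetical helper: skip the current task when there is nothing to process."""
    if not files:
        raise AirflowSkipException("empty file list, skipping task")
    return files
```

A task would call `require_files(...)` on the list it receives before doing any work, so downstream inserts never run on empty input.
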

luabida commented 1 year ago

@fccoelho this requires https://github.com/AlertaDengue/PySUS/pull/123 to be released

luabida commented 1 year ago

Added a DAG that drops a SINAN table, to be triggered manually.

luabida commented 1 year ago

There's nothing more generic than Exception, which is the base class of almost all exceptions in Python (only BaseException sits above it) ;-) Indeed, UndefinedColumn is apparently a PostgreSQL error code and not an exception you can raise yourself. But according to the psycopg2 docs, you can look up the matching exception class by its code, like this:

import psycopg2.errors

try:
    cur.execute("your sql insert code")
except psycopg2.errors.lookup("42703"):  # 42703 is the UndefinedColumn code
    ...  # do the alter table

I'm gonna run some tests with this preset, but maybe if I add an else: raise e, any other exceptions would be raised, no?

fccoelho commented 1 year ago

The point then becomes: what kind of other exceptions can be expected here? You can also write something like this:

try:
    ...  # do something
except SpecificException1 as e:
    ...  # handle the first known error
except SpecificException2 as e:
    ...  # handle the second known error
except Exception:
    raise  # re-raise anything unexpected

In this case you at least have the opportunity to catch the specific exceptions you know may happen. The last except block may not even be needed, because if the exception is not one of the two previous ones, it will propagate anyway.
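
As a small illustration of that last point, and of the earlier question about `else: raise e`, here is a self-contained sketch. `SpecificException1`, `SpecificException2`, `run` and `fail_with` are placeholders, not names from the DAG.

```python
class SpecificException1(Exception):
    """Placeholder for a known, recoverable error."""


class SpecificException2(Exception):
    """Placeholder for a second known error."""


def run(action):
    try:
        action()
    except SpecificException1:
        return "handled 1"
    except SpecificException2:
        return "handled 2"
    else:
        # Runs only when action() raised nothing, so there is
        # no exception available to re-raise in this branch.
        return "ok"


def fail_with(exc):
    """Build a callable that raises the given exception."""
    def action():
        raise exc
    return action
```

Calling `run(fail_with(ValueError("boom")))` lets the `ValueError` escape to the caller even though there is no catch-all block, which is why the final `except Exception` clause is optional.
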

luabida commented 1 year ago

@fccoelho ready for review and merge. Friendly reminder about merging this PR too: https://github.com/AlertaDengue/PySUS/pull/123