Closed luabida closed 1 year ago
@fccoelho do you know how could we surround this error?
[2023-03-24, 21:44:02 UTC] {{sinan.py:192}} INFO - /tmp/pysus/MALABR04.parquet inserted into db [2023-03-24, 21:44:03 UTC] {{sinan.py:192}} INFO - /tmp/pysus/MALABR05.parquet inserted into db [2023-03-24, 21:44:04 UTC] {{sinan.py:192}} INFO - /tmp/pysus/MALABR06.parquet inserted into db [2023-03-24, 21:44:05 UTC] {{taskinstance.py:1775}} ERROR - Task failed with exception Traceback (most recent call last): File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1880, in _execute_context self.dialect.do_executemany( File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 982, in do_executemany context._psycopg2_fetched_rows = xtras.execute_values( File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/psycopg2/extras.py", line 1299, in execute_values cur.execute(b''.join(parts)) psycopg2.errors.UndefinedColumn: column "tp_not" of relation "sinan_malaria_m" does not exist LINE 1: INSERT INTO brasil.sinan_malaria_m (index, tp_not, id_agravo... ^
@fccoelho do you know how could we surround this error?
[2023-03-24, 21:44:02 UTC] {{sinan.py:192}} INFO - /tmp/pysus/MALABR04.parquet inserted into db [2023-03-24, 21:44:03 UTC] {{sinan.py:192}} INFO - /tmp/pysus/MALABR05.parquet inserted into db [2023-03-24, 21:44:04 UTC] {{sinan.py:192}} INFO - /tmp/pysus/MALABR06.parquet inserted into db [2023-03-24, 21:44:05 UTC] {{taskinstance.py:1775}} ERROR - Task failed with exception Traceback (most recent call last): File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1880, in _execute_context self.dialect.do_executemany( File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 982, in do_executemany context._psycopg2_fetched_rows = xtras.execute_values( File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/psycopg2/extras.py", line 1299, in execute_values cur.execute(b''.join(parts)) psycopg2.errors.UndefinedColumn: column "tp_not" of relation "sinan_malaria_m" does not exist LINE 1: INSERT INTO brasil.sinan_malaria_m (index, tp_not, id_agravo... ^
This appears to result from a non-existing column, that is, on some specific year they added a new field on the forms. I am not sure if this is what you are looking for, but I would do something like this:
from psycopg2.errors import UndefinedColumn
try:
# do the inserts
except UndefinedColumn as e:
# extract the name of the missing columns and table from the exception using regular expressions
# do an alter table <table name> add column <col name>
# retry the inserts
Running final tests
How it will work:
brasil.sinan_update_ctl
. The table will look like this:
get_updates
will be responsible for comparing the existent years found in DB with the control table and return a dict telling which parquet should be firstly inserted, if it was passed to finals or if it is a preliminary year. This result can be seem at the returned XCom data:
first_insertion
will be responsible for adding in the inserted dataframe the information about the year of the parquet and if it is prelim or not. These new columns will be important for the next task runs, that will read the prelim
and year
columns to decide if they rather should be updated or not.prelims_to_finals
will check if any prelim dbc was passed to finals, this check is the result of a comparison between the FTP path and the prelim
column in control DB. It will replace the prelim true
in control and db tablesupdate_prelims
will receive the preliminary years to update, drop the rows from that year and re-insert the data, counting the rows to return the number of new rows for that year
If any task get an empty list, it will skip itself
@fccoelho requires https://github.com/AlertaDengue/PySUS/pull/123 to be released
Added a Dag to drop a SINAN table to be triggered manually:
There's nothing more generic than
Exception
which is the base class of all Exceptions in Python ;-) Indeed,UndefinedColumn
is a postgresql error code and not an Exception you can raise, apparently. But according to the psycopg2 docs, you can do something like this:try: cur.execute("your sql insert code") except psycopg2.errors.lookup("42703"): #this is the UndefinedColumn code # do the alter table
I'm gonna run some tests with this preset, but maybe if I add an else: raise e
, any other exceptions would be raised, no?
There's nothing more generic than
Exception
which is the base class of all Exceptions in Python ;-) Indeed,UndefinedColumn
is a postgresql error code and not an Exception you can raise, apparently. But according to the psycopg2 docs, you can do something like this:try: cur.execute("your sql insert code") except psycopg2.errors.lookup("42703"): #this is the UndefinedColumn code # do the alter table
I'm gonna run some tests with this preset, but maybe if I add an
else: raise e
, any other exceptions would be raised, no?
The point then becomes: what kind of other exceptions can be expected here? You can also write something like this:
try:
#do something
except SpecificException1 as e:
# do something
except SpecificException2 as e:
# do something else
except Exception as e:
raise e
In this case you at least have the opportunity to catch specific exceptions you know may happen. The last except block may not even be needed because if the exception is not one of two previous ones, it will raise anyway.
@fccoelho ready for review and merge. Friendly reminder about merging this PR too: https://github.com/AlertaDengue/PySUS/pull/123