luabida closed this 1 year ago
2f84a99 fixes:
Traceback (most recent call last):
File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1471, in _run_raw_task
self._execute_task_with_callbacks(context, test_mode)
File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1618, in _execute_task_with_callbacks
result = self._execute_task(context, task_orig)
File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1679, in _execute_task
result = execute_callable(context=context)
File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/airflow/decorators/base.py", line 179, in execute
return_value = super().execute(context)
File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/airflow/operators/python.py", line 171, in execute
return_value = self.execute_callable()
File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/airflow/operators/python.py", line 189, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File "/opt/airflow/dags/brasil/sinan.py", line 77, in upload
raise e
File "/opt/airflow/dags/brasil/sinan.py", line 74, in upload
loading.upload(parquet_dirs)
File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/epigraphhub/data/brasil/sinan/loading.py", line 79, in upload
upsert_df_in_chunks(df)
File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/epigraphhub/data/brasil/sinan/loading.py", line 77, in upsert_df_in_chunks
raise e
File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/epigraphhub/data/brasil/sinan/loading.py", line 55, in upsert_df_in_chunks
upsert(
File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/pangres/core.py", line 302, in upsert
executor.execute(connectable=con, if_row_exists=if_row_exists, chunksize=chunksize)
File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/pangres/executor.py", line 87, in execute
pse.upsert(if_row_exists=if_row_exists, chunksize=chunksize)
File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/pangres/engine.py", line 551, in upsert
upq.execute(db_type=self._db_type, values=chunk, if_row_exists=if_row_exists)
File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/pangres/upsert_query.py", line 231, in execute
return self.connection.execute(query)
File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1380, in execute
return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
return connection._execute_clauseelement(
File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1572, in _execute_clauseelement
ret = self._execute_context(
File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1943, in _execute_context
self._handle_dbapi_exception(
File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2128, in _handle_dbapi_exception
util.raise_(exc_info[1], with_traceback=exc_info[2])
File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
raise exception
File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
self.dialect.do_execute(
File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
try:
ValueError: A string literal cannot contain NUL (0x00) characters.
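For context on the error above: the database driver refuses string values that contain NUL (0x00), so such values have to be cleaned before the upsert. The following is only a minimal sketch of that idea, not necessarily what 2f84a99 does. The helper names `strip_nul` and `clean_rows` and the sample rows are hypothetical, and rows are modelled as plain dicts rather than a dataframe to keep the example self-contained.

```python
def strip_nul(value):
    """Return value with NUL (0x00) characters removed if it is a string."""
    if isinstance(value, str):
        return value.replace("\x00", "")
    # Non-string values (numbers, None, dates) pass through untouched.
    return value


def clean_rows(rows):
    """Apply strip_nul to every field of every row (each row is a dict)."""
    return [{key: strip_nul(val) for key, val in row.items()} for row in rows]


# Hypothetical sample rows; the values are made up for illustration only.
rows = [
    {"ID_AGRAVO": "X29\x00", "NU_ANO": 2022},
    {"ID_AGRAVO": "X29", "NU_ANO": 2023},
]
cleaned = clean_rows(rows)
```

The same per-value function could be applied to the string columns of each chunk before it is handed to the upsert call, assuming the chunk exposes its values for element-wise mapping.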
Ready for review & merge.
@fccoelho I've reduced the PDFs into xlsx sheets. Do you think it is OK to keep these files in the repo? I'm starting a module to extract these sheets into dataframes. Note that not every disease was included in the tar file that was sent to me:
2023-02-23 15:16:54.801 | ERROR | __main__:metadata_df:49 - Metadata not available for Cancer
2023-02-23 15:16:57.439 | ERROR | __main__:metadata_df:49 - Metadata not available for Contact Communicable Disease
2023-02-23 15:16:57.439 | ERROR | __main__:metadata_df:49 - Metadata not available for Acidentes de Trabalho
2023-02-23 15:17:09.622 | ERROR | __main__:metadata_df:49 - Metadata not available for Poliomielite
2023-02-23 15:17:10.480 | ERROR | __main__:metadata_df:49 - Metadata not available for Sífilis Adquirida
2023-02-23 15:17:14.665 | ERROR | __main__:metadata_df:49 - Metadata not available for Violência Domestica
2023-02-23 15:17:14.666 | ERROR | __main__:metadata_df:49 - Metadata not available for Zika
Comparison of the metadata columns with Animais Peçonhentos:
In [27]: for column in ANIM_parquet.columns:
    ...:     if column not in metadata_dataframe.columns:
    ...:         print(column)
TP_NOT
ID_AGRAVO
DT_NOTIFIC
SEM_NOT
NU_ANO
SG_UF_NOT
ID_MUNICIP
ID_REGIONA
DT_SIN_PRI
SEM_PRI
DT_NASC
NU_IDADE_N
CS_SEXO
CS_GESTANT
CS_RACA
CS_ESCOL_N
SG_UF
ID_MN_RESI
ID_RG_RESI
ID_PAIS
NU_AMPO_7
NU_AMPO_5
COM_COMPOR
DT_DIGITA
In [28]: list(ANIM_parquet.columns)
Out[28]:
['TP_NOT',
'ID_AGRAVO',
'DT_NOTIFIC',
'SEM_NOT',
'NU_ANO',
'SG_UF_NOT',
'ID_MUNICIP',
'ID_REGIONA',
'DT_SIN_PRI',
'SEM_PRI',
'DT_NASC',
'NU_IDADE_N',
'CS_SEXO',
'CS_GESTANT',
'CS_RACA',
'CS_ESCOL_N',
'SG_UF',
'ID_MN_RESI',
'ID_RG_RESI',
'ID_PAIS',
'DT_INVEST',
'ID_OCUPA_N',
'ANT_DT_ACI',
'ANT_UF',
'ANT_MUNIC_',
'ANT_LOCALI',
'ANT_ZONA',
'ANT_TEMPO_',
'ANT_LOCA_1',
'MCLI_LOCAL',
'CLI_DOR',
'CLI_EDEMA',
'CLI_EQUIMO',
'CLI_NECROS',
'CLI_LOCAL_',
'CLI_LOCA_1',
'MCLI_SIST',
'CLI_NEURO',
'CLI_HEMORR',
'CLI_VAGAIS',
'CLI_MIOLIT',
'CLI_RENAL',
'CLI_OUTR_2',
'CLI_OUTR_3',
'CLI_TEMPO_',
'TP_ACIDENT',
'ANI_TIPO_1',
'ANI_SERPEN',
'ANI_ARANHA',
'ANI_LAGART',
'TRA_CLASSI',
'CON_SOROTE',
'NU_AMPOLAS',
'NU_AMPOL_1',
'NU_AMPOL_8',
'NU_AMPOL_6',
'NU_AMPOL_4',
'NU_AMPO_7',
'NU_AMPO_5',
'NU_AMPOL_9',
'NU_AMPOL_3',
'COM_LOC',
'COM_SECUND',
'COM_NECROS',
'COM_COMPOR',
'COM_DEFICT',
'COM_APUTAC',
'COM_SISTEM',
'COM_RENAL',
'COM_EDEMA',
'COM_SEPTIC',
'COM_CHOQUE',
'DOENCA_TRA',
'EVOLUCAO',
'DT_OBITO',
'DT_ENCERRA',
'DT_DIGITA']
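The comparison above can also be wrapped in a small reusable helper, which may be handy when checking several diseases. This is a sketch only: `missing_columns` is a hypothetical name, and the two short lists are a subset of the columns shown above (`TP_NOT` and `ID_AGRAVO` were reported as missing from the metadata, while `DT_INVEST` and `ID_OCUPA_N` were not).

```python
def missing_columns(data_columns, metadata_columns):
    """Return the columns of data_columns that are absent from
    metadata_columns, keeping the original column order."""
    known = set(metadata_columns)  # set lookup avoids rescanning the list
    return [column for column in data_columns if column not in known]


# Small subset of the Animais Peçonhentos columns, for illustration.
data_columns = ["TP_NOT", "ID_AGRAVO", "DT_INVEST", "ID_OCUPA_N"]
metadata_columns = ["DT_INVEST", "ID_OCUPA_N"]

missing = missing_columns(data_columns, metadata_columns)
print(missing)
```

With real data, the two arguments would be the parquet dataframe's columns and the metadata dataframe's columns, as in the session above.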
@luabida it is not worth investing too much in this issue beyond the type casting and metadata, because DATASUS will migrate all of SINAN to a new platform in the near future.
@fccoelho it is not possible to change the method that extracts the data from SINAN DBCs, since it is used for other PySUS data as well: https://github.com/AlertaDengue/PySUS/blob/master/pysus/online_data/__init__.py#L108. Because of that, I'm rewriting some of the methods so that this can be done without breaking the code for other data in PySUS.
https://github.com/thegraphnetwork/epigraphhub_py/actions/runs/4341044635 depends on the PySUS PR being merged and released
:tada: This PR is included in version 2.0.4 :tada:
The release is available on:
2.0.4
Your semantic-release bot :package::rocket:
follows https://github.com/thegraphnetwork/EpiGraphHub/pull/170
depends on https://github.com/AlertaDengue/PySUS/pull/117