123 [dataset] brazilian sus data #134

esloch commented 2 years ago

Resolve #123.

Requirements and features:

1 - Insert the DATASUS datasets into the epigraphhub database using the PySUS library.
-- Credentials to connect to the database.
-- Schema and table name to write the data.
-- Packages used in the module: pandas, pyarrow.parquet, pysus, psycopg2, sqlalchemy
2 - Module name: sinan_fetch.py
3 - The code consists of 2 functions:

a) parquet_to_df(): download and import data from parquet to dataframe.
b) save_to_pgsql(): insert data into the database.

4 - It needs tests to validate the functionalities.
5 - This scope is part of the tool's initial documentation.
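
The two functions could look roughly like the sketch below. It is only an illustration, not the actual contents of sinan_fetch.py: the signatures, the parameter names (`parquet_path`, `con`, `table`, `schema`, `if_exists`) and the chunk size are assumptions, and the PySUS download step is left out, so the parquet path is assumed to exist already. `con` is assumed to be a SQLAlchemy engine or connection created elsewhere from the database credentials.

```python
import pandas as pd


def parquet_to_df(parquet_path):
    """Load a parquet file or directory into a DataFrame.

    Assumes the data was already downloaded (e.g. by PySUS) to parquet_path.
    pandas needs pyarrow (or fastparquet) installed to read parquet.
    """
    return pd.read_parquet(parquet_path)


def save_to_pgsql(df, con, table, schema=None, if_exists="replace"):
    """Write df to table (optionally inside schema) and return the row count.

    con is assumed to be a SQLAlchemy engine/connection for the target database.
    Rows are sent in chunks so large yearly datasets are not written in one go.
    """
    df.to_sql(
        table,
        con,
        schema=schema,
        if_exists=if_exists,
        index=False,
        chunksize=1000,  # illustrative value, not taken from the PR
    )
    return len(df)
```

With SQLAlchemy and psycopg2 installed, `con` would typically come from something like `create_engine("postgresql+psycopg2://user:password@host:5432/dbname")`, where every part of the URL is a placeholder. Keeping the connection outside the function also makes it easy to test `save_to_pgsql()` against a throwaway database.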

It remains to determine:

How will the script run?
How will the arguments be passed to the PySUS download function inside the module? Note: the get_available_years() function returns the datasets for each year available for download.

eduardocorrearaujo commented 2 years ago

@fccoelho, I was discussing with @esloch the best way to store the Brazilian data in epigraphhub, and we would like your opinion on it.

Since SUS provides a separate dataset for each disease and year, we were thinking about keeping the same structure in epigraphhub. Because this data usually contains many erroneous values, and the columns can be inconsistent between datasets from different years, it would be difficult to append them all into a single table.

My idea is to store the datasets organized this way and, for each disease, create a consolidated table with only the most relevant columns (after preprocessing them) to use in our analyses and dashboards.
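
A rough sketch of what that consolidation could look like with pandas, assuming the yearly tables for one disease are already loaded as DataFrames. The function name `consolidate`, the added `year` column, and the idea of passing an explicit list of relevant columns are illustrative assumptions, not something already implemented.

```python
import pandas as pd


def consolidate(yearly, relevant):
    """Stack the yearly DataFrames of one disease into a single table.

    yearly maps a year to its DataFrame; relevant lists the columns to keep.
    A column that is missing in a given year is filled with NaN, so years
    with inconsistent columns can still be combined.
    """
    parts = []
    for year, df in sorted(yearly.items()):
        part = df.reindex(columns=relevant).copy()  # keep only relevant columns
        part["year"] = year  # remember which yearly dataset each row came from
        parts.append(part)
    return pd.concat(parts, ignore_index=True)
```

The original per-year tables stay untouched, so the consolidated table can be rebuilt whenever the list of relevant columns changes.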

What are your thoughts about it?

fccoelho commented 2 years ago

Yes, we should keep each disease in a separate table. The table should reflect the original structure, with some preprocessing that is already implemented in PySUS.

esloch commented 2 years ago

Done!