thegraphnetwork / epigraphhub_py

Epigraphhub Python package
GNU General Public License v3.0

Optimize upsert call #191

Closed: esloch closed this issue 1 year ago

esloch commented 2 years ago

Describe the bug

@eduardocorrearaujo and @fccoelho, do we need to set the create_table and add_new_columns parameters to True here?

      upsert(con=conn, df=new_df, table_name=f'foph_{table.lower()}_d', schema='switzerland',
             if_row_exists='update', chunksize=1000, add_new_columns=True, create_table=True)

Notes (from the upsert documentation): It is recommended to use this function with big batches of data, as each call carries considerable overhead. Setting the arguments create_schema, add_new_columns and adapt_dtype_of_empty_db_columns to False should drastically reduce that overhead if you do not need those features.
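A minimal sketch of how the call above could look with the overhead-adding flags from the Notes switched off. It assumes the target table, the `switzerland` schema and every column of `new_df` already exist in the database. The helper name `foph_upsert_kwargs` and the `LEAN_FLAGS` dict are hypothetical; the flag names are the ones quoted in the Notes plus `create_table` from the original call.

```python
from typing import Any

# Flags the upsert documentation says add overhead; all disabled here.
# Assumption: the table, schema and columns already exist in the database.
LEAN_FLAGS: dict[str, Any] = {
    "create_schema": False,
    "create_table": False,
    "add_new_columns": False,
    "adapt_dtype_of_empty_db_columns": False,
}


def foph_upsert_kwargs(table: str, **overrides: Any) -> dict[str, Any]:
    """Build keyword arguments for upsert() on a foph_*_d table (hypothetical helper)."""
    kwargs: dict[str, Any] = {
        "table_name": f"foph_{table.lower()}_d",
        "schema": "switzerland",
        "if_row_exists": "update",
        "chunksize": 1000,
        **LEAN_FLAGS,
    }
    # Callers can re-enable a feature (e.g. add_new_columns=True) when they need it;
    # this only updates the new dict, never the shared LEAN_FLAGS.
    kwargs.update(overrides)
    return kwargs
```

It would be used as `upsert(con=conn, df=new_df, **foph_upsert_kwargs(table))`. Keeping the lean defaults in one place makes it explicit which features each call site opts back into.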

To Reproduce

Calling the upsert function.

Expected behavior

The function call should be faster.

Screenshots

No response


Additional context

No response

fccoelho commented 2 years ago

@esloch add_new_columns=True is important, but maybe we can put the call inside a try/except, since it is not very common for them to modify column names. So we first try with add_new_columns=False, and if that fails, we try the current version. create_table, on the other hand, I think is not needed, because what this command does is just an insert.
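The try/except suggestion could be sketched as below. The upsert function is passed in as `upsert_fn` so the sketch runs without a database; in the issue it would be the same `upsert` used above, with `con`, `df` and the other arguments forwarded through `**kwargs`. The helper name is hypothetical, and so is the exception the fast path raises when `new_df` has a column the table lacks: it defaults to `Exception` and should be narrowed (possibly to a SQLAlchemy DBAPI error) once the actual error is known.

```python
from typing import Any, Callable


def upsert_with_column_fallback(
    upsert_fn: Callable[..., Any],
    *,
    expected_errors: tuple[type[BaseException], ...] = (Exception,),
    **kwargs: Any,
) -> bool:
    """Run upsert_fn without the add-new-columns overhead, falling back if that fails.

    Callers must not pass add_new_columns or create_table in kwargs; both are set here.
    Returns True when the fallback (add_new_columns=True) was needed.
    """
    try:
        # Fast path: assume every DataFrame column already exists in the table.
        upsert_fn(**kwargs, add_new_columns=False, create_table=False)
        return False
    except expected_errors:
        # Slow path: let upsert_fn add whatever columns the table is missing.
        upsert_fn(**kwargs, add_new_columns=True, create_table=False)
        return True
```

A call would look like `upsert_with_column_fallback(upsert, con=conn, df=new_df, table_name=f'foph_{table.lower()}_d', schema='switzerland', if_row_exists='update', chunksize=1000)`. Because `if_row_exists='update'`, any rows written before the fast path failed are simply updated again by the retry. On PostgreSQL, a failed statement aborts the surrounding transaction, so if `conn` is inside one it may need a rollback before the retry can succeed.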

github-actions[bot] commented 1 year ago

Stale issue message