JordanPavlic opened 5 years ago
Have you tried `if_exists`?
`if_exists` only determines the behavior of to_sql() when a table already exists. It does not, however, allow a user to set the behavior when no table exists, i.e. whether it should create a table or simply fail by raising an exception.
This could be controlled with a boolean that allows a user to specify whether pandas should create a table or not when a table doesn't exist in the database.
`if_exists` : {‘fail’, ‘replace’, ‘append’}, default ‘fail’ — how to behave if the table already exists.
- fail: Raise a ValueError.
- replace: Drop the table before inserting new values.
- append: Insert new values to the existing table.
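To illustrate the gap, a minimal runnable sketch of the documented behavior (table and column names are placeholders; note that the first call creates the table even though `if_exists='fail'`):

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///:memory:")  # any SQLAlchemy engine
df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})

# 'fail' only governs the case where the table already exists:
df.to_sql("my_table", engine, if_exists="fail", index=False)     # table missing -> pandas creates it
df.to_sql("my_table", engine, if_exists="append", index=False)   # table exists  -> rows appended
df.to_sql("my_table", engine, if_exists="replace", index=False)  # table exists  -> dropped and recreated
# Calling with if_exists="fail" again at this point would raise ValueError.
```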
I think that rather than adding options for the behavior when a table doesn't exist, maybe a table just shouldn't be created by default when a user tries to insert records and the table doesn't exist. I'd be happy to work on finding and removing/changing the piece of code that creates a table on insert when it doesn't exist.
Possible solution:

```python
# assuming a SQLAlchemy `Engine` as engine
if engine.has_table('my_table'):
    df.to_sql('my_table', engine, if_exists='append')
```
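Building on that workaround, the check could be wrapped so the call fails loudly instead of silently creating the table — a sketch only; `append_only` is a hypothetical helper, and on SQLAlchemy 1.4+ the existence check is spelled `sqlalchemy.inspect(engine).has_table(...)` since `Engine.has_table()` was deprecated there:

```python
from sqlalchemy import inspect


def append_only(df, table_name, engine):
    """Hypothetical helper: append rows only if the table already exists."""
    if not inspect(engine).has_table(table_name):
        raise ValueError(f"Table {table_name!r} does not exist; refusing to create it.")
    df.to_sql(table_name, engine, if_exists="append", index=False)
```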
I disagree with changing the default behavior of creating a table, since `to_sql` is (most?) commonly called with the intent of creating a table. The current logic handles the various options for `if_exists`, or creates the table if none exists and then inserts the data. Currently, the only option that doesn't (attempt to) insert data is `fail`, which raises a `ValueError`. I interpret this as defining the scope of the `to_sql` method as: insert data into the database, or else raise an error.
There is another related issue that I faced recently: the new table schema is a modification of the old one, for example new columns are introduced. In this case, would it be possible to make the 'replace' implementation use ALTER commands instead of DROP? Or provide such an option for the `if_exists` parameter? This would avoid the cascading delete of dependent views.
TRUNCATE will fail if the table schema is changed.
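As a user-side sketch of the ALTER idea (not an existing pandas option; the DDL below is illustrative, dialect-dependent, and maps every new column to TEXT rather than inferring types):

```python
from sqlalchemy import inspect, text


def add_missing_columns(df, table_name, engine):
    """Illustrative only: add DataFrame columns that the existing table lacks."""
    existing = {col["name"] for col in inspect(engine).get_columns(table_name)}
    with engine.begin() as conn:
        for col in df.columns:
            if col not in existing:
                conn.execute(text(f'ALTER TABLE {table_name} ADD COLUMN "{col}" TEXT'))


# After altering, data can be appended without dropping dependent views:
# df.to_sql(table_name, engine, if_exists="append", index=False)
```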
Problem description
pandas.DataFrame.to_sql() assumes that if no table exists, it should create one. This causes problems in the specific case where a user wants to simply insert records but not have to_sql() generate a table if it doesn't exist. According to the documentation, there is also currently no way to override the default behavior of creating a table.
I'm running into this problem because I have an application that dynamically generates database tables for storage as part of an ETL pipeline, and I'm using to_sql() to insert records into those tables. If table creation fails during this process for any reason and no table is created, then when to_sql() is called its default behavior is to create a new table with whatever types the DataFrame's fields currently have. That table created by pandas is missing all the indexes, primary/foreign keys, etc. that the dynamic table generation process creates.
A fix might be to add a new parameter, if_not_exists, that allows a user to set the behavior of to_sql() if a table is not detected.
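To make the proposal concrete, here is a sketch of how such a parameter might behave; `if_not_exists` does not exist in pandas today, and the wrapper below merely emulates it under a hypothetical name:

```python
from sqlalchemy import inspect


def to_sql_if_not_exists(df, name, engine, if_exists="fail", if_not_exists="create"):
    """Emulate the proposed option: 'create' keeps today's default behavior,
    'fail' raises instead of silently creating a new table."""
    if not inspect(engine).has_table(name):
        if if_not_exists == "fail":
            raise ValueError(f"Table {name!r} does not exist and if_not_exists='fail'.")
        # if_not_exists == "create": fall through; to_sql will create the table
    df.to_sql(name, engine, if_exists=if_exists, index=False)
```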