pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs
Other
26.63k stars 1.63k forks source link

write_database() - Insert many rows with sql server using fast_executemany #16053

Closed gregory-kral-shell closed 1 week ago

gregory-kral-shell commented 1 week ago

Description

I would like to be able to insert many rows quickly in an sql server database table with df.write_database()

I can currently achieve this with the fast_executemany option in pandas but I can’t find a way to do this with polars.write_database() which only accepts a connection string and not an sql alchemy engine. The pandas code is much faster than the polars equivalent without fast_executemany.

Example code with pandas: engine = create_engine( "mssql+pyodbc://{user}:{password}@{server}/{db_name}?driver=ODBC+Driver+17+for+SQL+Server", fast_executemany=True, ) ret = curve.to_pandas().to_sql( name="curve", con=engine, if_exists="append", index=False, )

Example code with polars: curve.write_database( table_name="curve", connection="mssql+pyodbc://{user}:{password}@{server}/{db_name}?driver=ODBC+Driver+17+for+SQL+Server", if_table_exists="append", engine="sqlalchemy", )

alexander-beedie commented 1 week ago

Yup; allowing write_database to take a connection object is on my near-term "todo" list... I'll try to get it done this week. (Longer-term I have a total rewrite of this entire set of functionality in mind so we can drop the internal adbc/pandas redirects completely 😅)

alexander-beedie commented 1 week ago

Done; in the upcoming 0.20.26 you'll be able to pass instantiated connections to write_database, so you can use the same approach (create one with fast_executemany) with Polars too 👌

gregory-kral-shell commented 1 week ago

Great thanks for that!!!