singlestore-labs / singlestoredb-python

Python SDK for the SingleStore database and workspace management APIs
Apache License 2.0
22 stars 17 forks source link

writing dataframe records with singlestoredb is slow compared to pymysql #6

Closed morokosi closed 1 year ago

morokosi commented 2 years ago

When writing dataframe records to singlestore using singlestoredb instead of mysql+pymysql in combination with sqlalchemy + pandas and , there is a performance regression (possibly because of lack of support for executemany natively).

import pandas as pd
import numpy as np
from sqlalchemy.engine import create_engine
df = pd.DataFrame(np.random.randint(0,100,size=(1000, 4)), columns=list('ABCD'))
engine_pymysql = create_engine('mysql+pymysql://user:pass@host/db', pool_recycle=3600)
engine_singlestoredb = create_engine('singlestoredb://user:pass@host/db', pool_recycle=3600)

%%time
df.to_sql("test", engine_pymysql, if_exists="replace")
# CPU times: user 16.6 ms, sys: 2.68 ms, total: 19.2 ms
# Wall time: 988 ms

%%time
df.to_sql("test", engine_singlestoredb, if_exists="replace")
# CPU times: user 437 ms, sys: 93.7 ms, total: 530 ms
# Wall time: 21.1 s
kesmit13 commented 2 years ago

Thanks for reporting this. You are correct that there is a bottleneck in the executemany method. We're actually in the middle of refactoring the package code and fixing this is part of that (as well as more big performance improvements). We hope to have a new release out in the next few weeks.

kesmit13 commented 1 year ago

Release v0.5.0 should fix this issue. PyMySQL and SingleStoreDB executemany and DataFrame uploads should be roughly the same now. If you see anything otherwise, just re-open the issue.