gregsifr closed this issue 6 years ago.
@gregsifr Thanks for the report. I don't directly see a reason in pandas why this got so much slower (at least, there were no major changes in the SQL code).
But to check whether pandas is to blame, could you run the above again with exactly the same versions of sqlalchemy and psycopg2 in both environments (assuming you are using Postgres; otherwise, the driver for your database)?
What might also help is to post a profile result of both (if you are using IPython/Jupyter, you can do that with the %prun magic).
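Outside IPython, the standard-library cProfile module collects the same kind of profile that %prun produces. The sketch below makes several assumptions: it writes to an in-memory SQLite database through Python's built-in sqlite3 driver rather than to PostgreSQL via SQLAlchemy and psycopg2, and the helper name profile_to_sql, the table name and the columns are hypothetical. Inside IPython, the rough equivalent is %prun -s cumulative df.to_sql(...) with your own engine.

```python
import cProfile
import pstats
import sqlite3

import pandas as pd


def profile_to_sql(n_rows=10_000, stats_path="to_sql.prof"):
    """Profile DataFrame.to_sql and save the raw stats to stats_path.

    Hypothetical helper: an in-memory SQLite database stands in for the
    PostgreSQL/psycopg2 setup from the report.
    """
    df = pd.DataFrame({"a": range(n_rows), "b": [i * 0.5 for i in range(n_rows)]})
    con = sqlite3.connect(":memory:")  # fresh, empty database on every call
    profiler = cProfile.Profile()
    profiler.enable()
    df.to_sql("profile_test", con, index=False)
    profiler.disable()
    con.close()
    profiler.dump_stats(stats_path)  # keep one file per environment for later comparison
    return pstats.Stats(stats_path)


if __name__ == "__main__":
    stats = profile_to_sql()
    # Slowest call trees first, like %prun -s cumulative.
    stats.sort_stats("cumulative").print_stats(15)
```

Saving the stats to a file, instead of only printing them, makes it possible to run the script once per environment and compare the two results afterwards.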
@jorisvandenbossche After making sure that both environments had sqlalchemy 1.1.9 and psycopg2 2.7.3 (dt dec pq3 ext lo64) installed, I repeated the test. The pandas 0.19.2 environment completed the task in 39 s, about the same as before, whilst pandas 0.20.3 took 54 s.
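To confirm that two environments really match before benchmarking, one option is to print the installed distribution versions in each one and diff the output. This is a sketch assuming Python 3.8+ (for importlib.metadata); the helper name installed_versions is hypothetical. Two caveats: if psycopg2 was installed as the psycopg2-binary wheel, that is the distribution name to query, and the build flags shown above (dt dec pq3 ext lo64) come from psycopg2.__version__, not from package metadata.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("pandas", "SQLAlchemy", "psycopg2")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed under this distribution name
    return found


if __name__ == "__main__":
    # Run once in each environment and compare the printed lines.
    for name, ver in installed_versions().items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```

pd.show_versions() gives a broader report; this sketch only narrows it to the packages under suspicion.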
I've included the results of %prun for you below:
@gregsifr can you profile 0.20.3 as well, and compare the outputs to see where the slowdown is?
@TomAugspurger Updated my previous post to include both.
@gregsifr thanks for the updates. So it seems the slowdown is already a bit less after making sure the same versions are installed?
That said, looking at the profiles, I see that for 0.19.2 psycopg2's executemany takes 40.5 s of the 41.9 s total, while for 0.20.3 executemany takes 51.9 s of the 54.5 s total.
So it seems that most of the slowdown is still due to psycopg2? (The time spent inside pandas increased a little, but it is still negligible compared to the psycopg2 time.)
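Working through only the four timings quoted from the profiles makes this concrete. Note that "outside executemany" lumps together pandas, SQLAlchemy and any other overhead, so it is an upper bound on the pandas share rather than a measurement of it.

$$
\begin{aligned}
\Delta_{\text{total}} &= 54.5 - 41.9 = 12.6\ \text{s} \\
\Delta_{\text{executemany}} &= 51.9 - 40.5 = 11.4\ \text{s} \\
\frac{\Delta_{\text{executemany}}}{\Delta_{\text{total}}} &= \frac{11.4}{12.6} \approx 0.90 \\
t_{\text{outside}} &: \ 41.9 - 40.5 = 1.4\ \text{s} \ \longrightarrow\ 54.5 - 51.9 = 2.6\ \text{s}
\end{aligned}
$$

About 90% of the extra time is spent inside executemany, while everything else grew by roughly 1.2 s.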
Thank you for doing that. I will raise a ticket with the psycopg2 library for further investigation.
Judging from this conversation, it appears that we're in the clear for this problem. Closing for the time being, but we can reopen if it turns out that pandas is to blame.
Code Sample, a copy-pastable example if possible
Problem description
Since upgrading to 0.20+, I am finding that to_sql takes twice as long. I used two environments to test the code on the same machine; below are the results:
pandas 0.20.3
pandas 0.19.2
Please note that each test was performed on an empty table.
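Because every run is meant to start from an empty table, a timing harness should recreate that condition on each repetition. A minimal sketch, assuming an in-memory SQLite database as a stand-in for the original PostgreSQL server; the helper name time_to_sql and the table name are hypothetical. Against a real server you would pass your SQLAlchemy engine instead and drop or truncate the table between runs, outside the timed region.

```python
import sqlite3
import time

import pandas as pd


def time_to_sql(df, repeats=3, table="timing_test"):
    """Time DataFrame.to_sql into a fresh, empty table on each repeat.

    Hypothetical harness: an in-memory SQLite database stands in for the
    reporter's PostgreSQL server.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        con = sqlite3.connect(":memory:")  # new database, so the table starts empty
        start = time.perf_counter()
        df.to_sql(table, con, index=False)
        timings.append(time.perf_counter() - start)
        con.close()
    # Report the fastest run: it is the least affected by other activity on the machine.
    return min(timings)
```

Running the same script, with the same DataFrame, in each environment gives directly comparable numbers.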
Output of pd.show_versions()