perseas / Pyrseas

Provides utilities for Postgres database schema versioning.
https://perseas.github.io/
BSD 3-Clause "New" or "Revised" License
395 stars 67 forks source link

Faster deepcopy #237

Open martinuncountable opened 2 years ago

martinuncountable commented 2 years ago

Profiling my setup I noticed I was spending a lot of time in this deepcopy call.

Looking at this: https://stackoverflow.com/questions/24756712/deepcopy-is-extremely-slow#answer-29385667 Using pickle to copy objects is simple approach to getting a faster copy of objects.

I tested this and it gave a 50% faster time when dumping out one table. From 2.6-2.8s to 1.1-1.2s.

jmafc commented 2 years ago

@martinuncountable I'm hesitant to merging this. For one thing, the SO issue appears to indicate the pickle is worse than deepcopy in Python 2, and although that is supposed to be "on the way out", the last issue reported to Pyrseas was from someone using 2.7. Second, IIRC the deepcopy was added first to fix issue #182, then @feikesteenbergen discovered that we were spending too much time deepcopying columns, when it wasn't really necessary, so I merged PR #187. Since table columns are likely the most frequently occurring object in a typical database, my gut feel is that the other deepcopy's are not that critical. Since I've been away from Pyrseas, and not coding anything in Python for a while, I would like a second opinion, perhaps from @feikesteenbergen or @dvarrazzo?

dvarrazzo commented 2 years ago

If you could do with a shallow copy, that would be better, yes. Using pickle as a deepcopy optimisation is quite a wart to see so it's great if its reason to do so could disappear :smile:

Python 2.7 is not on its way out: it is totally a thing of the past and should be deeply buried, so it should be of no concern whatsoever today (a.k.a., if you maintain a Free Software project and someone asks you to support Python 2.7, it is your chance to ask for money to do so, because you don't owe it to anyone).

jmafc commented 2 years ago

@dvarrazzo Thanks for the comments, Daniele. With all due respect, I'm on Debian 10 and the default Python is 2.7, so it's not entirely a ghost of Xmas past , yet ... And if you look at #226, you'll see a stack trace from 2.7 from 2 Nov 2021. To make deepcopy disappear, would entail reinvestigating #182 to determine where exactly a deep copy is absolutely needed, and possibly implementing a specialized copy (the SO issue above has an example) to deal with those cases. FWIW, I believe the main problem in #182 was with foreign keys being visited before the referred table or vice versa.

dvarrazzo commented 2 years ago

No problem for me, @jmafc :) it goes without saying that you are free to choose whatever deprecation policy you prefer for the projects you maintain.

jmafc commented 2 years ago

@martinuncountable I may merge this after testing against something more representative and substantial (probably the pagila database under tests/functional) to see if the overall runtime is significantly affected or not.

martinuncountable commented 2 years ago

@jmafc sounds good! Thanks for responding so quickly.

FYI, for dumping my full schema it dropped the time from ~2-3 mins down to 1 min.

jmafc commented 2 years ago

@martinuncountable I'm curious now: how many tables and other objects in your full schema? In my experience, dbtoyaml never runs beyond one minute.

martinuncountable commented 2 years ago

@jmafc There are ~300 tables, ~100 types and ~10 functions.