philipsoutham / py-mysql2pgsql

Tool for migrating/converting from mysql to postgresql.
http://packages.python.org/py-mysql2pgsql/
MIT License
455 stars 169 forks

Parallelism #36

Open raananraz opened 10 years ago

raananraz commented 10 years ago

Hi, first, thank you very much for an amazing tool; it worked like a charm. Second, it would be great if the data migration could be run with a configurable degree of parallelism, so I can copy X tables at once.

What I have done so far: for the big tables I used the EnterpriseDB migration tools, but for all the DDL and the small tables I used the normal tools.

Thanks again!
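The request above (copy X tables at once) could be sketched roughly as below. This is a hypothetical illustration, not part of py-mysql2pgsql: `migrate_table` is a stand-in for the real per-table copy (read from MySQL, write to PostgreSQL), and the pool size plays the role of the requested parallelism setting.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def migrate_table(table):
    # Placeholder for the real per-table copy step.
    return f"{table}: done"

def migrate_all(tables, parallelism=4):
    # Copy up to `parallelism` tables at once, collecting results as they finish.
    results = {}
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = {pool.submit(migrate_table, t): t for t in tables}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results

print(migrate_all(["users", "orders", "items"], parallelism=2))
```

Since each table lands in its own connection pair, the worker count is naturally bounded by how many concurrent connections both servers can afford.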

philipsoutham commented 10 years ago

Yeah, that thought had crossed my mind; I just haven't had the time needed to make improvements like that. Maybe in Q1 of 2014. I'm open to pull requests too if you want to take this on. :-)

Philip Southam Chief Architect / Яeverse Эngineer http://zefr.com

kworr commented 10 years ago

If we are talking about speed, there are a lot of things to improve before bringing in threads:

  1. Writes to the database are unbuffered. This can be fixed with gevent without a big code revamp.
  2. All data is written to PostgreSQL via a COPY statement, and that path is very slow here: instead of simply sending the data to PostgreSQL, we first build a textual representation of it, with formatting, bells and whistles, and only then send it to the database. Using one simple INSERT statement for all the data (i.e. preparing an INSERT with placeholders and then feeding it the rows as arrays) would be much faster, because it would avoid the sophisticated per-value data processing in Python.
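The placeholder-INSERT idea in point 2 can be illustrated with a minimal sketch. This uses the stdlib `sqlite3` driver purely so the example is self-contained; against PostgreSQL with psycopg2 the same pattern uses `%s` placeholders (or `psycopg2.extras.execute_values` for batched multi-row inserts). The table and rows are invented for the demo.

```python
import sqlite3

rows = [(1, "alice"), (2, "bob"), (3, "carol")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")

# One prepared INSERT with placeholders; the driver binds each row's values
# directly, skipping the per-value text formatting a COPY payload needs
# to be given when it is assembled in Python.
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 3
```

Whether this actually beats a well-tuned COPY depends on the driver and batch size; the point being made is only that the Python-side string building is avoidable.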