philipsoutham / py-mysql2pgsql

Tool for migrating/converting from mysql to postgresql.
http://packages.python.org/py-mysql2pgsql/
MIT License
455 stars 169 forks source link

tinyints being converted #26

Open tampinc opened 11 years ago

tampinc commented 11 years ago

Hi, I was attempting to migrate a Moodle database from MySQL to Postgres. (Moodle is a php framework for managing and delivering e-learning content). I created the Postgres database using the Moodle installation procedure and tried to transfer data only, ignoring DDL, from one to the other. The Moodle versions were exactly the same between servers. Here's the issue ... The transfer failed because tinyint column values were converted from 1 and 0 values into 't' and 'f' characters in the py-mysql2pgsql output file (representing true and false maybe?). Postgres COPY command refused to import the 't' and 'f' characters into numeric fields. The fix was relatively easy, querying the MySQL data dictionary for 'tinyint' columns like so ... mysql> use information_schema mysql> tee alter_holchem_copy.sql mysql> select CONCAT('ALTER TABLE ',TABLE_NAME, ' CHANGE ', COLUMN_NAME, ' ', COLUMN_NAME, ' INTEGER;') from columns where table_schema = 'holchem_copy' and data_type = 'tinyint'; mysql> exit;

I then formatted the output with sed -i 's/..\(.*\)|/\1/' alter_holchem_copy.sql

Running this script converted all the MySQL tinyint's into int's (larger therefore no data loss) and subsequently py-mysql2pgsql copied the values between databases perfectly.

I wondered if the 1/0 -> t/f conversion for tinyint columns was by design or is there a parameter I can use to control this behaviour? I did view the MySQL data in three different clients (Navicat, phpMyAdmin, and the CLI mysql tool) to make sure the values were stored as 1/0 in the MySQL database, not as t/f.

Centos 6 py-mysql2pgsql version 0.1.5 MySQL 5.5.30 Postgres 8.4.13

Cheers, Chris

philipsoutham commented 11 years ago

It's a presumption by design. From https://dev.mysql.com/doc/refman/5.0/en/numeric-type-overview.html

BOOL https://dev.mysql.com/doc/refman/5.0/en/integer-types.html, BOOLEAN

https://dev.mysql.com/doc/refman/5.0/en/integer-types.htmlThese types are synonyms for TINYINT(1)https://dev.mysql.com/doc/refman/5.0/en/integer-types.html. A value of zero is considered false. Nonzero values are considered true:

On Tue, Feb 5, 2013 at 9:47 AM, tampinc notifications@github.com wrote:

Hi, I was attempting to migrate a Moodle database from MySQL to Postgres on another server. I created the Postgres database using the Moodle installation procedure and tried to import data only, ignoring DDL, from one server to the other. The Moodle versions were exactly the same between servers. Here's the issue ... The import failed because tinyint column values were converted from 1 and 0 values into 't' and 'f' characters in the py-mysql2pgsql output file (representing true and false maybe?). The fix was relatively easy, querying the MySQL data dictionary for 'tinyint' columns like so ... mysql> tee alter_holchem_copy.sql mysql> select CONCAT('ALTER TABLE ',TABLE_NAME, ' CHANGE ', COLUMN_NAME, ' ', COLUMN_NAME, ' INTEGER;') from columns where table_schema = 'holchem_copy' and data_type = 'tinyint'; mysql> exit;

I then formatted the output with sed -i 's/..(.*)|/\1/' alter_holchem_copy.sql

Running this script converted all the MySQL tinyint's into int's (larger therefore no data loss) and subsequently py-mysql2pgsql copied the values between databases perfectly.

I wondered if the 1/0 -> t/f conversion for tinyint columns was by design or is there a parameter I can use to control this behaviour?

(py-mysql2pgsql version 0.1.5)

Cheers, Chris

— Reply to this email directly or view it on GitHubhttps://github.com/philipsoutham/py-mysql2pgsql/issues/26.

tampinc commented 11 years ago

Having just read issue #24 this looks like exactly the same issue. The columns I am having trouble with are tinyint(1) types. mysql2pgsql/lib/mysql_reader.py:95 certainly looks like the culprit as you mentioned in feedback to issue #24.

My personal preference would be for the MySQL tinyint's to be converted into PG smallint or int2 types and bit(1) fields should convert to boolean. However the MySQL BOOLEAN type is a synonym for tinyint(1) and bit(1) types, so the code can easily be argued to be correct as it stands. However I believe an "unsigned tinyint" should become a smallint as the "unsigned tinyint" is not the synonym for boolean in MySQL. If I specified a column as BOOLEAN in MySQL it would be stored as a tinyiny(1). However if I specified a column as tinyint(1) to store values between -128 to 127 my intent is indistinguishable from having specified a BOOLEAN in MySQL's data dictionary. Its a grey area due to MySQL not having a proper boolean type.

A parameter file option to switch conversion for tinyints between smallint or boolean would be the best way forward IMHO. This could be implemented first at a database level, then extended to table or individual column level if required. Using a similar syntax to including/excluding tables e.g.

convert tinyints to smallints, otherwise they become boolean

tinyint-to-smallint:

if a tablename is specified apply the conversion just to columns in that table, not globally

- tablename1

- tablename2

only apply the conversion to specified columns

- tablename3,columnname1

- tablename3,columnname2

- tablename4,columnname5

Cheers, Chris

kworr commented 11 years ago

This could be fixed by adding an additional tasting of tinyint(1), bit(1) etc fields with:

select max(field), min(field) from table;

If max is greater then 1 or min is lesser then 0 we can safely assume this is not a bool type column. However this would come at a price of one more issued query.

tampinc commented 11 years ago

@kworr That query in worst cases could trigger a full table scan on very large tables, best avoided if you don't want performance to drop through the floor.

Changing the column data type from tinyint(1) to tinyint(3) in the MySQL database would not require extra storage space and might get around this issue, I know it worked when I changed the columns to INTEGER but that was on a small database with only one table > 100,000 rows and plenty of free disk space to handle any data file growth if it happened. I'm not au-fait with the actual format of MySQL datafiles and the integer alignment used for their SQL-92 data type extensions, so converting a TINYINT to a SMALLINT or INTEGER might not take up more actual disk space, it depends on how the MySQL developers implemented them.

Easiest job for the developers of py-mysql2pgsql would be to update the documentation to highlight the issue and offer the "ALTER TABLE tinyint(1) -> tinyint(3)" workaround if anybody needs it.

The parameter based approach I suggested could work also. but that requires code changes and it's always better to avoid altering the code base if a valid workaround exists!

Cheers, Chris

tampinc commented 11 years ago

I just did a test. TINYINT(3) is converted to postgres smallint with no data formatting. Therefore its a valid workaround. It can be applied selectively using ALTER TABLE commands in MySQL. My first post gives a way of extracting all the tinyint's from a database and generating the DDL to alter them all globally before running py-mysql2pgsql.

Additional documentation is probably the best way to go. Here is an updated way of generating the required MySQL DDL from the command line.

At the command prompt mysql -u yourMySQLusername -p -s -N -e "select CONCAT('ALTER TABLE ',TABLE_NAME, ' CHANGE ', COLUMN_NAME, ' ', COLUMN_NAME, ' TINYINT(3);') from columns where table_schema ='your_database_name' and column_type like 'tinyint(1)%';" information_schema > alter_tinyints.sql You can then edit the alter_tinyints.sql file in your favourite editor if you want to remove any columns you want left alone. To perform the alterations mysql -u yourMySQLusername -p your_database_name < alter_tinyints.sql

Remember to change your_database_name and yourMySQLusername in both statements.

Cheers, Chris