philipsoutham / py-mysql2pgsql

Tool for migrating/converting from mysql to postgresql.
http://packages.python.org/py-mysql2pgsql/
MIT License
455 stars 169 forks source link

Error when writing data to postgres #37

Closed sslivitzky closed 11 years ago

sslivitzky commented 11 years ago

Hi,

I get this error when migrating a mysql db:

Traceback (most recent call last): File "/usr/local/bin/py-mysql2pgsql", line 5, in pkg_resources.run_script('py-mysql2pgsql==0.1.6', 'py-mysql2pgsql') File "/usr/local/lib/python2.7/dist-packages/setuptools-1.1-py2.7.egg/pkg_resources.py", line 492, in run_script self.require(requires)[0].run_script(script_name, ns) File "/usr/local/lib/python2.7/dist-packages/setuptools-1.1-py2.7.egg/pkg_resources.py", line 1350, in run_script execfile(script_filename, namespace, namespace) File "/usr/local/lib/python2.7/dist-packages/py_mysql2pgsql-0.1.6-py2.7.egg/EGG-INFO/scripts/py-mysql2pgsql", line 38, in mysql2pgsql.Mysql2Pgsql(options).convert() File "/usr/local/lib/python2.7/dist-packages/py_mysql2pgsql-0.1.6-py2.7.egg/mysql2pgsql/mysql2pgsql.py", line 31, in convert Converter(reader, writer, self.file_options, self.run_options.verbose).convert() File "/usr/local/lib/python2.7/dist-packages/py_mysql2pgsql-0.1.6-py2.7.egg/mysql2pgsql/lib/converter.py", line 51, in convert self.writer.write_contents(table, self.reader) File "/usr/local/lib/python2.7/dist-packages/py_mysql2pgsql-0.1.6-py2.7.egg/mysql2pgsql/lib/init.py", line 86, in decorated_function ret = f(_args, *_kwargs) File "/usr/local/lib/python2.7/dist-packages/py_mysql2pgsql-0.1.6-py2.7.egg/mysql2pgsql/lib/postgres_db_writer.py", line 193, in write_contents self.copy_from(f, '"%s"' % table.name, ['"%s"' % c['name'] for c in table.columns]) File "/usr/local/lib/python2.7/dist-packages/py_mysql2pgsql-0.1.6-py2.7.egg/mysql2pgsql/lib/postgres_db_writer.py", line 116, in copy_from columns=columns psycopg2.extensions.QueryCanceledError: COPY from stdin failed: error in .read() call CONTEXT: COPY tupdate, line 1

I'm running mysql 5.5, postgresql 9.2 and python 2.7.3.

I've read other posts on this issue and I've tried py-mysql2pgsql 0.1.5 with the same result.

Any help would be greatly appreciated.

Thanks,

Serge

sslivitzky commented 11 years ago

I dig more and found out that the table giving the error has a column defined as longtext containing rows over 6000 characters maybe this is the problem. As a test I deleted all rows from the problematic table and the migration worked fine so I just need to fix this.

kworr commented 11 years ago

Probably I'm little late but this errors can be a result of malformed MySQL query too. You had to check both postgresql and mysql logs.

Anyway are we closing this one?

sslivitzky commented 11 years ago

There's no error in the mysql logs but I get those one in postgres:

... 2013-09-04 08:48:26 EDT WARNING: nonstandard use of \ in a string literal at character 413 2013-09-04 08:48:26 EDT HINT: Use the escape string syntax for backslashes, e.g., E'\'. 2013-09-04 08:48:26 EDT WARNING: nonstandard use of \ in a string literal at character 205 2013-09-04 08:48:26 EDT HINT: Use the escape string syntax for backslashes, e.g., E'\'. 2013-09-04 08:48:26 EDT WARNING: nonstandard use of \ in a string literal at character 154 2013-09-04 08:48:26 EDT HINT: Use the escape string syntax for backslashes, e.g., E'\'. 2013-09-04 08:48:26 EDT WARNING: nonstandard use of \ in a string literal at character 174 2013-09-04 08:48:26 EDT HINT: Use the escape string syntax for backslashes, e.g., E'\'. 2013-09-04 08:48:26 EDT WARNING: nonstandard use of \ in a string literal at character 165 2013-09-04 08:48:26 EDT HINT: Use the escape string syntax for backslashes, e.g., E'\'. 2013-09-04 08:48:26 EDT WARNING: nonstandard use of \ in a string literal at character 139 2013-09-04 08:48:26 EDT HINT: Use the escape string syntax for backslashes, e.g., E'\'. 2013-09-04 08:48:26 EDT WARNING: nonstandard use of \ in a string literal at character 107 2013-09-04 08:48:26 EDT HINT: Use the escape string syntax for backslashes, e.g., E'\'. 2013-09-04 08:48:28 EDT WARNING: nonstandard use of \ in a string literal at character 103 2013-09-04 08:48:28 EDT HINT: Use the escape string syntax for backslashes, e.g., E'\'. 2013-09-04 08:48:28 EDT WARNING: nonstandard use of \ in a string literal at character 115 2013-09-04 08:48:28 EDT HINT: Use the escape string syntax for backslashes, e.g., E'\'. 2013-09-04 08:48:28 EDT WARNING: nonstandard use of \ in a string literal at character 225 2013-09-04 08:48:28 EDT HINT: Use the escape string syntax for backslashes, e.g., E'\'. 2013-09-04 08:48:28 EDT WARNING: nonstandard use of \ in a string literal at character 126 2013-09-04 08:48:28 EDT HINT: Use the escape string syntax for backslashes, e.g., E'\'. 2013-09-04 08:48:28 EDT WARNING: nonstandard use of \ in a string literal at character 232 2013-09-04 08:48:28 EDT HINT: Use the escape string syntax for backslashes, e.g., E'\'. 2013-09-04 08:48:28 EDT ERROR: COPY from stdin failed: error in .read() call 2013-09-04 08:48:28 EDT CONTEXT: COPY tupdate, line 1 2013-09-04 08:48:28 EDT STATEMENT: COPY "tupdate"("id","type","id_update_package","filename","checksum","previous_checksum","svn_version","data","data_rollback","description","db_table_name","db_field_name","db_field_value") FROM stdin WITH DELIMITER AS ' ' NULL AS '\N'

sslivitzky commented 11 years ago

I tried to output to a file and I got this error:

Traceback (most recent call last): File "/usr/local/bin/py-mysql2pgsql", line 5, in pkg_resources.run_script('py-mysql2pgsql==0.1.6', 'py-mysql2pgsql') File "/usr/local/lib/python2.7/dist-packages/setuptools-1.1-py2.7.egg/pkg_resources.py", line 492, in run_script self.require(requires)[0].run_script(script_name, ns) File "/usr/local/lib/python2.7/dist-packages/setuptools-1.1-py2.7.egg/pkg_resources.py", line 1350, in run_script execfile(script_filename, namespace, namespace) File "/usr/local/lib/python2.7/dist-packages/py_mysql2pgsql-0.1.6-py2.7.egg/EGG-INFO/scripts/py-mysql2pgsql", line 38, in mysql2pgsql.Mysql2Pgsql(options).convert() File "/usr/local/lib/python2.7/dist-packages/py_mysql2pgsql-0.1.6-py2.7.egg/mysql2pgsql/mysql2pgsql.py", line 31, in convert Converter(reader, writer, self.file_options, self.run_options.verbose).convert() File "/usr/local/lib/python2.7/dist-packages/py_mysql2pgsql-0.1.6-py2.7.egg/mysql2pgsql/lib/converter.py", line 51, in convert self.writer.write_contents(table, self.reader) File "/usr/local/lib/python2.7/dist-packages/py_mysql2pgsql-0.1.6-py2.7.egg/mysql2pgsql/lib/init.py", line 86, in decorated_function ret = f(_args, *_kwargs) File "/usr/local/lib/python2.7/dist-packages/py_mysql2pgsql-0.1.6-py2.7.egg/mysql2pgsql/lib/postgres_file_writer.py", line 135, in write_contents pr(table, row) File "/usr/local/lib/python2.7/dist-packages/py_mysql2pgsql-0.1.6-py2.7.egg/mysql2pgsql/lib/postgres_writer.py", line 153, in process_row row[index] = Binary(row[index]).getquoted()[1:-8] if row[index] else row[index] TypeError: can't escape unicode to binary

kworr commented 11 years ago

04.09.2013 18:48, Serge wrote:

I tried to output to a file and I got this error:

Can you try something like this:

Index: mysql2pgsql/lib/postgres_writer.py

--- mysql2pgsql/lib/postgres_writer.py (revision 63) +++ mysql2pgsql/lib/postgres_writer.py (working copy) @@ -150,7 +150,7 @@ row[index] = bin(ord(row[index]))[2:] elif isinstance(row[index], (str, unicode, basestring)): if column_type == 'bytea':

The better way would be some sample table with data that yields this error.

Sphinx of black quartz, judge my vow.

sslivitzky commented 11 years ago

I can try to work with what you gave me but I also can send you a mysqldump of the problematic table.

sslivitzky commented 11 years ago

The zip dump file is 3.3 MB.

sslivitzky commented 11 years ago

I did the modification and I don't have an error any more, the migration complete successfully. Do you still want some sample data?

kworr commented 11 years ago

Well... I'd prefer to thoroughly test such changes and don't commit blind patches to the code. And I'm not feeling like I did everything right.

I think it's time to hand-craft sample mysql database creation script to be included for testing purposes...

sslivitzky commented 11 years ago

Let me know the best way to send you the dump file, I didn't see any upload button.

kworr commented 11 years ago

Let's start from the simple things. Can you provide table schema for this table without any data so I can try to guess what's going on? On MySQL you can get table schema with "SHOW CREATE TABLE ;".

sslivitzky commented 11 years ago

CREATE TABLE tupdate ( id int(11) unsigned NOT NULL, type enum('code','db_data','db_schema','binary') DEFAULT NULL, id_update_package int(11) unsigned NOT NULL DEFAULT '0', filename varchar(250) DEFAULT '', checksum varchar(250) DEFAULT '', previous_checksum varchar(250) DEFAULT '', svn_version int(4) unsigned NOT NULL DEFAULT '0', data longtext, data_rollback longtext, description text, db_table_name varchar(140) DEFAULT '', db_field_name varchar(140) DEFAULT '', db_field_value varchar(1024) DEFAULT '', PRIMARY KEY (id), KEY id_update_package (id_update_package), CONSTRAINT tupdate_ibfk_1 FOREIGN KEY (id_update_package) REFERENCES tupdate_package (id) ON DELETE CASCADE ON UPDATE CASCADE ) ENGINE=InnoDB DEFAULT CHARSET=latin1

kworr commented 11 years ago

Yes, I was totally wrong. Second column is misdetected as bytea because it's definition contains predefined word 'binary'. You can confirm this by looking at the contents of the second field (which should contain garbage after converting).

Better fix would be putting enum higher so that it would grab enum with any content that further types can claim:

Index: postgres_writer.py
===================================================================
--- postgres_writer.py  (revision 63)
+++ postgres_writer.py  (working copy)
@@ -104,10 +104,6 @@
                     return default, 'time with time zone'
                 else:
                     return default, 'time without time zone'
-            elif 'blob' in column['type'] or 'binary' in column['type']:
-                return default, 'bytea'
-            elif column['type'] in ('tinytext', 'mediumtext', 'longtext', 'text'):
-                return default, 'text'
             elif re.search(r'^enum', column['type']):
                 default = (' %s::character varying' % default) if t(default) else None
                 enum = re.sub(r'^enum\(|\)$', '', column['type'])
@@ -114,6 +110,10 @@
                 # TODO: will work for "'.',',',''''" but will fail for "'.'',','.'"
                 max_enum_size = max([len(e.replace("''", "'")) for e in enum.split("','")])
                 return default, ' character varying(%s) check(%s in (%s))' % (max_enum_size,     column['name'], enum)
+            elif 'blob' in column['type'] or 'binary' in column['type']:
+                return default, 'bytea'
+            elif column['type'] in ('tinytext', 'mediumtext', 'longtext', 'text'):
+                return default, 'text'
             elif 'bit(' in column['type']:
                 return ' DEFAULT %s' % column['default'].upper() if column['default'] else column['default'], 'varbit(%s)' % re.search(r'\((\d+)\)', column['type']).group(1)
             elif 'set(' in column['type']:

Please test whether second field in your table would turn into the string.

sslivitzky commented 11 years ago

This is strange. After setting the bytea_output to escape, the value for the second column are shown correctly in psql.

sslivitzky commented 11 years ago

I removed the previous mod we did and applied this one.

The migration worked without any error and the column value was ok.

pandora=> \d tupdate Table "public.tupdate" Column | Type | Modifiers
-------------------+-------------------------+------------------------------- id | bigint | not null type | character varying(9) | id_update_package | bigint | not null default 0 filename | character varying(250) | default ''::character varying checksum | character varying(250) | default ''::character varying previous_checksum | character varying(250) | default ''::character varying svn_version | bigint | not null default 0 data | text | data_rollback | text | description | text | db_table_name | character varying(140) | default ''::character varying db_field_name | character varying(140) | default ''::character varying db_field_value | character varying(1024) | default ''::character varying Indexes: "tupdate_id_pkey" PRIMARY KEY, btree (id) "tupdate_id_update_package" btree (id_update_package) Check constraints: "tupdate_type_check" CHECK (type::text = ANY (ARRAY['code'::character varying, 'db_data'::character varying, 'db_schema'::character varying, 'binary'::character varying]::text[])) Foreign-key constraints: "tupdate_id_update_package_fkey" FOREIGN KEY (id_update_package) REFERENCES tupdate_package(id) Referenced by: TABLE "tupdate_journal" CONSTRAINT "tupdate_journal_id_update_fkey" FOREIGN KEY (id_update) REFERENCES tupdate(id)

Thanks.

kworr commented 11 years ago

I've pushed a final fix. Closing.