Jira Link: DB-13535
On MySQL 5.5, the following observations came up when trying to migrate data from a Latin-1 charset-based schema:
Approach 1: Converted their flat file from Latin1 to UTF8 iconv -f latin1 -t utf-8 gm_yuga_tst_190722_enc.csv > gm_yuga_tst_190722_enc_utf8.csv though it converted however Voyager breaks with this error ERROR: invalid byte sequence for encoding "UTF8": 0x00 (SQLSTATE 22021) and we manually corrected and able to load but there are data loss during that load we could see close to 100K records was the difference.
Approach 2: Loaded their source file in MySQL with Latin1 CharacterSet table and ran Voyager Export and it took almost 6-8hours to export the data in Flat File. while importing again through YBDB we got following error for default date values "0000-00-00 00:00:00" as MySQL 5.5 support default date with this value where PG/YBDB won't support this. I manually updated these values with diff. date value e.g. 1901-01-01 00:00:00 and tried again but it doesn't work with Latin1 --> UTF8 conversion and thrown this error. ERROR: invalid byte sequence for encoding "UTF8": 0x00 (SQLSTATE 22021)
Approach 3: I manually updated the characterset and collation in MySQL and exported the data into 10 Flat Files (10M rows per file) with UTF8 format and applied all the default date values, integer values, JSON values then it went smooth through Voyager and now we got 119M records in single table.
We should explore the possibilities of supporting different character sets/collations through voyager, if possible
Jira Link: DB-13535 On MySQL 5.5, the following observations came up when trying to migrate data from a Latin-1 charset-based schema:
Approach 1: Converted their flat file from Latin1 to UTF8 iconv -f latin1 -t utf-8 gm_yuga_tst_190722_enc.csv > gm_yuga_tst_190722_enc_utf8.csv though it converted however Voyager breaks with this error ERROR: invalid byte sequence for encoding "UTF8": 0x00 (SQLSTATE 22021) and we manually corrected and able to load but there are data loss during that load we could see close to 100K records was the difference.
Approach 2: Loaded their source file in MySQL with Latin1 CharacterSet table and ran Voyager Export and it took almost 6-8hours to export the data in Flat File. while importing again through YBDB we got following error for default date values "0000-00-00 00:00:00" as MySQL 5.5 support default date with this value where PG/YBDB won't support this. I manually updated these values with diff. date value e.g. 1901-01-01 00:00:00 and tried again but it doesn't work with Latin1 --> UTF8 conversion and thrown this error. ERROR: invalid byte sequence for encoding "UTF8": 0x00 (SQLSTATE 22021)
Approach 3: I manually updated the characterset and collation in MySQL and exported the data into 10 Flat Files (10M rows per file) with UTF8 format and applied all the default date values, integer values, JSON values then it went smooth through Voyager and now we got 119M records in single table.
We should explore the possibilities of supporting different character sets/collations through voyager, if possible