seanharr11 / etlalchemy

Extract, Transform, Load: Any SQL Database in 4 lines of Code.
MIT License
556 stars 83 forks source link

UnicodeDecodeError when converting RowProxy to list (raw_rows) #4

Closed seanharr11 closed 8 years ago

seanharr11 commented 8 years ago

Originally reported by @EthanBannister, this occurs when migrating from a SQL Server database, onto MySQL, and occurs in python2.7 when converting the RowProxy generator into a list.

Dropping database 'Temp_For_ETL' Creating database 'Temp_For_ETL' Sending source '<etlalchemy.ETLAlchemySource.ETLAlchemySource instance at 0x7fb70770f9e0>' to destination 'mysql://user:password@localhost/Temp_For_ETL' /usr/local/lib/python2.7/dist-packages/sqlalchemy/dialects/mssql/base.py:1643: SAWarning: Unrecognized server version info '95.12.255'. Version specific behaviors may not function properly. If using ODBC with FreeTDS, ensure TDS_VERSION 7.0 through 7.3, not 4.2, is configured in the FreeTDS configuration. ".".join(str(x) for x in self.server_version_info)) ETLAlchemySource (INFO) -


    **\* Total Tables: 1 ***
    *************************

ETLAlchemySource (INFO) - Reading Table Schema 'pull_table'... ETLAlchemySource (INFO) - Loaded indexes and FKs for table 'pull_table' ETLAlchemySource (WARNING) - Table 'pull_table' does not exist in the dst database (we will create this later...) ETLAlchemySource (INFO) - Building query to fetch all rows from pull_table ETLAlchemySource (INFO) - Done. (164558 total rows) ETLAlchemySource (INFO) - Loading all rows into memory... ETLAlchemySource (INFO) - Fetched 10000 rows ETLAlchemySource (INFO) - Fetched 20000 rows ETLAlchemySource (INFO) - Fetched 30000 rows ETLAlchemySource (INFO) - Fetched 40000 rows ETLAlchemySource (INFO) - Fetched 50000 rows ETLAlchemySource (INFO) - Fetched 60000 rows ETLAlchemySource (INFO) - Fetched 70000 rows ETLAlchemySource (INFO) - Fetched 80000 rows ETLAlchemySource (INFO) - Fetched 90000 rows ETLAlchemySource (INFO) - Fetched 100000 rows ETLAlchemySource (INFO) - Fetched 110000 rows ETLAlchemySource (INFO) - Fetched 120000 rows ETLAlchemySource (INFO) - Fetched 130000 rows ETLAlchemySource (INFO) - Fetched 140000 rows ETLAlchemySource (INFO) - Fetched 150000 rows ETLAlchemySource (INFO) - Fetched 160000 rows Traceback (most recent call last): File "etl_alm_asset.py", line 9, in mysql_db_target.migrate(migrate_fks=False, migrate_indexes=False, migrate_data=True, migrate_schema=True) File "/usr/local/lib/python2.7/dist-packages/etlalchemy/ETLAlchemyTarget.py", line 86, in migrate migrate_data=migrate_data) File "/usr/local/lib/python2.7/dist-packages/etlalchemy/ETLAlchemySource.py", line 995, in migrate raw_rows = [list(row) for row in rows] UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 10: invalid start byte

EthanBannister commented 8 years ago

I was able to figure it out. Everything is right with the world again!

I just had to make some adjustments to the freetds.conf, odbc.ini, and odbcinst.ini files. Here are the entry examples of those files:

freetds.conf:

[DSN] host = 192.168.2.103 port = 1433 tds version = 8.0 odbc.ini:

[DSN] Driver = FreeTDS Port = 1433 Database = MyDatabase TDS_Version = 8.0 ServerName = DSN odbcinst.ini:

[FreeTDS] Description = FreeTDS Driver v0.91 Driver = /usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so Setup = /usr/lib/x86_64-linux-gnu/odbc/libtdsS.so It also did not provide me with the "SAWarning: Unrecognized server version info '95.12.255'" message during the output either - which was nice to also see.

Also, I changed a few things for my own purposes, and would like to share them with you in hopes that you may be able to implement it in future releases.

The first thing that I changed was within ETLAlchemySource File at line 650 which determines what each field is enclosed by. I changed this from a single quote to a double since the data that was being loaded appeared to include double quotes that were included within the CSV that is generated.

From: "--fields-enclosed-by=\"'\" "

To: "--fields-optionally-enclosed-by=\'\"\' "

I also commented out line 736 which removes the csv file that is generated after all is said and done. This is to allow me to have something that I can use to import, just in case something happens again where I can not longer use the script.

os.remove(data_file_path)

Finally, where can I donate for this project? This script was a help for me and I wouldn't mind throwing a little cash your way for your efforts.

Thanks again for the help!

seanharr11 commented 8 years ago

Great to hear Ethan! I've added your suggestions to the TODO.md file so we don't lose sight of them. (I actually had originally developed support for 'fields-enclosed-by', but as I added support for additional RDBMS's it grew difficult to maintain).

As to donations, I've been waiting to get the project approved by Gratipay, but for now, there is a Paypal Donations button at the top of the project home page that has been added in the most recent commit (this morning)!

Thanks for helping support the project!

Closing this issue as it appears to be related to FreeTDS configuration and compatibility.