seanharr11 / etlalchemy

Extract, Transform, Load: Any SQL Database in 4 lines of Code.
MIT License

Migration to MySQL can silently fail due to mysqlimport being run via os.system() #6

Open TBusen opened 8 years ago

TBusen commented 8 years ago

ETLAlchemySource (INFO) -


    *** Total Tables: 1 ***
    *************************

ETLAlchemySource (INFO) - Reading Table Schema 'xxx_stat'...
ETLAlchemySource (INFO) - Loaded indexes and FKs for table 'xxx_stat'
ETLAlchemySource (INFO) - Building query to fetch all rows from xxx_stat
ETLAlchemySource (INFO) - Done. (4 total rows)
ETLAlchemySource (INFO) - Loading all rows into memory...
ETLAlchemySource (INFO) - Done
ETLAlchemySource (INFO) - (xxx_stat) VARCHAR
ETLAlchemySource (INFO) - Bases: ['STRING']
ETLAlchemySource (INFO) - Converting to -> VARCHAR(2 (maxsize: 1)
ETLAlchemySource (INFO) - Checking column for elimination status...
ETLAlchemySource (INFO) - (xxx_desc) VARCHAR
ETLAlchemySource (INFO) - Bases: ['STRING']
ETLAlchemySource (INFO) - Converting to -> VARCHAR(8 (maxsize: 8)
ETLAlchemySource (INFO) - Checking column for elimination status...
ETLAlchemySource (WARNING) - Table '{0}' already exists - not creating table, reflecting to get new changes instead..
ETLAlchemySource (INFO) - Transforming & Dumping 4 total rows from table 'xxx_stat' into 'C:\Users\busenta\Documents/xxx_stat.sql'.
ETLAlchemySource (INFO) - (xxx_stat) -- Transforming rows: 0 -> 4...(4 Total)
ETLAlchemySource (INFO) - (xxx_stat) -- Dumping rows: 0 -> 4 to 'xxx_stat.sql'...(4 Total)[Table 0/1]
ETLAlchemySource (INFO) - Gathering unique columns for upsert.
ETLAlchemySource (INFO) - Unique columns are '[Column('xxx_stat', VARCHAR(length=2), table=, primary_key=True, nullable=False)]'
ETLAlchemySource (INFO) - Creating 'upsert' statements for '4' rows, and dumping to 'xxx_stat.sql'.

TypeError                                 Traceback (most recent call last)
in <module>()
      8
      9 mysql.addSource(ora)
---> 10 mysql.migrate()

C:\Users\me\AppData\Local\Continuum\Anaconda3\envs\py27\lib\site-packages\etlalchemy\ETLAlchemyTarget.pyc in migrate(self, migrate_schema, migrate_data, migrate_fks, migrate_indexes)
     84                        str(self.conn_string) + "'")
     85         source.migrate(self.conn_string, migrate_schema=migrate_schema,
---> 86                        migrate_data=migrate_data)
     87         if migrate_indexes:
     88             source.add_indexes(self.conn_string)

C:\Users\me\AppData\Local\Continuum\Anaconda3\envs\py27\lib\site-packages\etlalchemy\ETLAlchemySource.pyc in migrate(self, destination_database_url, migrate_data, migrate_schema)
   1117                 self.dump_data(
   1118                     T_dst_exists, T, raw_rows[startRow:endRow],
-> 1119                     pks, Session)
   1120                 del raw_rows[startRow:endRow]
   1121

C:\Users\me\AppData\Local\Continuum\Anaconda3\envs\py27\lib\site-packages\etlalchemy\ETLAlchemySource.pyc in dump_data(self, T_dst_exists, T, raw_rows, pks, sessionMaker)
    815             row = raw_rows[r]
    816             for pk in pks:
--> 817                 uid += str(row[self.current_ordered_table_columns[pk]])
    818             if upsertDict.get(uid):
    819                 with open(data_file_path, "a+") as fp:

TypeError: list indices must be integers, not unicode
TBusen commented 8 years ago

This is just a test table. Moving it from MySQL to Oracle with the same connections works fine; I just can't write to MySQL from Oracle.

seanharr11 commented 8 years ago

I have fixed this issue, however I wanted to point something out to you.

This was certainly an error. It arose from some performance updates that changed the way we store and access rows in memory, switching from dict to list objects.

This only occurs when you UPSERT rows into your target database, so it is not a MySQL vs. Oracle issue. I wanted to point this out because this feature of the tool is not very well documented (currently).

If your target database tables already contain rows, ETLAlchemy will try to UPSERT rows: it will insert rows that do not exist and update rows that do. It uses the primary key value as a unique index to check for the existence of each row, so the UPSERT migration is not supported for tables that have no primary key.

Could you validate that this is working, and close this issue if this has fixed the problem?
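To illustrate the mechanism, here is a minimal sketch of the idea (illustrative only, not etlalchemy's actual code; all names and sample values below are made up):

```python
# Illustrative sketch of PK-based upsert routing, NOT etlalchemy's real code.
# Rows are stored as lists ordered like `columns`; `pk_indexes` holds the
# integer positions of the primary-key columns. The reported TypeError came
# from indexing a row list with a column *name* instead of its position.
def split_upsert(rows, pk_indexes, existing_keys):
    inserts, updates = [], []
    for row in rows:
        uid = "".join(str(row[i]) for i in pk_indexes)  # composite PK value
        if uid in existing_keys:
            updates.append(row)   # PK already on the target -> UPDATE
        else:
            inserts.append(row)   # PK not on the target -> INSERT
    return inserts, updates

columns = ["xxx_stat", "xxx_desc"]
pk_indexes = [columns.index("xxx_stat")]  # 0, an integer, not u"xxx_stat"
inserts, updates = split_upsert([["A", "dummy"]], pk_indexes, {"A"})
```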

TBusen commented 8 years ago

I will check this, but is there a way around the PK requirement? For some of our analytic tables we don't have a PK: not because of duplicated data, but because of NULL values that arise when you join multiple tables.

Thanks, Travis


TBusen commented 8 years ago

Well, it doesn't break anymore, but it just creates an empty table in MySQL. The table has the PK, FKs, and indexes transferred from Oracle to MySQL, but no data. ETLAlchemy exports all the data to CSV and says it writes it to the target DB, but it doesn't. I get no errors.

Attached is my output

python_output.txt

seanharr11 commented 8 years ago

The two runs you describe (the first breaks, the second runs) are operating in two different states. The first run, outlined in the issue description, is UPSERTING rows into a table that already exists on the target host. Your recent run (which failed to migrate data) is INSERTING rows, because the table 'xxx' does not exist on the target database.

That said, no reason comes to mind why the data wouldn't be loading. I would comment out line 736 in ETLAlchemySource.py (os.remove(data_file_path)), so you can view the contents of the file after ETLAlchemy dumps the MySQL statements into it.
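For anyone following along, the change is just to disable the cleanup call so the dump file survives the run (the exact line number may differ between versions):

```python
# In etlalchemy/ETLAlchemySource.py (around line 736 in the installed copy;
# exact line may vary by version), comment out the cleanup call so the
# dumped file survives the run and its contents can be inspected:

# os.remove(data_file_path)
```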

Could you do that, and attach the file here?

TBusen commented 8 years ago

I commented out line 736 and ran it. The CSV file looks fine. I am not able to upload it, though, as it contains confidential information; that's why I had to modify my post (e.g. the table name 'xxx'). I am doing this for my job and the material is sensitive.

It still did not load the data.

TBusen commented 8 years ago

I think I know why. You are using mysqlimport, and this is a corporate MySQL server. I do not have SSH privileges on that server, so I can't access the host's mysql shell; I can only issue a LOAD DATA INFILE through a client.

I did notice that ETLAlchemy is issuing optionally-enclosed-by="'", but the file has its columns enclosed by double quotes.

TBusen commented 8 years ago

I broke this down into a couple of elements and narrowed it to the local_infile option not being set to 1 when the script runs. I tested with SQLAlchemy and couldn't execute a LOAD DATA LOCAL INFILE on the CSV that ETLAlchemy dumped. When I added ?local_infile=1 to the MySQL connection string after /db (i.e. /db?local_infile=1), I was able to execute the command and the data loaded. However, when I add it to my ETLAlchemyTarget connection string it does nothing, and the file still can't be loaded. I'm pretty confident this is the issue; I just need to figure out how to make ETLAlchemyTarget use the parameter.
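For reference, a minimal standalone sketch of the workaround described above, written against the SQLAlchemy 1.x API of the time (the credentials, paths, and field delimiters are placeholders and must match your environment and the dumped file):

```python
from sqlalchemy import create_engine

# Placeholder connection details. local_infile=1 is passed through to the
# MySQL driver so the client/server pair will accept LOAD DATA LOCAL INFILE.
engine = create_engine("mysql://user:password@host/db?local_infile=1")

load_stmt = """
    LOAD DATA LOCAL INFILE '/path/to/xxx_stat.sql'
    INTO TABLE xxx_stat
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
"""

with engine.connect() as conn:
    conn.execute(load_stmt)  # loads the CSV that ETLAlchemy dumped
```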

seanharr11 commented 8 years ago

It won't matter if you set local_infile=1 on ETLAlchemyTarget's connection string, because ETLAlchemy does not run LOAD DATA LOCAL INFILE through the SQLAlchemy engine. It runs the mysqlimport command-line tool to load the data into the target. It uses this tool for a number of reasons, one of them being the issue you laid out above with pesky connection parameters when running LOAD DATA LOCAL INFILE.

Can you try to run the mysqlimport utility to load the file? Please let me know what errors you get when you try this (an upload of the command's output would be great). You can copy the command from ETLAlchemySource line 646.

By the way, this utility can fail silently b/c I am running it via os.system(). I need to update this to subprocess.run() so I can catch exceptions.
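A sketch of what that change might look like. Note that subprocess.run() exists only on Python 3.5+; on the Python 2.7 environment shown in the traceback above, subprocess.check_call() gives the same fail-loudly behavior. All connection details below are placeholders:

```python
import subprocess

# Hypothetical replacement for the os.system() call: check_call raises
# CalledProcessError on a non-zero exit code instead of failing silently.
cmd = [
    "mysqlimport",
    "--local",                              # client-side file (LOCAL INFILE)
    "--host=db-host",                       # placeholder host
    "--user=user",                          # placeholder credentials
    "--password=password",
    "--fields-terminated-by=,",
    "--fields-optionally-enclosed-by=\"",
    "db",                                   # target database
    "/path/to/xxx_stat.sql",                # mysqlimport derives the table
]                                           # name from the file name
try:
    subprocess.check_call(cmd)
except subprocess.CalledProcessError as e:
    raise RuntimeError("mysqlimport failed with exit code %d" % e.returncode)
```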

seanharr11 commented 7 years ago

Were you able to load the file using the mysqlimport utility? If not, can you send me the errors that you get?

seanharr11 commented 7 years ago

@TBusen you mentioned before that

I did notice that ETLAlchemy is issuing optionally-enclosed-by="'", but the file has its columns enclosed by double quotes.

This has been resolved in commit: https://github.com/seanharr11/etlalchemy/commit/746b410975cefe242357078fbe7b1ba596dc4bcf

Regarding my previous comment: were you able to import the file via mysqlimport?