TBusen opened this issue 8 years ago
This is just a test table. Moving it from MySQL to Oracle with the same connections works fine; I just can't write to MySQL from Oracle.
I have fixed this issue; however, I wanted to point something out to you.

This was certainly an error, and it arose from some performance updates that changed the way we store and access rows in memory, by switching from `dict` to `list` objects.

This only occurs when you want to UPSERT rows into your target database, so it is not a MySQL vs. Oracle issue. I wanted to point this out because this feature of the tool is not very well documented (currently).

If your target database tables already contain rows, ETLAlchemy will try to UPSERT rows: it will insert rows that do not exist, and it will update rows that do. It uses the primary key value as a unique index to check for the existence (or absence) of a row, and it does not support the UPSERT migration if a table does not contain primary keys.
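For anyone following along, here is a minimal sketch of the PK-based upsert decision described above, assuming a SQLAlchemy engine bound to the target database and a table with a single-column primary key. The names (`upsert_rows`, `target_engine`, `rows`) are illustrative, not ETLAlchemy's internals:

```python
from sqlalchemy import MetaData, Table, select

def upsert_rows(target_engine, table_name, rows):
    """Insert rows whose PK is absent; update rows whose PK already exists."""
    table = Table(table_name, MetaData(), autoload_with=target_engine)
    pk_cols = list(table.primary_key.columns)
    if not pk_cols:
        # Mirrors the limitation above: no primary key means no upsert support.
        raise ValueError("UPSERT requires a primary key on '%s'" % table_name)
    pk = pk_cols[0]
    with target_engine.begin() as conn:
        for row in rows:  # each row is a dict of column name -> value
            exists = conn.execute(
                select(pk).where(pk == row[pk.name])
            ).first()
            if exists:
                conn.execute(table.update().where(pk == row[pk.name]).values(**row))
            else:
                conn.execute(table.insert().values(**row))
```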
Could you validate that this is working, and close this issue if this has fixed the problem?
I will check this, but is there a way around the PK requirement? For some of our analytic tables we don't have a PK; it's not duplicated data, but NULL values that arise when you join multiple tables.
Thanks, Travis
Well, it doesn't break anymore, but it just creates an empty table in MySQL. The table has the PK, FKs, and indexes transferred from Oracle to MySQL, but no data. ETLAlchemy exports all the data to CSV and says it writes it to the target DB, but it doesn't. I get no errors.
Attached is my output
The 2 runs you describe (first one breaks, second one runs) are executing in two different states. The first run, outlined in the issue description, is UPSERTING rows into a table that already exists on the target host. Your recent run (the one that failed to migrate data) is INSERTING rows, because the table 'xxx' does not exist on the target database.
That said, no reason comes to mind regarding why the data wouldn't be loading. I would comment out line 736 in ETLAlchemySource.py (`os.remove(data_file_path)`), so you can view the contents of the file after ETLAlchemy dumps the MySQL statements into it.

Could you do that, and attach the file here?
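To be explicit, the suggested change is just disabling the cleanup call so the dump file survives for inspection; the line number comes from this thread, and the surrounding code is not reproduced here:

```python
# ETLAlchemySource.py, around line 736 (per the comment above):
# os.remove(data_file_path)  # commented out so the dumped file can be inspected
```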
I commented out line 736 and ran it. The CSV file looks fine. I am not able to upload it, though, as it contains confidential information; that's why I had to modify my post (i.e., table name 'xxx'). I am doing this for my job and the material is sensitive.
It still did not load the data.
I think I know why. You are using mysqlimport, and this is a corporate MySQL server. I do not have SSH privileges on this server, so I can't access the MySQL shell on the host. I can only issue a LOAD DATA INFILE through a client.
I did notice that ETLAlchemy is issuing `optionally-enclosed-by="'"`, but the file has the columns enclosed by double quotes.
I broke this down into a couple of elements and narrowed it down to the local_infile option not being set to 1 when using the script. I tested with SQLAlchemy, and I couldn't execute a LOAD DATA LOCAL INFILE on the CSV that ETLAlchemy dumped. When I added `?local_infile=1` to the MySQL connection string after /db (`/db?local_infile=1`), I was able to execute the command and the data loaded. However, when I add it to my ETLAlchemyTarget connection string it does nothing, and I still can't load the file. I'm pretty confident this is the issue; I just need to figure out how to make ETLAlchemyTarget use the parameter.
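For reference, here is a sketch of the manual test being described. The credentials, host, path, and delimiters are placeholders; the key detail is the `?local_infile=1` query parameter, which is what allowed the statement to run:

```python
from sqlalchemy import create_engine, text

# Placeholder URL; ?local_infile=1 enables LOAD DATA LOCAL INFILE
# on the client connection.
engine = create_engine("mysql://user:password@host/db?local_infile=1")

load_stmt = text(
    "LOAD DATA LOCAL INFILE '/path/to/xxx_stat.csv' "
    "INTO TABLE xxx_stat "
    "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "  # double quotes, per the dumped file
    "LINES TERMINATED BY '\\n'"
)

with engine.begin() as conn:
    conn.execute(load_stmt)
```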
It won't matter if you set `local_infile=1` on ETLAlchemyTarget's connection string, because ETLAlchemy does not run `LOAD DATA LOCAL INFILE` through the SQLAlchemy engine. It runs the `mysqlimport` command-line tool to load the data into the target. It uses this tool for a number of reasons, one of them being the issue you laid out above with pesky connection parameters when running `LOAD DATA LOCAL INFILE`.
Can you try running the `mysqlimport` utility to load the file? Please let me know what errors you get when you try this (an upload of the command's output would be great). You can copy the command from ETLAlchemySource line 646.
By the way, this utility can fail silently because I am running it via `os.system()`. I need to update this to `subprocess.run()` so I can catch exceptions.
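As an illustration of that fix, here is a sketch of running a mysqlimport command via `subprocess.run()` so a non-zero exit raises instead of being swallowed. The host, credentials, database name, and flags below are placeholders approximating the command discussed in this thread, not the exact command at ETLAlchemySource line 646:

```python
import subprocess

# Placeholder approximation of the mysqlimport invocation; not copied
# from ETLAlchemySource.
cmd = [
    "mysqlimport", "--local",
    "--host=mysql.example.com", "--user=etl", "--password=secret",
    "--fields-terminated-by=,",
    '--fields-optionally-enclosed-by="',
    "db_name",
    "/path/to/xxx_stat.sql",  # mysqlimport derives the table name from the file name
]

# os.system(" ".join(cmd)) returns the exit status but raises nothing,
# so a failure passes silently unless the return value is checked.
# subprocess.run(..., check=True) raises CalledProcessError on failure.
subprocess.run(cmd, check=True)
```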
Were you able to load the file using the `mysqlimport` utility? If not, can you send me the errors that you get?
@TBusen you mentioned before that:

> I did notice that ETLAlchemy is issuing `optionally-enclosed-by="'"`, but the file has the columns enclosed by double quotes
This has been resolved in commit: https://github.com/seanharr11/etlalchemy/commit/746b410975cefe242357078fbe7b1ba596dc4bcf
To my previous comment, were you able to import the file via `mysqlimport`?
```
ETLAlchemySource (INFO) -
ETLAlchemySource (INFO) - Reading Table Schema 'xxx_stat'...
ETLAlchemySource (INFO) - Loaded indexes and FKs for table 'xxx_stat'
ETLAlchemySource (INFO) - Building query to fetch all rows from xxx_stat
ETLAlchemySource (INFO) - Done. (4 total rows)
ETLAlchemySource (INFO) - Loading all rows into memory...
ETLAlchemySource (INFO) - Done
ETLAlchemySource (INFO) - (xxx_stat) VARCHAR
ETLAlchemySource (INFO) - Bases: ['STRING']
ETLAlchemySource (INFO) - Converting to -> VARCHAR(2 (maxsize: 1)
ETLAlchemySource (INFO) - Checking column for elimination status...
ETLAlchemySource (INFO) - (xxx_desc) VARCHAR
ETLAlchemySource (INFO) - Bases: ['STRING']
ETLAlchemySource (INFO) - Converting to -> VARCHAR(8 (maxsize: 8)
ETLAlchemySource (INFO) - Checking column for elimination status...
ETLAlchemySource (WARNING) - Table '{0}' already exists - not creating table, reflecting to get new changes instead..
ETLAlchemySource (INFO) - Transforming & Dumping 4 total rows from table 'xxx_stat' into 'C:\Users\busenta\Documents/xxx_stat.sql'.
ETLAlchemySource (INFO) - (xxx_stat) -- Transforming rows: 0 -> 4...(4 Total)
ETLAlchemySource (INFO) - (xxx_stat) -- Dumping rows: 0 -> 4 to 'xxx_stat.sql'...(4 Total)[Table 0/1]
ETLAlchemySource (INFO) - Gathering unique columns for upsert.
ETLAlchemySource (INFO) - Unique columns are '[Column('xxx_stat', VARCHAR(length=2), table=<xxx_stat>, primary_key=True, nullable=False)]'
ETLAlchemySource (INFO) - Creating 'upsert' statements for '4' rows, and dumping to 'xxx_stat.sql'.

TypeError Traceback (most recent call last)
```