Cannot handle indexed columns with uppercase characters

RodrigoCR commented 7 years ago

I'm trying to copy data from a MS SQL server to a MYSQL server.

I'm following the standard usage example, just adding the included_tables parameter (to 1 table so I can test how much time and resources I will need to copy the remaining tables).

Everything seems to run fine (reading the schema, loading indexes and building query) but almost at the end of the process I get the following error:

ETLAlchemySource (INFO) - Unique columns are '[Column('IdCliente', INTEGER(display_width=11), table=<Cliente>, primary_key=True, nullable=False)]'
ETLAlchemySource (INFO) - Creating 'upsert' statements for '100000' rows, and dumping to 'Cliente.sql'.
Traceback (most recent call last):
  File "copy_with_etl.py", line 11, in <module>
    target.migrate()
  File "/usr/local/lib/python2.7/site-packages/etlalchemy/ETLAlchemyTarget.py", line 86, in migrate
    migrate_data=migrate_data)
  File "/usr/local/lib/python2.7/site-packages/etlalchemy/ETLAlchemySource.py", line 1140, in migrate
    pks, Session)
  File "/usr/local/lib/python2.7/site-packages/etlalchemy/ETLAlchemySource.py", line 840, in dump_data
    uid += str(row[self.current_ordered_table_columns.index(pk)])
ValueError: u'idcliente' is not in list

Why is it happening that the unique columns list is not empty but when reading, it tries to find a string and not an object?

--- Want to back this issue? **[Post a bounty on it!](https://www.bountysource.com/issues/47870567-cannot-handle-indexed-columns-with-uppercase-characters?utm_campaign=plugin&utm_content=tracker%2F41641218&utm_medium=issues&utm_source=github)** We accept bounties via [Bountysource](https://www.bountysource.com/?utm_campaign=plugin&utm_content=tracker%2F41641218&utm_medium=issues&utm_source=github).

simnim commented 7 years ago

The tool breaks when there are upper case characters in column names.

I started debugging the tool today for the same reason. The tool lower cases all the pk column names here ETLAlchemySource.py#L1034 but only has the original case column names in the self.current_ordered_table_columns. So naturally the index(pk) fails.

RodrigoCR commented 7 years ago

@simnim so removing .lower() should work? Cause I can't change the origin table column names.

seanharr11 commented 7 years ago

The lower() calls were a quick-bandaid-fix related to case-sensitivity across databases.

For instance, take the following example when Oracle (Target) has case insensitive column names, and SQL Server (Source) has case sensitive column names.

Image what happens when you move the column IdCliente from SQL Server to Oracle, given your suggestion to remove lower() above. As far as ETLAlchemy knows, the column remains IdCliente in all the references throughout the tool. But....

Once the schema gets moved over, the column becomes just idcliente in the source's MetaData() when reflected via SQLAlchemy.

Now, when you try to move the data between the sources, there is no automagical mapping from source -> target (b/c the columns have different names!). The tool will try to find IdCliente in the table, but it does not exist with an uppercase "I" and "C". This was the reason for coercing columns to lowercase.

Solution 1: The easy solution would be to either fix the description of this issue, and ensure that the self.current_ordered_table_columns list is a list of lowercase columns.

Solution 2: The harder, and better solution is to write a wrapper around the various calls to SQLAlchemy's MetaData.tables['table_name'].columns.get('columnName') function to do a *case-insensitive search for a column in the MetaData. This way, we isolate the DRY out logic to one place in the code, while still supporting case sensitivity between 2 databases that are both case-sensitive (like MySQL and SQL Server).

Make sense? Thoughts?

original-heart commented 6 years ago

@seanharr11 how to set the table name with uppercase?

seanharr11 / etlalchemy

Cannot handle indexed columns with uppercase characters #25