seanharr11 / etlalchemy

Extract, Transform, Load: Any SQL Database in 4 lines of Code.
MIT License

ETLAlchemySource select schema #7

Open lucasdpn opened 7 years ago

lucasdpn commented 7 years ago

ETLAlchemySource allows me to select tables using the included_tables parameter. If a table belongs to a schema, is it possible to export that table? I'm converting from psql to Oracle.

For now, I am creating tables in the public schema to avoid dealing with schemas.

thanks.

seanharr11 commented 7 years ago

Ideal Solution: I know that sqlalchemy supports selecting a schema when defining the MetaData object

metadata = MetaData(schema="my_postgresql_schema")

So in theory, ETLAlchemy could support this by adding the schema keyword to ETLAlchemySource's __init__() method, and then passing this attribute to the MetaData call that reflects the source database.

I imagine that this would transfer all tables in the specified schema to the target database (Oracle). We just need to add this feature.
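To make the ideal solution concrete, here is a minimal sketch of how the schema keyword could flow from the constructor down to the reflecting MetaData. Note this is not the actual etlalchemy code: SchemaAwareSource and reflect() are illustrative stand-ins for ETLAlchemySource and its internal reflection step.

```python
# Hypothetical sketch of the proposed feature, NOT etlalchemy's real API.
from sqlalchemy import MetaData, create_engine


class SchemaAwareSource:
    """Stand-in for ETLAlchemySource with the proposed `schema` kwarg."""

    def __init__(self, conn_string, included_tables=None, schema=None):
        self.conn_string = conn_string
        self.included_tables = included_tables
        self.schema = schema  # proposed new keyword

    def reflect(self):
        engine = create_engine(self.conn_string)
        # Pass the schema straight through to MetaData, as suggested above,
        # so reflection looks up tables as "<schema>.<table>" on the source.
        metadata = MetaData(schema=self.schema)
        metadata.reflect(bind=engine, only=self.included_tables)
        return metadata
```

With schema=None this behaves exactly like today (the user's default schema is reflected), so the change would be backwards compatible.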

An immediate solution: If you specify a database in the connection string, then by default all tables in that postgresql database will be transferred to oracle. Since a postgresql server holds one or more databases, you could run the tool across all databases, one at a time, to get your desired result.

seanharr11 commented 7 years ago

@lucasdpn does the above solution address the issue?

kehh commented 7 years ago

@seanharr11 I've just hit upon this issue for an oracle source where sqlalchemy is only reading from my user's own schema rather than the schema I need.

kehh commented 7 years ago

Also I'm not quite sure how to interpret your immediate solution. How do you specify the database in the connection string?

I'm using

dsn = cx_Oracle.makedsn("dns.example.com", 1521, sid="SIDSTRING")
src = ETLAlchemySource("oracle+cx_oracle://user:password@" + dsn)

Or by database do you mean dns.example.com from above?

ssarala commented 7 years ago

I'm hitting this issue for an Oracle source as well and I'm not sure how to use the immediate solution you suggested in that context. A solution would be of great help!

seanharr11 commented 7 years ago

@kehh @ssarala what I am suggesting would require a PR to expose the new "schema" feature. Basically, rather than migrating the default schema (i.e. your user's schema) to the target DB, I am suggesting that the ETLAlchemySource.migrate() function take a string parameter called schema to choose which schema to migrate.

In SQLAlchemy, when you create a MetaData() object, there is a schema parameter which you can pass to its __init__() function. It is documented here

I am suggesting that we pass our top level schema parameter from ETLAlchemySource's migrate() method, down to where the metadata is reflected for the object: https://github.com/seanharr11/etlalchemy/blob/12461ee16046a2b7b703dfe7b86354877da16134/etlalchemy/ETLAlchemySource.py#L942.

We would also have to update the call to inspector.get_table_names to inspector.get_table_names(schema) here: https://github.com/seanharr11/etlalchemy/blob/12461ee16046a2b7b703dfe7b86354877da16134/etlalchemy/ETLAlchemySource.py#L899, so we iterate over tables in the provided schema.

By passing the schema to the metadata, i.e. MetaData(schema=schema), ETLAlchemySource will look in the schema provided for each table.
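The two changes above can be sketched together with plain SQLAlchemy. This is only an illustration of the proposed plumbing, assuming nothing about etlalchemy's internals; reflect_schema is a hypothetical helper name.

```python
# Sketch of what migrate(schema=...) would do internally, using only
# public SQLAlchemy APIs. Not etlalchemy code.
from sqlalchemy import MetaData, create_engine, inspect


def reflect_schema(engine, schema=None):
    """Reflect every table in `schema` (or the default schema if None)."""
    inspector = inspect(engine)
    # Step 1: list tables in the requested schema, mirroring the proposed
    # inspector.get_table_names(schema) change.
    table_names = inspector.get_table_names(schema=schema)
    # Step 2: reflect into a MetaData bound to that schema, so each Table
    # is looked up as "<schema>.<table>" on the source database.
    metadata = MetaData(schema=schema)
    metadata.reflect(bind=engine)
    return table_names, metadata
```

For an Oracle source this would let you pass the owning schema explicitly instead of being limited to the connecting user's own schema, which is the problem @kehh and @ssarala hit.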

Looking for contributors; this should be a relatively easy feature to implement, but we do need tests in place above all else.