aldanor opened 10 years ago
@aldanor on the `only` keyword you mentioned, this is already filed as #7396.
For passing a `MetaData` object to the `read_sql_query/table` functions, this was implemented partially in the past, but removed from the functional API (see discussion here: https://github.com/pydata/pandas/issues/6300#issuecomment-43001653 and commit https://github.com/pydata/pandas/commit/ad97d2740201f6b7687dfccebd85cc348e8bfbe3) to keep that as simple as possible for a first release (easier to add than to remove keywords). In any case, it is available in the OO API with `PandasSQLAlchemy`.
So basically, we just have to decide if we want to provide this functionality in both (`read_*` functions and object interface), or only via the object (so that when you need this functionality, you should use the OO API).
@jorisvandenbossche It may be just me, but I think it might make sense to allow the metadata object to be passed to the higher-level functions (or a set of arbitrary kwargs for `reflect`, since there may be dialect-specific options) for those functions to be useful in setups more involved than the most basic ones.
In some environments (e.g. a typical corporate Oracle setup), you would absolutely have to pass the `schema` argument for `reflect` to work. And you may want to pass `only` in case your schema hosts ten thousand tables.
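For illustration, a minimal sketch of the kind of reflection being discussed, using SQLAlchemy directly (an in-memory SQLite database with an `ATTACH`-ed database stands in for a real named schema; the table and schema names here are made up):

```python
from sqlalchemy import MetaData, create_engine, text

engine = create_engine("sqlite://")  # in-memory stand-in for a real server
with engine.connect() as conn:
    # SQLite's ATTACH gives us a named "schema" to reflect from
    conn.execute(text("ATTACH DATABASE ':memory:' AS aux"))
    conn.execute(text("CREATE TABLE aux.orders (id INTEGER PRIMARY KEY)"))
    conn.execute(text("CREATE TABLE aux.invoices (id INTEGER PRIMARY KEY)"))

    meta = MetaData()
    # Without schema= reflection would look in the default schema and find
    # nothing; only= limits reflection to the tables you actually need
    # (useful when the schema hosts thousands of tables)
    meta.reflect(conn, schema="aux", only=["orders"])

print(sorted(meta.tables))  # reflected tables are keyed as 'schema.name'
```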
What would be the most common things you would want to adapt in the meta object? Of course the `schema`, but are there also other things? Can you give some examples?
Just asking because, if the `schema` argument would be used a lot, I think we should provide it as a real keyword and not only through passing a `MetaData` object (most users should not start adapting a `MetaData` object themselves, so if a lot of users need the schema argument, we should provide it separately from `meta=..`).
@mangecoeur @danielballan @hayd
@jorisvandenbossche Do we consider the `only` argument a special case here (as in, don't query the entire schema in `read_sql` functions)? As for `schema` being a separate keyword, should users be able to pass `meta` as well, so the function signature would contain `schema=None, meta=None`? If that doesn't clutter the API too much, that'd be nice.
As for myself, I sometimes have to use `oracle_resolve_synonyms=True` in `MetaData#reflect` or `Table#__init__` when working with dblinks. I guess there exist a bunch of other useful dialect-specific options that I'm not aware of.
As I said above, the `only` keyword is already discussed in #7396, and it will be used by default (reflect only the table you are reading, and none for `read_sql_query`), so no separate keyword is needed I think? If you have further comments or remarks on that, you can comment there.
For the API, that seems a good option. Another one would be to provide a way to pass options directly to `reflect` with something like `reflect_kwds={...}`, but that seems a bit more ugly (and limits it to reflect).
Passing things to `Table` is, I think, too advanced for `read_sql`, but maybe we can see how we can do this in the OO API.
Would you be interested in doing a PR to add this?
I'm by no means a sqlalchemy expert, so if someone more skilled in this would do it, it would be a better idea :) Otherwise I can give it a try on the weekend; I'll have to figure out how to set up a test environment etc. for the sql test suite then.
Re: API, `schema` and `meta` keywords look fine; what about the `only` keyword when you pass the meta in? In that case the user is fully responsible for reflecting whatever he needs, right?
@jorisvandenbossche Would someone help me out in setting up a test environment so that I can run all tests in `test_sql.py` without skipping? I mean usernames, passwords, databases, schemas for both mysql and postgresql. It's quite confusing by just looking at the code, and it may make sense to put initialization scripts somewhere in the test suite to make the whole thing more accessible to other developers (kind of like this: https://bitbucket.org/anthony_tuininga/cx_oracle/src/dcc80926aa86d524f36500a27a658d3b4c14cbb1/test/SetupTest.sql?at=default).
@aldanor that would close #7091
An overview:
- `root@localhost/pandas_nosetest` -> database 'pandas_nosetest' with username 'root' and blank password, on host 'localhost'
- `postgres@localhost/pandas_nosetest` -> database 'pandas_nosetest' with username 'postgres' and blank password, on host 'localhost'

So no special schema or anything; I think for both these are the default values when you would create such a database locally.
They are created here https://github.com/pydata/pandas/blob/master/.travis.yml#L99 with `mysql -e 'create database pandas_nosetest;'`
If you want to try to set up something like #7091 or what you linked to, certainly try! If we support specifying schemas, we will also have to test this, so the database set-up will have to be more advanced.
@aldanor Did it work out to look at this?
@jorisvandenbossche status?
I have unfortunately no time this week to do it myself, so pushing to 0.15
@aldanor I started implementing this, see #7952. For now only `schema` support. Could you try this out to see if this solves your basic problems reported here?
This is partly implemented (`schema` kwarg, possibility to pass meta to `SQLDatabase`), but pushing to 0.15.1 to see if we want to provide more.
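A minimal sketch of how the `schema` keyword works once available in `read_sql_table` (an `ATTACH`-ed in-memory SQLite database stands in for a real named schema on Oracle/Postgres; the schema and table names are illustrative):

```python
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")  # in-memory stand-in for Oracle/Postgres
with engine.connect() as conn:
    # An ATTACH-ed SQLite database plays the role of a named schema
    conn.execute(text("ATTACH DATABASE ':memory:' AS finance"))
    conn.execute(text("CREATE TABLE finance.trades (id INTEGER, qty INTEGER)"))
    conn.execute(text("INSERT INTO finance.trades VALUES (1, 100), (2, 250)"))

    # schema= tells read_sql_table where to reflect the table from;
    # without it, reflection would look in the default schema and fail
    df = pd.read_sql_table("trades", conn, schema="finance")

print(df)
```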
I'm resurrecting an ancient issue here, but is there any interest in including `schema` as an optional kwarg in `read_sql`? The `read_sql` wrapper is a convenient one, but the inability to pass a schema value to be used by `read_sql_table` limits the usefulness of this method in practice. I have a WIP PR adding this in, but it looks like this has been debated already...?
Anyways, it's an addition that I would find very useful, but I am new to contributing to Pandas, so someone feel free to please set me straight here. 🙏
Use case: I want to query an Oracle table but I'm not the owner, so `meta.reflect(engine)` would do nothing and always return an empty list of tables, rendering `read_sql_table` useless. This works: `meta.reflect(engine, schema='the_real_owner')`, however there's no way of passing the metadata / schema to the high-level pandas.io.sql functions. One may also want to use `meta.reflect(engine, only=[...])` or `meta.reflect(engine, oracle_resolve_synonyms=True)` or any other dialect-specific argument. The `PandasSQLAlchemy` class already supports passing `meta` in the constructor, so it's just a matter of adding an extra keyword argument to the three io.sql.read_sql* functions. Or maybe allow passing through arbitrary kwargs to meta's reflect method, like `read_sql_table(table, engine, reflect=dict(schema='my_schema', oracle_resolve_synonyms=True))` -- ugly but functional.
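A hypothetical sketch of that pass-through idea: the wrapper name `read_sql_table_reflected` and the `reflect=` dict are illustrative only, not pandas API, and SQLite is used just so the example runs; on Oracle the dict could equally carry `schema=` or `oracle_resolve_synonyms=True`:

```python
import pandas as pd
from sqlalchemy import MetaData, create_engine, select, text

def read_sql_table_reflected(name, con, reflect=None):
    """Hypothetical wrapper: forward arbitrary keyword arguments
    (schema=, only=, dialect-specific options) to MetaData.reflect,
    then select the whole reflected table."""
    reflect = dict(reflect or {})
    meta = MetaData()
    meta.reflect(con, **reflect)
    schema = reflect.get("schema")
    table = meta.tables[f"{schema}.{name}" if schema else name]
    return pd.read_sql_query(select(table), con)

engine = create_engine("sqlite://")  # in-memory database for illustration
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE prices (sym TEXT, px REAL)"))
    conn.execute(text("INSERT INTO prices VALUES ('A', 1.5), ('B', 2.5)"))

# only= limits reflection to the one table we actually read
df = read_sql_table_reflected("prices", engine, reflect={"only": ["prices"]})
```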