pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.68k stars 17.92k forks source link

.read_sql_query works but .read_sql_table fails #11258

Closed jrandrews closed 9 years ago

jrandrews commented 9 years ago

Pandas is very helpful and I have been using it heavily recently. However, I have run into an issue where this code works:

from sqlalchemy import *
import pandas as pd
orcl_connect_string = 'oracle+cx_oracle://system:oracle@192.168.56.101:1521/?service_name=orcl'
engine = create_engine(orcl_connect_string)
pd.read_sql_query('SELECT * FROM INT_PROP', engine)

but this code fails:

from sqlalchemy import *
import pandas as pd
orcl_connect_string = 'oracle+cx_oracle://system:oracle@192.168.56.101:1521/?service_name=orcl'
engine = create_engine(orcl_connect_string)
read_sql_table('INT_PROP', engine)

Error message I get back:

/System/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 "/Users/me/Google Drive/Me Projects/Pycharm/auto-profiler/profile_pandas.py"
Traceback (most recent call last):
  File "/Users/me/Google Drive/Me Projects/Pycharm/auto-profiler/profile_pandas.py", line 13, in <module>
    df = pd.read_sql_table(table_name_str, engine)
  File "/Library/Python/2.7/site-packages/pandas/io/sql.py", line 340, in read_sql_table
    raise ValueError("Table %s not found" % table_name)
ValueError: Table INT_PROP not found

I can't think of any other extenuating circumstances -- it's a table, not a view, the userid logging in has full rights to the table in question. I've also tried adding the schema name and I get the same error back.

Versions:

Mac OS X 10.11 El Capitan (recently upgraded but had the same issue on 10.10 Yosemite)
Oracle 12c
Python 2.7.6
SQLAlchemy 1.0.8
Pandas 0.16.2

Problem occurs whether running from within Pycharm CE or running at the terminal console with python.

jrandrews commented 9 years ago

Also a list of output from pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 15.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.16.2
nose: 1.3.7
Cython: None
numpy: 1.8.0rc1
scipy: 0.13.0b1
statsmodels: 0.6.1
IPython: None
sphinx: None
patsy: 0.4.0
dateutil: 1.5
pytz: 2013.7
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.3.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: 0.9.1
apiclient: None
sqlalchemy: 1.0.8
pymysql: None
psycopg2: None
jorisvandenbossche commented 9 years ago

Thanks for the report, an you try some of these things:

jrandrews commented 9 years ago

engine.has_table seems to work:

from sqlalchemy import *

orcl_connect_string = 'oracle+cx_oracle://system:oracle@192.168.56.101:1521/?service_name=orcl'
engine = create_engine(orcl_connect_string)

connection = engine.connect()

if engine.has_table(table_name='INT_PROP'):
    print 'Yes, has the table'
else:
    print 'No table'

Results: Yes, has the table I tried to test for metadata reflection. I am sorry, I might not understand exactly what you are asking for -- here is the code that I ran:

from sqlalchemy import *

orcl_connect_string = 'oracle+cx_oracle://system:oracle@192.168.56.101:1521/?service_name=orcl'
engine = create_engine(orcl_connect_string)

meta = MetaData(bind=engine)

print meta.reflect()

and the results:

None

I tried this too and what is interesting is that I didn't seem to get a full list of tables that this user has access to:

for t in meta.sorted_tables:
    print t.name

Results:

logmnr_attrcol$
logmnr_attribute$
logmnr_ccol$
logmnr_cdef$
logmnr_col$
logmnr_coltype$
logmnr_con$
logmnr_container$
logmnr_dictionary$
logmnr_dictstate$
logmnr_enc$
logmnr_icol$
logmnr_ind$
logmnr_indcompart$
logmnr_indpart$
logmnr_indsubpart$
logmnr_kopm$
logmnr_lob$
logmnr_lobfrag$
logmnr_logmnr_buildlog
logmnr_ntab$
logmnr_obj$
logmnr_opqtype$
logmnr_partobj$
logmnr_props$
logmnr_refcon$
logmnr_seed$
logmnr_subcoltype$
logmnr_tab$
logmnr_tabcompart$
logmnr_tabpart$
logmnr_tabsubpart$
logmnr_ts$
logmnr_type$
logmnr_user$
logmnrc_con_gg
logmnrc_concol_gg
logmnrc_gsba
logmnrc_gsii
logmnrc_gtcs
logmnrc_gtlo
logmnrc_ind_gg
logmnrc_indcol_gg
logmnrc_seq_gg
logmnrp_ctas_part_map
logstdby$apply_progress
movie1_stg
movie2_stg

What is interesting is that the last tables in that list are defined as external tables in Oracle. Those have some funny properties compared to normal tables. Has anybody ever tested using external tables in Oracle with SQLAlchemy or Pandas? I wasn't ever trying to select from the external tables, but I wonder if their presence in the metadata is already causing issues.

jorisvandenbossche commented 9 years ago

Well, read_sql_table internally uses this reflected meta object to get the table. So we should investigate why it is not included there (meta.tables), while engine.has_table does work.

You can also give echo=True to the engine creation (create_engine(..., echo=True)), to get some more debugging information on what is happening on the sqlalchemy calls.

jrandrews commented 9 years ago

So I think I have found the issue after setting echo=True for create_engine(). When SQLAlchemy queries the Oracle system catalog it filters out tables belonging to two users, specifically SYSTEM and SYSAUX. Here is the actual query:

SELECT table_name FROM all_tables WHERE nvl(tablespace_name, 'no tablespace') NOT IN ('SYSTEM', 'SYSAUX') AND OWNER = :owner AND IOT_NAME IS NULL AND DURATION IS NULL

Now, I am connecting as the SYSTEM user because this is running in a local virtual machine on my laptop which has no connectivity to the outside world. In any sort of real scenario this would be a bad idea, but in my scenario it isn't important. I copied the table to another user's schema and then ran the python script logging on as that user. read_sql_table now works. So it's an issue with SQLAlchemy silently hiding any tables belonging to SYSTEM or SYSAUX. I realize that this isn't a pandas problem. I looked at SQLAlchemy documentation and there is no mention in the Oracle dialect of SQLAlchemy doing this.

I can see why filtering out SYSTEM and SYSAUX tables on Oracle might be useful in some cases because there are a lot of them and they aren't helpful to the average user, but there should probably be a note or an indication and an option in the engine or connection constructor to allow for turning this behavior off. I can see certain scenarios where I might specifically want to access data in those tables.

jorisvandenbossche commented 9 years ago

OK, good you found it! I am closing this issue, as it is not something we can do anything about, but you can certainly raise the issue at sqlalchemy if you want (although I suppose they will have a good reason for it, but at least they can mention it in the docs).

n0mik0s commented 7 years ago

Hello! It's seems that I faced with the same error... I found that you could avoid "NOT IN ('SYSTEM', 'SYSAUX')" statement: sqlalchemy_engine = sqlalchemy.create_engine(sqlalchemy_con_str, case_sensitive=False, echo=True, encoding='latin1', **exclude_tablespaces=None**) And it's working as intended:

2017-02-20 20:49:34,833 INFO sqlalchemy.engine.base.Engine SELECT table_name FROM all_tables WHERE OWNER = :owner AND IOT_NAME IS NULL AND DURATION IS NULL

But in my case this option did not solve the issue:

Traceback (most recent call last): File "/usr/local/lib/python3.5/dist-packages/pandas/io/sql.py", line 257, in read_sql_table meta.reflect(only=[table_name], views=True) File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/sql/schema.py", line 3815, in reflect (bind.engine, s, ', '.join(missing))) sqlalchemy.exc.InvalidRequestError: Could not reflect: requested table(s) not available in Engine(oracle+cx_oracle://opc_op:***@ovm-hpomdb.sdab.sn:1521/?service_name=hpom): (OPC_ACT_MESSAGES)

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home/vetal/PycharmProjects/hpom_srv_stat/fetch_from_db.py", line 37, in p_df = pd.read_sql_table(str(table), sqlalchemy_engine).assign(CSVTimeStamp=current_tstamp) File "/usr/local/lib/python3.5/dist-packages/pandas/io/sql.py", line 259, in read_sql_table raise ValueError("Table %s not found" % table_name) ValueError: Table OPC_ACT_MESSAGES not found

Could anyone help me in that?

ChuliangXiao commented 4 years ago

I had the same issue. Now solved. Try table name in lower case. Thanks to Basel Issmail from SO.

thiagodma commented 4 years ago

Thank you, @ChuliangXiao. Your solution worked for me.

rherasm commented 3 years ago

Using read_sql_table I think there is a bug because it always makes the reflect call with 'views=True', this cause an exception 'NotImplementedError'.

(sql.py lin.269):

try:
        meta.reflect(only=[table_name], **views=True**)

schema.py lin.4671 returns method not implemented

(if views available.update(insp.get_view_names(schema))

Please, Do you know any way of fixing it? Thks!


PD: Ref. to the code, the method doesn't provide the views parameter

def read_sql_table( table_name, con, schema=None, index_col=None, coerce_float=True, parse_dates=None, columns=None, chunksize: Optional[int] = None, ) -> Union[DataFrame, Iterator[DataFrame]]: