pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: sqlalchemy.exc.InvalidRequestError: Could not reflect: requested table(s) #55286

Open djpirra opened 1 year ago

djpirra commented 1 year ago

Pandas version checks

Reproducible Example

import pandas as pd
from sqlalchemy import create_engine, types

df = pd.read_csv(<csv file>, header=0)
df.columns = [col.upper() for col in df.columns]

engine = create_engine(
    'oracle+oracledb://:@',
    connect_args={
        "user": adb_username,
        "password": adb_password,
        "dsn": connStr,
    },
)

dtyp = {c: types.VARCHAR(200) for c in df.columns}
df.to_sql(
    name=table_name,
    schema=adb_username,
    con=engine,
    if_exists=if_exists,
    dtype=dtyp,
    index=False,
    method="multi",
)

Issue Description

sqlalchemy.exc.InvalidRequestError: Could not reflect: requested table(s) not available in Engine(oracle+oracledb://:***@): (TABLE_NAME)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:555)
    at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:713)
    at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:695)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:508)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
    at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
    at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
    at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
    at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
    at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)
    at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
    at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
    at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)
    at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
    at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
    at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)
    at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030)
    at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2254)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more

Expected Behavior

The objective is to write to a table whose name and column names are all in upper case. The table is created successfully, but the operation fails at data insert. The expected behavior is for the write to complete successfully.

Installed Versions

INSTALLED VERSIONS
------------------
commit : 0f437949513225922d851e9581723d82120684a6
python : 3.8.13.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.90.1-microsoft-standard-WSL2
Version : #1 SMP Fri Jan 27 02:56:13 UTC 2023
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8

pandas : 2.0.3
numpy : 1.23.1
pytz : 2022.1
dateutil : 2.8.2
setuptools : 62.1.0
pip : 22.0.4
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.9.3
jinja2 : None
IPython : None
pandas_datareader : None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2023.9.2
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.10.1
snappy : None
sqlalchemy : 2.0.21
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None
paulreece commented 1 year ago

As this issue is presented, I'm not sure yet whether this is actually an issue with pandas or with one of the other services you're using (Apache Spark or SQLAlchemy).

Can you provide us with more details, such as what the DataFrame you're trying to insert looks like? Your repro script is a little vague, and I don't have Oracle, so I can't test this locally. I have a feeling this is more of an issue with oracle+oracledb than with pandas, but I need more info to confirm or rule that out.

handreassa commented 2 months ago

Hi @djpirra. I encountered a similar issue with the Snowflake database and, after some research, was able to resolve it. Could you please try running the following code change and see if it resolves the issue on your end?

import pandas as pd
from sqlalchemy import create_engine, types

df = pd.read_csv(<csv file>, header=0)
df.columns = [col.upper() for col in df.columns]

engine = create_engine(
    'oracle+oracledb://:@',
    connect_args={
        "user": adb_username,
        "password": adb_password,
        "dsn": connStr,
    },
)

dtyp = {c: types.VARCHAR(200) for c in df.columns}
# The only change from your script: pass the table name in lower case.
df.to_sql(
    name=table_name.lower(),
    schema=adb_username,
    con=engine,
    if_exists=if_exists,
    dtype=dtyp,
    index=False,
    method="multi",
)

In my environment, I noticed that to_sql raises this error when the table name is passed in upper case in the parameters, yet when the lower-case name is used, the table still appears in upper case in the database.
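For context, this matches SQLAlchemy's name-casing convention: an all-lower-case name is sent unquoted and matched case-insensitively (Oracle and Snowflake fold unquoted names to upper case), while a name containing upper-case characters gets quoted and must then match the stored name exactly, which is where reflection can fail. A minimal pure-Python sketch of that rule (the function name `is_case_insensitive` is mine for illustration, not a SQLAlchemy API):

```python
def is_case_insensitive(name: str) -> bool:
    # Sketch of SQLAlchemy's convention: a name with no upper-case
    # characters is emitted unquoted and matched case-insensitively;
    # any upper-case character forces quoting and an exact-case match.
    return name == name.lower()

# "my_table" is matched case-insensitively, so it finds the
# upper-case MY_TABLE that Oracle creates from an unquoted name.
print(is_case_insensitive("my_table"))   # True
# "MY_TABLE" is quoted, so reflection looks for that exact case
# and may not find the table.
print(is_case_insensitive("MY_TABLE"))   # False
```

This is why lower-casing the `name=` argument lets the reflection step succeed while the table itself still shows up in upper case in the database.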

This behavior seems to affect multiple databases, as discussed in the following references:

Let me know if this works or if further adjustments are needed!