Closed: tohariahsan closed this issue 2 months ago
Can you explain what the issue is?
My script is:
df_full = pl.DataFrame()
for df in pl.read_database(
    query=query,
    connection='DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' + db_name + ';UID=' + db_username + ';PWD=' + db_password,
    execute_options={'max_text_size': 512, 'max_binary_size': 1024},
    iter_batches=True,
    batch_size=5000,
    schema_overrides=schema,
):
    df_full = df_full.vstack(df)
I get an error like this:
for df in pl.read_database(
^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python312\Lib\site-packages\polars\io\database\functions.py", line 238, in read_database
_ = import_optional(
^^^^^^^^^^^^^^^^
File "C:\Program Files\Python312\Lib\site-packages\polars\dependencies.py", line 280, in import_optional
raise ModuleNotFoundError(err_message) from None
ModuleNotFoundError: use of ODBC connection string requires the 'arrow_odbc' package.
Please install using the command `pip install arrow_odbc`.
This happens even though I have installed arrow-odbc (arrow_odbc) and pyarrow, as shown below:
C:\Users\Administrator>pip list
Package Version
--------------------- ------------
adbc-driver-manager 1.1.0
adbc-driver-sqlite 1.1.0
annotated-types 0.7.0
arrow 1.3.0
arrow-odbc 7.0.4
babel 2.16.0
bleach 6.1.0
bokeh 3.4.3
certifi 2024.7.4
cffi 1.17.0
charset-normalizer 3.3.2
click 8.1.7
cloudpickle 3.0.0
colorama 0.4.6
colorcet 3.1.0
commonmark 0.9.1
connectorx 0.3.3
contourpy 1.2.1
cycler 0.12.1
deltalake 0.19.0
et-xmlfile 1.1.0
fastexcel 0.11.5
fonttools 4.53.1
fsspec 2024.6.1
gevent 24.2.1
great-tables 0.10.0
greenlet 3.0.3
holoviews 1.19.1
htmltools 0.5.3
hvplot 0.10.0
idna 3.7
importlib_metadata 8.2.0
importlib_resources 6.4.3
Jinja2 3.1.4
kiwisolver 1.4.5
linkify-it-py 2.0.3
Markdown 3.7
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.9.2
mdit-py-plugins 0.4.1
mdurl 0.1.2
mmh3 4.1.0
nest-asyncio 1.6.0
numpy 2.0.1
openpyxl 3.1.5
packaging 24.1
pandas 2.2.2
panel 1.4.5
param 2.1.1
pillow 10.4.0
pip 24.2
polars 1.5.0
pyarrow 17.0.0
pycparser 2.22
pydantic 2.8.2
...
and pl.show_versions() shows this:
--------Version info---------
Polars: 1.5.0
Index type: UInt32
Platform: Windows-2012ServerR2-6.3.9600-SP0
Python: 3.12.3 (tags/v3.12.3:f6650f9, Apr 9 2024, 14:05:25) [MSC v.1938 64 bit (AMD64)]
----Optional dependencies----
adbc_driver_manager: 1.1.0
cloudpickle: 3.0.0
connectorx: 0.3.3
deltalake: <not installed>
fastexcel: <not installed>
fsspec: 2024.6.1
gevent: 24.2.1
great_tables: 0.10.0
hvplot: 0.10.0
matplotlib: 3.9.2
nest_asyncio: 1.6.0
numpy: 2.0.1
openpyxl: 3.1.5
pandas: 2.2.2
pyarrow: <not installed>
pydantic: 2.8.2
pyiceberg: 0.7.0
sqlalchemy: 2.0.32
torch: <not installed>
xlsx2csv: 0.8.3
xlsxwriter: 3.2.0
Can you run the following from within the same environment where you're running Polars? Under the hood this is all we're doing there 🤔
from importlib import import_module
import_module("arrow_odbc")
When you call `pip list` you're showing what's installed for the global Python interpreter. If you are running Polars in a different environment (e.g. in a standard venv, with conda, or otherwise) you may not have access to the same packages, which is why you don't see them in the output from `pl.show_versions`.
If you install your packages into the correct environment/venv I'd expect your error to resolve. I don't think there's any bug on our side here.
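One way to check the environment point above is to ask the running interpreter where it lives and where it would load each package from. This is a minimal sketch using only the standard library; the helper name `describe_environment` is hypothetical and not part of Polars. Note that `find_spec` only locates a module and does not import it, so it will not reveal load-time failures.

```python
import sys
from importlib.util import find_spec


def describe_environment(module_names):
    """Return the running interpreter path and, per module, the file it would load."""
    report = {"interpreter": sys.executable, "modules": {}}
    for name in module_names:
        # find_spec returns None when the module is not visible to this interpreter
        spec = find_spec(name)
        report["modules"][name] = spec.origin if spec is not None else None
    return report


if __name__ == "__main__":
    info = describe_environment(["polars", "arrow_odbc", "pyarrow"])
    print("interpreter:", info["interpreter"])
    for name, origin in info["modules"].items():
        print(f"{name}: {origin or '<not found>'}")
```

Run it with the same interpreter that runs the Polars script. If the interpreter path differs from the one behind `pip list`, installing with `python -m pip install ...` from that interpreter targets the right environment.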
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Program Files\Python312\Lib\importlib\__init__.py", line 90, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 995, in exec_module
File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
File "C:\Program Files\Python312\Lib\site-packages\arrow_odbc\__init__.py", line 2, in <module>
from .reader import BatchReader, read_arrow_batches_from_odbc
File "C:\Program Files\Python312\Lib\site-packages\arrow_odbc\reader.py", line 4, in <module>
import pyarrow
File "C:\Program Files\Python312\Lib\site-packages\pyarrow\__init__.py", line 65, in <module>
import pyarrow.lib as _lib
ImportError: DLL load failed while importing lib: The specified procedure could
I will close this as this is an environment problem, not a Polars problem.
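The traceback above suggests that arrow_odbc itself is found and that the failure happens when it imports pyarrow. The first traceback was raised with `from None`, which hides any earlier exception, so the "package required" message appears to have masked this underlying ImportError. A small sketch for surfacing the real error per package follows; the helper name `try_import` is hypothetical, and this is not how Polars itself reports optional dependencies.

```python
from importlib import import_module


def try_import(name):
    """Import `name` and return (ok, detail), keeping the original error text."""
    try:
        module = import_module(name)
    except ImportError as exc:
        # ModuleNotFoundError is a subclass of ImportError, so a missing package
        # and a failed DLL load are both caught here but reported differently.
        return False, f"{type(exc).__name__}: {exc}"
    return True, getattr(module, "__version__", "<no __version__>")


if __name__ == "__main__":
    # Check the dependency first, then the package that imports it.
    for name in ("pyarrow", "arrow_odbc"):
        ok, detail = try_import(name)
        print(f"{name}: {'ok' if ok else 'FAILED'} ({detail})")
```

A `ModuleNotFoundError` here points to a missing install in the active environment, while a plain `ImportError` such as the DLL failure above points to a package that is installed but cannot be loaded.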
Issue description
I have installed the pyarrow package, but it is not recognized by Polars.