pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Dependencies not recognized #18247

Closed tohariahsan closed 2 months ago

tohariahsan commented 2 months ago

Reproducible example

D:\>pip install "polars[pyarrow]"
Requirement already satisfied: polars[pyarrow] in c:\program files\python312\lib\site-packages (1.5.0)
Requirement already satisfied: pyarrow>=7.0.0 in c:\program files\python312\lib\site-packages (from polars[pyarrow]) (17.0.0)
Requirement already satisfied: numpy>=1.16.6 in c:\program files\python312\lib\site-packages (from pyarrow>=7.0.0->polars[pyarrow]) (2.0.1)

D:\>pip install pyarrow
Requirement already satisfied: pyarrow in c:\program files\python312\lib\site-packages (17.0.0)
Requirement already satisfied: numpy>=1.16.6 in c:\program files\python312\lib\site-packages (from pyarrow) (2.0.1)

Log output

No response

Issue description

I have installed the pyarrow package, but Polars does not recognize it.

Expected behavior

--------Version info---------
Polars:               1.5.0
Index type:           UInt32
Platform:             Windows-2012ServerR2-6.3.9600-SP0
Python:               3.12.3 (tags/v3.12.3:f6650f9, Apr  9 2024, 14:05:25) [MSC v.1938 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:  1.1.0
cloudpickle:          3.0.0
connectorx:           0.3.3
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2024.6.1
gevent:               24.2.1
great_tables:         0.10.0
hvplot:               0.10.0
matplotlib:           3.9.2
nest_asyncio:         1.6.0
numpy:                2.0.1
openpyxl:             3.1.5
pandas:               2.2.2
pyarrow:              17.0.0
pydantic:             2.8.2
pyiceberg:            0.7.0
sqlalchemy:           2.0.32
torch:                <not installed>
xlsx2csv:             0.8.3
xlsxwriter:           3.2.0

Installed versions

```
--------Version info---------
Polars:               1.5.0
Index type:           UInt32
Platform:             Windows-2012ServerR2-6.3.9600-SP0
Python:               3.12.3 (tags/v3.12.3:f6650f9, Apr  9 2024, 14:05:25) [MSC v.1938 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:  1.1.0
cloudpickle:          3.0.0
connectorx:           0.3.3
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2024.6.1
gevent:               24.2.1
great_tables:         0.10.0
hvplot:               0.10.0
matplotlib:           3.9.2
nest_asyncio:         1.6.0
numpy:                2.0.1
openpyxl:             3.1.5
pandas:               2.2.2
pyarrow:              <not installed>
pydantic:             2.8.2
pyiceberg:            0.7.0
sqlalchemy:           2.0.32
torch:                <not installed>
xlsx2csv:             0.8.3
xlsxwriter:           3.2.0
```

ritchie46 commented 2 months ago

Can you explain what the issue is?

tohariahsan commented 2 months ago

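My script is:

    import polars as pl

    # `query`, `schema`, `server`, `db_name`, `db_username`, and `db_password`
    # are defined earlier in the script.
    df_full = pl.DataFrame()
    for df in pl.read_database(
        query=query,
        connection='DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' + db_name + ';UID=' + db_username + ';PWD=' + db_password,
        execute_options={'max_text_size': 512, 'max_binary_size': 1024},
        iter_batches=True,
        batch_size=5000,
        schema_overrides=schema,
    ):
        # Append each batch of up to 5000 rows to the accumulated frame.
        df_full = df_full.vstack(df)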

I get an error like this:

    for df in pl.read_database(
              ^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\site-packages\polars\io\database\functions.py", line 238, in read_database
    _ = import_optional(
        ^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\site-packages\polars\dependencies.py", line 280, in import_optional
    raise ModuleNotFoundError(err_message) from None
ModuleNotFoundError: use of ODBC connection string requires the 'arrow_odbc' package.
Please install using the command `pip install arrow_odbc`.

This happens even though I have installed arrow-odbc (arrow_odbc) and pyarrow, as shown below:

C:\Users\Administrator>pip list
Package               Version
--------------------- ------------
adbc-driver-manager   1.1.0
adbc-driver-sqlite    1.1.0
annotated-types       0.7.0
arrow                 1.3.0
arrow-odbc            7.0.4
babel                 2.16.0
bleach                6.1.0
bokeh                 3.4.3
certifi               2024.7.4
cffi                  1.17.0
charset-normalizer    3.3.2
click                 8.1.7
cloudpickle           3.0.0
colorama              0.4.6
colorcet              3.1.0
commonmark            0.9.1
connectorx            0.3.3
contourpy             1.2.1
cycler                0.12.1
deltalake             0.19.0
et-xmlfile            1.1.0
fastexcel             0.11.5
fonttools             4.53.1
fsspec                2024.6.1
gevent                24.2.1
great-tables          0.10.0
greenlet              3.0.3
holoviews             1.19.1
htmltools             0.5.3
hvplot                0.10.0
idna                  3.7
importlib_metadata    8.2.0
importlib_resources   6.4.3
Jinja2                3.1.4
kiwisolver            1.4.5
linkify-it-py         2.0.3
Markdown              3.7
markdown-it-py        3.0.0
MarkupSafe            2.1.5
matplotlib            3.9.2
mdit-py-plugins       0.4.1
mdurl                 0.1.2
mmh3                  4.1.0
nest-asyncio          1.6.0
numpy                 2.0.1
openpyxl              3.1.5
packaging             24.1
pandas                2.2.2
panel                 1.4.5
param                 2.1.1
pillow                10.4.0
pip                   24.2
polars                1.5.0
pyarrow               17.0.0
pycparser             2.22
pydantic              2.8.2
...

and show_versions() prints this:

--------Version info---------
Polars:               1.5.0
Index type:           UInt32
Platform:             Windows-2012ServerR2-6.3.9600-SP0
Python:               3.12.3 (tags/v3.12.3:f6650f9, Apr  9 2024, 14:05:25) [MSC v.1938 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:  1.1.0
cloudpickle:          3.0.0
connectorx:           0.3.3
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2024.6.1
gevent:               24.2.1
great_tables:         0.10.0
hvplot:               0.10.0
matplotlib:           3.9.2
nest_asyncio:         1.6.0
numpy:                2.0.1
openpyxl:             3.1.5
pandas:               2.2.2
pyarrow:              <not installed>
pydantic:             2.8.2
pyiceberg:            0.7.0
sqlalchemy:           2.0.32
torch:                <not installed>
xlsx2csv:             0.8.3
xlsxwriter:           3.2.0

alexander-beedie commented 2 months ago

Can you run the following from within the same environment where you're running Polars? Under the hood this is all we're doing there 🤔

from importlib import import_module
import_module("arrow_odbc")

When you call pip list, you're showing what's installed for the global Python interpreter. If you are running Polars in a different environment (e.g. in a standard venv, with conda, or otherwise), you may not have access to the same packages, which is why you don't see them in the output from pl.show_versions.

If you install your packages into the correct environment/venv I'd expect your error to resolve. I don't think there's any bug on our side here.
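One way to tell "not installed" apart from "installed but fails to import" is to check both from the interpreter that actually runs the script. The following is a minimal sketch using only the standard library; `diagnose` is a hypothetical helper name, not part of Polars or pip.

```python
import importlib
import sys
from importlib import metadata


def diagnose(module_name: str, dist_name: str) -> dict:
    """Report the running interpreter, the installed version of a
    distribution (if any), and any error raised when importing its module."""
    report = {
        "interpreter": sys.executable,  # the Python binary running this code
        "installed_version": None,
        "import_error": None,
    }
    try:
        # Looks at package metadata only; does not import the module.
        report["installed_version"] = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        pass
    try:
        importlib.import_module(module_name)
    except Exception as exc:
        # Keep the original error type and message instead of hiding them.
        report["import_error"] = f"{type(exc).__name__}: {exc}"
    return report
```

Calling `diagnose("pyarrow", "pyarrow")` and `diagnose("arrow_odbc", "arrow-odbc")` in the same session as the failing script would show whether a package is missing from that interpreter or is present but raises on import.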

tohariahsan commented 2 months ago

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python312\Lib\importlib\__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 995, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "C:\Program Files\Python312\Lib\site-packages\arrow_odbc\__init__.py", line 2, in <module>
    from .reader import BatchReader, read_arrow_batches_from_odbc
  File "C:\Program Files\Python312\Lib\site-packages\arrow_odbc\reader.py", line 4, in <module>
    import pyarrow
  File "C:\Program Files\Python312\Lib\site-packages\pyarrow\__init__.py", line 65, in <module>
    import pyarrow.lib as _lib
ImportError: DLL load failed while importing lib: The specified procedure could

ritchie46 commented 2 months ago

I will close this as this is an environment problem, not a Polars problem.
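The two tracebacks in this thread differ in what they report: `read_database` raised `ModuleNotFoundError` for `arrow_odbc` via `raise ... from None`, while importing `arrow_odbc` directly raised an `ImportError` from `pyarrow` ("DLL load failed"). The sketch below illustrates how `from None` can hide the underlying failure. Both functions are hypothetical stand-ins written for illustration, not Polars' actual `import_optional`.

```python
from importlib import import_module


def import_optional_sketch(module_name: str, err_message: str):
    """Hypothetical stand-in for an optional-import helper."""
    try:
        return import_module(module_name)
    except ImportError:
        # `from None` suppresses the original exception, so the caller only
        # sees err_message, whatever the underlying import failure was.
        raise ModuleNotFoundError(err_message) from None


def import_with_cause(module_name: str, err_message: str):
    """Variant that keeps the underlying exception attached as __cause__."""
    try:
        return import_module(module_name)
    except ImportError as exc:
        raise ModuleNotFoundError(err_message) from exc
```

With the second variant, the printed traceback would include the original `ImportError` as the direct cause, which makes a broken install easier to distinguish from a missing one.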