pyinstaller / pyinstaller-hooks-contrib

Community maintained hooks for PyInstaller.

pyarrow #739

Closed davetapley closed 6 months ago

davetapley commented 6 months ago

Which library is the hook for?

pyarrow

Have you gotten the library to work with pyinstaller?

Yes, but it needs several hidden imports.

Additional context

I see it's supposed to be supported per ⬇️ so I'm not sure if this is a regression that should be fixed there, or whether all hooks go here now?

I need to --hidden-import pyarrow and pyarrow.vendored.version to get it working.

See also ⬇️ which mentions pyarrow.vendored.version.
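For reference, the workaround above can be sketched in Python. This is a minimal sketch under assumptions: `build_pyinstaller_args` and `build` are hypothetical helper names, and the entry script name `cli_main.py` is assumed from the script name reported in the error output. `--hidden-import` is the PyInstaller option already mentioned above, and `PyInstaller.__main__.run` accepts the same arguments as the command line.

```python
def build_pyinstaller_args(script, hidden_imports):
    """Return a PyInstaller argument list with one --hidden-import per module."""
    args = [script]
    for module in hidden_imports:
        args += ["--hidden-import", module]
    return args


# "cli_main.py" is an assumed entry script name, used for illustration only.
ARGS = build_pyinstaller_args(
    "cli_main.py",
    ["pyarrow", "pyarrow.vendored.version"],
)


def build():
    """Run the build; requires PyInstaller to be installed."""
    import PyInstaller.__main__

    PyInstaller.__main__.run(ARGS)
```

Calling `build()` should be equivalent to running `pyinstaller cli_main.py --hidden-import pyarrow --hidden-import pyarrow.vendored.version`.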

rokm commented 6 months ago

If you need --hidden-import pyarrow, you likely have another package that uses pyarrow and needs to be hooked. What package is that?

davetapley commented 6 months ago

@rokm off the top of my head, maybe duckdb, but it is optional 🤔

Is there a better way to check? 🙏🏻

rokm commented 6 months ago

What does the error traceback look like if you don't add those hidden imports?

davetapley commented 6 months ago

  File "ng\core\cache\writer.py", line 16, in write_asset
    asset.to_parquet(path, index=False)
  File "pandas\core\frame.py", line 2973, in to_parquet
  File "pandas\io\parquet.py", line 483, in to_parquet
  File "pandas\io\parquet.py", line 189, in write
  File "pyarrow\\table.pxi", line 3869, in pyarrow.lib.Table.from_pandas
  File "pyarrow\pandas_compat.py", line 572, in dataframe_to_arrays
  File "pyarrow\pandas_compat.py", line 375, in _get_columns_to_convert
  File "pyarrow\\pandas-shim.pxi", line 199, in pyarrow.lib._PandasAPIShim.is_sparse
  File "pyarrow\\pandas-shim.pxi", line 200, in pyarrow.lib._PandasAPIShim.is_sparse
  File "pyarrow\\pandas-shim.pxi", line 116, in pyarrow.lib._PandasAPIShim._have_pandas_internal
  File "pyarrow\\pandas-shim.pxi", line 104, in pyarrow.lib._PandasAPIShim._check_import
  File "pyarrow\\pandas-shim.pxi", line 57, in pyarrow.lib._PandasAPIShim._import_pandas
ModuleNotFoundError: No module named 'pyarrow.vendored.version'
[19020] Failed to execute script 'cli_main' due to unhandled exception!

That to_parquet is from pandas, with logic to use pyarrow AFAICT here.

Which I guess is why this doesn't find it?
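One way to check which modules did not make it into a frozen build is to try importing them explicitly early in the entry script. A minimal sketch, where `find_missing` is a hypothetical helper and not part of PyInstaller or pandas:

```python
import importlib


def find_missing(modules):
    """Return the names from `modules` that cannot be imported."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            # ModuleNotFoundError is a subclass of ImportError.
            missing.append(name)
    return missing
```

Calling something like `find_missing(["pyarrow", "pyarrow.vendored.version"])` from the frozen application would list any of those modules that were not collected, without waiting for the failure to surface deep inside `to_parquet`.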

FYI pyarrow will become a required dependency in the upcoming pandas 3.0, if that makes a difference.

Since pandas hook is in pyinstaller repo, should I open there instead? 🤔

rokm commented 6 months ago

Hmm, based on the traceback, the following example should reproduce the problem when frozen:

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df.to_parquet('test.par')

But it seems to work for me - can you test with your environment?

What version of python, PyInstaller, and pyinstaller-hooks-contrib are you using?

And also, what version of pyarrow?
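A quick way to gather all of these versions in one go is the standard-library `importlib.metadata` module. This is only a sketch; `installed_versions` is a hypothetical helper name, and the package list simply mirrors the ones asked about above.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(
    ["pyinstaller", "pyinstaller-hooks-contrib", "pandas", "pyarrow"]
)

print("python", platform.python_version())
for name, ver in report.items():
    print(name, ver or "not installed")
```

Run it in the same environment that is used for the build, since a different virtual environment may report different versions.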

davetapley commented 6 months ago

I could reproduce the problem with that example, using:

pyinstaller==5.13.1
pyinstaller-hooks-contrib==2023.4

pandas==2.1.0
pyarrow==14.0.1

On Windows 11, if that matters.

rokm commented 6 months ago

Maybe it's time to update at least pyinstaller-hooks-contrib? Your version does not have #662, so it's not surprising that pyarrow.vendored.version is not collected...

davetapley commented 6 months ago

Well that's embarrassing. Sorry for wasting your time 😞