Open iamkucuk opened 2 years ago
Hi, I'm reading the docs. I'm not sure why the pyarrow install succeeds, since it seems to include dynamic libraries as well.
I think that parquet-python is an earlier version of fastparquet that is pure Python: https://github.com/jcrobak/parquet-python
From the documentation, fastparquet was forked from parquet-python, with the aim of providing a faster implementation (Cython, multi-threaded, etc.).
I received an error when trying to install pyarrow, but that makes sense. However I was able to get fastparquet to install, at least according to pip, yet neither pandas nor dask can find it. I receive an error when listing it as the engine:
ImportError: Missing optional dependency 'fastparquet'. fastparquet is required for parquet support. Use pip or conda to install fastparquet.
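When pip reports a package as installed but pandas still says it is missing, it can help to import the engine directly and look at the underlying error, since pandas' message may hide it. A minimal sketch, assuming only the standard library; `probe_engines` is a hypothetical helper name, not part of pandas or either engine:

```python
import importlib


def probe_engines(names=("pyarrow", "fastparquet")):
    """Try to import each module and record either 'ok' or the error raised."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = "ok"
        except Exception as exc:  # ImportError, OSError from a failed library load, etc.
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    for engine, status in probe_engines().items():
        print(f"{engine}: {status}")
```

If the status is anything other than `ok`, the text after the colon is the actual reason the import failed, which is usually more informative than the "Missing optional dependency" message.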
Hi, it's the same issue: fastparquet has installed precompiled dynamic libraries (because pip cannot tell the difference between OSX on Arm architecture and iOS on Arm architecture). iOS cannot load these dynamic libraries because of its strict security rules.
The best way to load Parquet files is probably to go back to an earlier package that does not have dynamic libraries, such as parquet-python.
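To check whether an installed package ships compiled dynamic libraries, one option is to list the files pip recorded for it and filter by extension. This is a rough sketch using the standard library's `importlib.metadata`; the suffix list and the helper names `is_binary` and `compiled_files` are assumptions made for illustration:

```python
from importlib import metadata

# File suffixes that usually indicate a compiled library (assumed, not exhaustive).
BINARY_SUFFIXES = (".so", ".dylib", ".pyd")


def is_binary(path):
    """Return True if the path looks like a compiled dynamic library."""
    return str(path).endswith(BINARY_SUFFIXES)


def compiled_files(dist_name):
    """List compiled files recorded for an installed distribution.

    Returns None if the distribution is not installed.
    """
    try:
        files = metadata.files(dist_name) or []
    except metadata.PackageNotFoundError:
        return None
    return [str(f) for f in files if is_binary(f)]


if __name__ == "__main__":
    for dist in ("pyarrow", "fastparquet", "parquet"):
        print(dist, compiled_files(dist))
```

An empty list suggests the package is pure Python, although its dependencies may still contain compiled code and would need the same check.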
Thanks for the response. I have tried to install parquet-python, but the install fails with an error where Cython tries, and fails, to assign an int as a double. I can open an issue with that project, but I wanted to check with you first to see if that was expected.
I see the problem: parquet-python is pure Python, but it depends on thriftpy2, which itself has one Cython file. Since there are no Cython compilers, Carnets cannot install thriftpy2, and thus the install of parquet-python fails. I'm not certain how to fix that one. By editing thriftpy2/setup.py, it could be possible to disable the compiling of the extension, but would it work afterwards? I don't know.
Hello.
Pandas requires the pyarrow or fastparquet engine to read parquet files. Installing fastparquet with "!pip install fastparquet" fails, while installing pyarrow succeeds. I can see that the package has been installed with "!pip list". However, pandas still cannot use pyarrow and is unable to read parquet files.
Any ideas?