vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License

[BUG-REPORT] Problems reading a parquet file #2177

Open ArtemkaDS opened 2 years ago

ArtemkaDS commented 2 years ago

Description

I got the following error when trying to read a parquet file:

Cannot open part-00077-c3446b0f-b1e8-469e-9f3d-4441e1651aa6.c000.snappy.parquet nobody knows how to read it.

Any thoughts on how to fix it?

Software information

JovanVeljanoski commented 2 years ago

Can you open it with pandas?

n0k0m3 commented 2 years ago

Same issue: the file opens fine with pandas, but vaex can't open it.

JovanVeljanoski commented 2 years ago

Can anyone provide an example of how that file was generated?

Edit: Or an example of how to generate a small parquet file of this kind, with some random data for testing, that vaex has trouble opening.

I tried this:

import vaex

# Write the vaex example dataset to a snappy-compressed parquet file via pandas
df = vaex.example().to_pandas_df()
df.to_parquet('part-00077-c3446b0f-b1e8-469e-9f3d-4441e1651aa6.c000.snappy.parquet', compression='snappy')

# Open the resulting file with vaex
vaex.open('part-00077-c3446b0f-b1e8-469e-9f3d-4441e1651aa6.c000.snappy.parquet')

which works just fine.
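
For anyone who cannot share the file itself, a quick sanity check may help narrow things down. The sketch below uses only the standard library and relies on the Parquet convention that a file begins and ends with the 4-byte marker `PAR1`. The helper name `looks_like_parquet` is hypothetical and not part of vaex or pandas, and the check is not a full validation of the file.

```python
from pathlib import Path

# Parquet files carry this 4-byte marker at both the start and the end.
PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path):
    """Return True if `path` is a regular file framed by the Parquet magic bytes.

    Hypothetical helper for illustration; it does not parse the footer or
    verify that the column data is readable.
    """
    p = Path(path)
    # A directory, a missing path, or a file too short to hold both markers fails.
    if not p.is_file() or p.stat().st_size < 2 * len(PARQUET_MAGIC):
        return False
    with p.open("rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), 2)  # 2 = seek relative to the end of the file
        tail = f.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

If this returns False for the path passed to vaex.open, the path may point to a directory, a truncated download, or something that is not a Parquet file. If it returns True, the file framing is intact and the problem more likely lies elsewhere, for example in how the path is given or how a reader is selected.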