vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License
8.28k stars, 590 forks

[BUG-REPORT] Error converting from csv file to hdf5 file with #2202

Open zhiyongm opened 2 years ago

zhiyongm commented 2 years ago

Description

ArrowInvalid: Failed casting from large_string to string

Code:

vaex.from_csv("/data/transactions.csv", convert=True, chunk_size=10000000)

When I call from_csv to convert the CSV file to HDF5, each small per-chunk HDF5 file is generated without problems, but the error occurs when the small files are combined. I suspect that some of my fields are too long for vaex to handle.
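One way to check the "fields are too long" suspicion before sharing data is to measure how many bytes of text each column holds. Arrow's string type uses 32-bit offsets, so a single array can address at most 2**31 - 1 bytes, while large_string uses 64-bit offsets. Whether that limit is what triggers the error here is an assumption and is not confirmed in this thread. The sketch below uses only the Python standard library. The helper names oversized_columns and scan_csv are hypothetical and are not part of vaex.

```python
import csv

# Arrow's `string` type stores offsets as signed 32-bit integers, so one
# array can address at most 2**31 - 1 bytes of character data.
INT32_MAX = 2**31 - 1


def oversized_columns(rows, header, chunk_size, limit=INT32_MAX):
    """Return (chunk_index, column_name, total_bytes) for every column whose
    UTF-8 payload within one chunk of `chunk_size` rows exceeds `limit`."""
    hits = []
    totals = [0] * len(header)
    chunk_index = 0
    rows_in_chunk = 0
    for row in rows:
        # Ignore any extra cells beyond the header width.
        for i, value in enumerate(row[:len(header)]):
            totals[i] += len(value.encode("utf-8"))
        rows_in_chunk += 1
        if rows_in_chunk == chunk_size:
            hits += [(chunk_index, n, t) for n, t in zip(header, totals) if t > limit]
            totals = [0] * len(header)
            chunk_index += 1
            rows_in_chunk = 0
    if rows_in_chunk:
        # Report the final, partially filled chunk as well.
        hits += [(chunk_index, n, t) for n, t in zip(header, totals) if t > limit]
    return hits


def scan_csv(path, chunk_size, limit=INT32_MAX):
    """Hypothetical helper: run the check on a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        return oversized_columns(reader, header, chunk_size, limit)
```

Calling scan_csv("/data/transactions.csv", chunk_size=10000000) would mirror the chunking in the report. Because the error appears only when the small files are combined, it may also be worth passing a chunk_size larger than the total row count, which yields whole-file totals per column. An empty result would suggest the 32-bit offset limit is not the cause.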

Software information

Additional information

JovanVeljanoski commented 2 years ago

Hi,

Can you please attach the example data, or paste it as a code block? I hope you don't expect us to copy that manually :)