vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License
8.28k stars 590 forks

Lazy CSV reading improvement: auto-detect types #2224

Closed: JovanVeljanoski closed this issue 2 years ago

JovanVeljanoski commented 2 years ago

When a CSV file has many missing values, lazy reading sometimes crashes because the column types cannot be inferred. We can work around this by providing the relevant types explicitly via `ConvertOptions`, but it would be much nicer to improve the automatic type detection.

Checklist:

Note: I am not 100% sure that the problem exposed by this unit test is due to failed type inference. It is more of an assumption on my part.