vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License

repeated concatenation of dfs converts some values to nan #2345

Open san-vak opened 1 year ago

san-vak commented 1 year ago

Hi!

Thank you for the fast and great library!

I am working with a large dataset. I was appending processed data to a pandas DataFrame and then trying to save it as a Vaex DataFrame. To keep the pandas DataFrame from getting too big for memory and slowing down the process, I did this in several steps: I converted the pandas DataFrame into a Vaex DataFrame and exported it. On each following iteration, I opened the previously stored Vaex DataFrame, used vaex.concat to merge it with the new DataFrame, and then exported the result to the path of the previous DataFrame.

After a few iterations, some column values in some chunks of rows were converted to NaN.

    new_vaex_df = vaex.from_pandas(pd_df)
    old_df = vaex.open(old_path)
    old_df = vaex.concat([old_df, new_vaex_df])
    old_df.export(old_path)

I also tried exporting new_vaex_df and then using vaex.open_many() to concatenate the DataFrames; it resulted in the same issue.
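
One thing that may be worth ruling out (an assumption, not something confirmed in this thread): `vaex.open` memory-maps the file and `vaex.concat` is lazy, so exporting to `old_path` writes to the same file the concatenated DataFrame is still reading from. The sketch below exports to a temporary file in the same directory and only then moves it over the old path. `export_then_replace` is a hypothetical helper name, not part of Vaex, and this may or may not be the cause of the NaN values.

```python
import os
import tempfile


def export_then_replace(export_fn, target_path):
    """Call export_fn(tmp_path), then move tmp_path over target_path.

    export_fn is any callable that writes a file to the path it is given,
    for example the bound method `df.export` of a Vaex DataFrame.
    """
    directory = os.path.dirname(os.path.abspath(target_path))
    # Keep the original extension, since the export format is usually
    # chosen from the file extension.
    suffix = os.path.splitext(target_path)[1]
    fd, tmp_path = tempfile.mkstemp(suffix=suffix, dir=directory)
    os.close(fd)
    try:
        export_fn(tmp_path)                # write the new data to a separate file
        os.replace(tmp_path, target_path)  # swap it in only after the export finished
    except BaseException:
        # Do not leave a partial temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With the variables from the snippet above, the last line would become `export_then_replace(old_df.export, old_path)`. On Windows, replacing a file that is still memory-mapped may fail, so the old DataFrame might need to be closed first.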

Thank you for your time and attention.