vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License

How to split a file after read by VAEX in python. #2300

Open tanwar-manish opened 1 year ago

tanwar-manish commented 1 year ago

I currently do this with pandas and want to do the same with vaex:

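import pandas as pd

df = pd.read_csv(r"D:\Personal Projects\Testing_file.txt")

# Split each address into its first four characters and the remainder
df['First'] = df['address'].str[:4]
df['Second'] = df['address'].str[4:]

# Collect the remainders into one list per prefix, then transpose
new = df[['First', 'Second']]
aaa = new.groupby('First')['Second'].apply(list)
dff = aaa.to_frame()
newDf = dff.transpose()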

JovanVeljanoski commented 1 year ago

You didn't attach your dataset so I used an example one:

import vaex

df = vaex.datasets.titanic()

df['First'] = df['name'].str.slice(0, 4)
df['Second'] = df['name'].str.slice(4, None)

new = df[['First', 'Second']]

aaa = new.groupby('First').agg({'Second': vaex.agg.list('Second')})

Transpose is currently not possible. I think we can do it with some tricks, but it is technically challenging. It would require some work, so if we get funding or other support for this we can look into it. PRs are always welcome, though.
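
To make the target shape concrete, here is a minimal plain-Python sketch of what the pandas snippet in the question computes: each distinct prefix becomes a column, holding the list of remainders. It does not use vaex or pandas. The function name `split_and_group`, the `width` parameter and the sample addresses are hypothetical, since the original file was not attached. It also assumes the grouped result is small enough to hold in memory, which may not be true for large inputs.

```python
from collections import defaultdict


def split_and_group(values, width=4):
    """Group the remainder of each string by its first `width` characters."""
    groups = defaultdict(list)
    for value in values:
        first, second = value[:width], value[width:]
        groups[first].append(second)
    # Sort the prefixes so the column order is deterministic
    return dict(sorted(groups.items()))


# Hypothetical sample data standing in for the 'address' column
addresses = ["ABCD123", "ABCD456", "WXYZ789"]
wide = split_and_group(addresses)
print(wide)  # {'ABCD': ['123', '456'], 'WXYZ': ['789']}
```

The resulting dict is keyed by prefix, which is the "transposed" layout: one logical row whose columns are the prefixes. This only illustrates the intended output and is not a replacement for a native vaex transpose.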