vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License
8.31k stars 591 forks

[FEATURE-REQUEST] #1446

Closed Leo-Ji2020 closed 3 years ago

Leo-Ji2020 commented 3 years ago

I have a request: after calling groupby on a vaex dataframe, how can I get the first row of each group and build a new vaex dataframe from those rows? For example:

import numpy as np
import vaex

x = np.random.randint(1, 5, 20)
df = vaex.from_arrays(x=x, y=x**2)

dff = df.groupby(df.x)

Now I want to concatenate the first row of each group into a new dataframe. How can I do that?

kmcentush commented 3 years ago

This will actually have a similar solution to https://github.com/vaexio/vaex/issues/1448, so I recommend you follow that issue in addition to this one.

JovanVeljanoski commented 3 years ago

Indeed, like @kmcentush said.

Anyway, here is a belated example:

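import vaex

df = vaex.example()
g = df.groupby('id')

# Iterating over the groupby yields (group key, group dataframe) pairs.
dfs = []
for _, df_tmp in g:
    dfs.append(df_tmp[:1])  # keep only the first row of each group

# Stitch the one-row dataframes together into a single dataframe.
df2 = vaex.concat(dfs)
df2 = df2.trim()  # this will make things faster.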
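
For the small in-memory arrays in the original question, the first row of each group can also be located without looping over groups. The following is only a sketch of an alternative technique, not the approach from the answer above: it uses NumPy's `np.unique(..., return_index=True)`, which returns the index of the first occurrence of each distinct value. The seeded random generator and the variable names (`keys`, `first_idx`, `first_x`, `first_y`) are assumptions made for illustration.

```python
import numpy as np

# Same shape of data as in the question: a group key x and a value column y.
# A seeded generator is used here only so the sketch is reproducible.
rng = np.random.default_rng(0)
x = rng.integers(1, 5, 20)
y = x**2

# For each distinct key, return_index=True gives the position of its
# first occurrence in x, i.e. the first row of that group.
keys, first_idx = np.unique(x, return_index=True)

# Select the first row of each group from every column.
first_x = x[first_idx]
first_y = y[first_idx]
```

The selected arrays could then be passed to `vaex.from_arrays(x=first_x, y=first_y)`, as in the question, to obtain a vaex dataframe. Note that the rows come out ordered by key rather than by their original position.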