kieferk / dfply

dplyr-style piping operations for pandas dataframes
GNU General Public License v3.0
890 stars 103 forks

Issue arranging data after summarising with a new variable #93

Open Ignaciovf opened 4 years ago

Ignaciovf commented 4 years ago

Hello, I'm running the following code:

from dfply import *

test = (diamonds >>
    group_by(X.cut) >>
    summarise(dis=n_distinct(X.price)) >>
    arrange(X.dis, ascending=False))

I'm trying to create a new variable with the summarise function and then arrange the rows by it. The summarised values are correct and no warnings or errors appear, but the output isn't sorted by my new variable.
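For comparison, here is a sketch of the result the pipeline is meant to produce, written in plain pandas rather than dfply and run on a small made-up stand-in for diamonds (the values are hypothetical, not the real data set). Series.nunique is used as an analogue of n_distinct; the two may differ in how they treat missing values.

```python
import pandas as pd

# Small made-up stand-in for the diamonds data set (hypothetical values)
diamonds = pd.DataFrame({
    "cut": ["Ideal", "Ideal", "Good", "Good", "Good", "Fair"],
    "price": [326, 327, 326, 335, 340, 326],
})

test = (
    diamonds.groupby("cut")["price"]
    .nunique()                            # distinct prices per cut, analogous to n_distinct
    .reset_index(name="dis")              # flat DataFrame with columns 'cut' and 'dis'
    .sort_values("dis", ascending=False)  # sort the whole summary by the new column
)

print(test["cut"].tolist())  # ['Good', 'Ideal', 'Fair']
```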

Thank you very much

danielsjf commented 4 years ago

I had the same problem. I'm not yet sure why, but the following fixed it for me. I think the summarised dataframe is not formatted correctly.

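import pandas as pd
from dfply import *

test = diamonds >> \
    group_by(X.cut) >> \
    summarise(dis=n_distinct(X.price))
# Workaround: re-wrap the summary in a plain DataFrame before arranging
test = pd.DataFrame(test) >> arrange(X.dis, ascending=False)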
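The workaround is consistent with one possible explanation, which is an assumption and not confirmed in this thread: after summarise, the result may still carry the group_by(X.cut) grouping, so arrange sorts inside each group, and every group holds exactly one row. Re-wrapping with pd.DataFrame would then drop that grouping. The plain-pandas sketch below (made-up numbers, no dfply) shows the difference between the two kinds of sort.

```python
import pandas as pd

# Made-up summary standing in for the output of group_by(X.cut) >> summarise(...)
summary = pd.DataFrame({"cut": ["Fair", "Good", "Ideal"], "dis": [10, 30, 20]})

# Sorting within each group: every group holds one row, so nothing moves.
per_group = pd.concat(
    [g.sort_values("dis", ascending=False) for _, g in summary.groupby("cut")]
)

# Sorting the frame as a whole (the ungrouped case): rows are reordered by dis.
overall = summary.sort_values("dis", ascending=False)

print(per_group["cut"].tolist())  # ['Fair', 'Good', 'Ideal']
print(overall["cut"].tolist())    # ['Good', 'Ideal', 'Fair']
```

If that explanation holds, putting dfply's ungroup() verb between summarise(...) and arrange(...) should have the same effect as the pd.DataFrame re-wrap; this has not been checked against the dfply version used in this issue.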

Watch out if you name a variable count, because that also caused me a headache. count is a special function (a pandas DataFrame method), so X.count refers to that method rather than to the column.
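The count clash is ordinary pandas behaviour: attribute access on a DataFrame returns the count method rather than a column that happens to share its name. Assuming X.count is resolved by the same attribute lookup, it hits the method too. A minimal pandas-only illustration with made-up data:

```python
import pandas as pd

# Hypothetical summary with a column literally named "count"
df = pd.DataFrame({"cut": ["Fair", "Good"], "count": [5, 7]})

# Attribute access finds the DataFrame.count method, not the column.
by_attr = df.count

# Item access always looks the column up by name.
by_item = df["count"]

print(callable(by_attr))  # True
print(by_item.tolist())   # [5, 7]
```

Item access such as X['count'] is probably the safer spelling in dfply, but that is an assumption not verified here; renaming the column (for example to n) avoids the clash altogether.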

There is more information in #56.