neuniversity / ALY6140


Sort Data frame after applying groupby #37

Open heena007 opened 5 years ago

heena007 commented 5 years ago

So, I am trying to sort my data frame after doing a groupby, but it doesn't work until I apply some aggregate function and reset the index. Does anyone have an explanation for this strict need for aggregation, or know another method to sort data on a column after a groupby?

Thanks

ZelingJiang commented 5 years ago

I had that problem too. I can't remember exactly which command changes the data type; probably the reset_index function. Try the commands below and see if they work (these are what I used to rank my data frame):

data.groupby('column name', as_index=False)
data.sort_values(by=['column name'])

heena007 commented 5 years ago

Thanks Zeling, I tried the commands. It gives me the following error:

Cannot access callable attribute 'sort_values' of 'DataFrameGroupBy' objects, try using the 'apply' method

Any suggestions?
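For readers following along, here is a minimal sketch of why this error appears. It assumes a small made-up data frame (the `city` and `sales` columns are hypothetical, not from the original dataset). The result of `groupby` is a `DataFrameGroupBy` object rather than a data frame, so it has no `sort_values` method until an aggregation turns it back into a data frame. The exact error text may differ between pandas versions.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({"city": ["B", "A", "B", "A"], "sales": [3, 1, 4, 2]})

# groupby returns a DataFrameGroupBy object, not a DataFrame.
grouped = df.groupby("city", as_index=False)

# Calling sort_values directly on the grouped object fails.
try:
    grouped.sort_values(by=["sales"])
    raised = False
except AttributeError:
    raised = True

# Aggregating produces a regular DataFrame again, which can be sorted.
totals = grouped["sales"].sum().sort_values(by="sales", ascending=False)
print(totals)
```

If no aggregation is needed at all, calling `df.sort_values(...)` on the original data frame, without `groupby`, is usually enough.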

CHENGYULIU1 commented 5 years ago

Could you copy the code or post a screenshot here? I may be able to help if I can see the code!

echolq012 commented 5 years ago

Hi Heena, I am not sure what kind of problem you are facing; maybe the data types are not consistent. If you want to count frequencies, I suggest calling .size() after the groupby. Besides, I think a lambda function may help you solve the problem: you can combine the apply function with a lambda.
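As a rough illustration of the `.size()` suggestion, the sketch below counts rows per group and then sorts the counts. The `category` column and its values are made up for the example.

```python
import pandas as pd

# Hypothetical toy data; "category" is an illustrative column name.
df = pd.DataFrame({"category": ["x", "y", "x", "z", "x", "y"]})

# size() aggregates each group to its row count and returns a Series,
# which supports sort_values.
counts = df.groupby("category").size().sort_values(ascending=False)
print(counts)
```

Here `size()` plays the role of the aggregation step, which is why sorting works afterwards.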

Best, Qing Li

Jin-pengSong commented 5 years ago

I am not sure what kind of problem you have. However, when I want to sort data, I use sort_values. Does this help you?

pr24 commented 5 years ago

Heena, I have used sort_values on my dataset, like this:

Variable.groupby('colname')
Variable.sort_values('colname', ascending=False)

heena007 commented 5 years ago

So, what I wanted to do was use the order parameter of the countplot function for a dataset.

For example:

import numpy as np
import seaborn as sns

result = crime_data_2008_2018.groupby(["Reported Year"]).agg(np.median).reset_index().sort_values('Reported Year')

sns.countplot(x='Reported Year', data=crime_data_2008_2018, order=result['Reported Year'], palette="Set2")

I found the above command online and it gives me the desired result. I was just wondering if there is any other way of doing it. Also, I didn't really understand what "agg(np.median)" is for in the command.

Sorry for any confusion caused by the wording of my first post.
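One possible reading of the command above: the `agg(np.median)` step seems to serve only to collapse the grouped object into a data frame with one row per year, so that the `Reported Year` column can be extracted and sorted. The median values themselves are never used for the ordering. The sketch below, on a made-up data frame (the `Incidents` column is hypothetical), compares that route with simply sorting the unique years.

```python
import pandas as pd

# Hypothetical stand-in for crime_data_2008_2018; "Incidents" is invented.
df = pd.DataFrame({
    "Reported Year": [2010, 2008, 2009, 2008, 2010, 2010],
    "Incidents": [5, 3, 4, 1, 2, 6],
})

# Route from the thread: aggregate, reset the index, sort, take the year column.
order_via_agg = (
    df.groupby("Reported Year")
    .agg("median")
    .reset_index()
    .sort_values("Reported Year")["Reported Year"]
    .tolist()
)

# Simpler alternative: sort the distinct years directly, with no groupby.
order_direct = sorted(df["Reported Year"].unique().tolist())

print(order_via_agg, order_direct)
```

Either list could then be passed as the `order` argument of `sns.countplot`. Note that `groupby` already sorts the group keys by default, so the trailing `sort_values` in the first route is likely redundant.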

ThatkidfromA commented 5 years ago

Hi,

After reading your code, I think the order of the calls is incorrect. Try something like this:

df.groupby(["name"]).apply(lambda x: x.sort_values(["count_1"], ascending=False)).reset_index(drop=True)

Note that reset_index goes at the end of the chain.
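As a hedged alternative to the apply-plus-lambda pattern, sorting within groups can often be done with a single multi-key `sort_values`, which avoids `groupby` entirely. The sketch below reuses the `name` and `count_1` column names from the comment above on made-up data. It is an assumption that this matches the intended result: rows ordered by group, then by descending count within each group.

```python
import pandas as pd

# Hypothetical toy data using the column names from the comment above.
df = pd.DataFrame({
    "name": ["b", "a", "b", "a"],
    "count_1": [1, 5, 3, 2],
})

# Sort by group key ascending, then by count descending within each group.
out = (
    df.sort_values(["name", "count_1"], ascending=[True, False])
    .reset_index(drop=True)
)
print(out)
```

This variant does not depend on how `groupby(...).apply` treats the grouping column, which has varied between pandas versions.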