kieferk / dfply

dplyr-style piping operations for pandas dataframes
GNU General Public License v3.0

#group_by #mode #iter() #97

Open kharade-navin opened 4 years ago

Hi All,

I am summarizing a DataFrame that contains both numeric and categorical variables, and I have run into the following problems with the group_by and summarize functions:

  1. Computing the mode of a numeric variable with the 'statistics.mode' function inside summarize, after group_by, raises the error below.

     Code:- Temp2 = Df3 >> group_by(X.ID) >> summarize(AirTemp = statistics.mode(X.Weather_detailsAirtemperature))

     Error:- TypeError: iter() returned non-iterator of type 'Intention'

  2. Summarizing a categorical variable: with unique in R's dplyr I can summarize a categorical variable, but the same approach does not work in dfply for Python. Even distinct did not work.

Code:- Test = Df3 >> group_by(X.ID) >> summarize(Precipitation = distinct(X.Precipitation))

Sample data

| ID | Date_time | Weather_detailsAir_temperature | Precipitation | Precipitation_intensity | Relative_humidity | Wind_direction | Wind_speed_in_m/s | Day_time |
|---|---|---|---|---|---|---|---|---|
| DR_10002 | 19-12-2012 09:30 | 3 | clear | None | 67 | 180 | 7 | daylight |
| DR_10002 | 19-12-2012 09:30 | 3 | clear | None | 67 | 180 | 7 | daylight |
| DR_10002 | 19-12-2012 09:31 | 3 | clear | None | 67 | 180 | 7 | daylight |
| DR_10002 | 19-12-2012 09:36 | 1 | clear | None | 66 | 163 | 4 | daylight |
| DR_10002 | 19-12-2012 09:39 | 1 | clear | None | 66 | 163 | 4 | daylight |
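
For reference, the result I am after can be sketched in plain pandas, without dfply. This is only an illustration of the expected output, not a dfply solution. The short column name `Air_temperature` and the helper `first_mode` are hypothetical names used here for brevity, and only three of the sample columns are reproduced.

```python
import pandas as pd

# Three columns from the sample rows above; "Air_temperature" is a
# hypothetical short name for the air temperature column.
df = pd.DataFrame({
    "ID": ["DR_10002"] * 5,
    "Air_temperature": [3, 3, 3, 1, 1],
    "Precipitation": ["clear"] * 5,
})


def first_mode(series):
    # Series.mode() returns every mode in sorted order; keep the first one.
    return series.mode().iloc[0]


def unique_values(series):
    # Join the distinct non-null values of a categorical column into one string.
    return ", ".join(sorted(series.dropna().astype(str).unique()))


# Named aggregation: one output column per (input column, function) pair.
summary = df.groupby("ID").agg(
    AirTemp=("Air_temperature", first_mode),
    Precipitation=("Precipitation", unique_values),
).reset_index()

print(summary)
```

On the five sample rows this gives a single row for DR_10002, with the most frequent air temperature and the distinct precipitation values joined into one string.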