Open sashahafner opened 9 months ago
This answer was quite helpful https://stackoverflow.com/questions/34099684/how-to-use-groupby-transform-across-multiple-columns/74555697#74555697
Others that suggest splitting the operation are missing the point.
Note that I have some examples worked out in https://github.com/AU-BCE-EE/OAC-course-private
A few tips:
reset_index() at the end can help in getting reasonable columns.
airw['rem_eff'] = 100 * (1 - airw['mass_tot']['Out'] / airw['mass_tot']['In'])
print(airw.keys())
keys() helps a little, but only a little.
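For the reset_index() tip, here is a minimal sketch of how a frame shaped like airw might arise and then be flattened. The input data, the site and pos column names, and the joined column names are made up for illustration; only the rem_eff line comes from the note above, and the real airw may have been built differently.

```python
import pandas as pd

# Hypothetical long-format data: one inlet and one outlet mass per site.
air = pd.DataFrame({
    "site": ["a", "a", "b", "b"],
    "pos": ["In", "Out", "In", "Out"],
    "mass_tot": [10.0, 2.0, 20.0, 5.0],
})

# Sum within groups, then move 'pos' into the columns. The columns are now
# a two-level MultiIndex: ('mass_tot', 'In') and ('mass_tot', 'Out').
airw = air.groupby(["site", "pos"])[["mass_tot"]].sum().unstack("pos")

# Removal efficiency in percent, computed from the two sub-columns.
airw["rem_eff"] = 100 * (1 - airw["mass_tot"]["Out"] / airw["mass_tot"]["In"])

# Join the two column levels into plain names, then turn the 'site' index
# back into an ordinary column.
airw.columns = ["_".join(part for part in col if part) for col in airw.columns]
airw = airw.reset_index()

print(airw)
print(list(airw.keys()))
```

After flattening, keys() returns ordinary strings instead of tuples, which makes later column selection less awkward.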
One issue is controlling the name of a new column.
Here is an example where the new column is named 0, and it isn't clear to me how it can be simply set.
tot = pd.DataFrame(dat.groupby(['reactor', 'gas', 'temp']).apply(lambda x: mintegrate(x.day, x.qch4, value = 'total'))).reset_index()
tot.rename({0:'ech4'}, axis = 'columns', inplace = True)
tot
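On naming the new column: when apply returns one number per group, the result is a Series, and Series.reset_index() accepts a name argument, which avoids the column called 0 and the separate rename step. A minimal sketch, where trapz_total is a hypothetical stand-in for mintegrate(..., value = 'total') and the data are made up:

```python
import numpy as np
import pandas as pd

def trapz_total(x, y):
    """Hypothetical stand-in for mintegrate: trapezoidal integral of y over x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.diff(x) * (y[:-1] + y[1:]) / 2))

# Made-up data with the same column names as the example above.
dat = pd.DataFrame({
    "reactor": [1, 1, 1, 2, 2, 2],
    "gas": ["biogas"] * 6,
    "temp": [20] * 6,
    "day": [0, 1, 2, 0, 1, 2],
    "qch4": [0.0, 2.0, 4.0, 1.0, 1.0, 1.0],
})

# Selecting the two needed columns before apply keeps the grouping columns
# out of the lambda. One value per group comes back as a Series.
tot = (
    dat.groupby(["reactor", "gas", "temp"])[["day", "qch4"]]
    .apply(lambda g: trapz_total(g["day"], g["qch4"]))
    .reset_index(name="ech4")  # name the value column directly
)

print(tot)
```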
I see there is an assign method that adds columns to data frames and is quite helpful for grouped operations.
See this example from Anna's work:
dat = dat.groupby('tank').apply(lambda x: x.assign(emis = si.cumulative_trapezoid(x['flux'], x['time'], initial = 0)))
And some info online:
See this solution https://stackoverflow.com/questions/73309294/how-to-apply-scipy-integrate-cumulative-trapezoid-to-grouped-pandas-dataframe-wi
Presumably assign is the proper way to add columns to data frames, and this would also work with mintegrate or any other function that returns an appropriate array.
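A self-contained sketch of the assign pattern, for trying it out without the course data. The cumtrapz0 helper is a hypothetical stand-in for si.cumulative_trapezoid(..., initial = 0), written with NumPy so the example has no SciPy dependency, and the tank data are made up. Passing group_keys=False and selecting the two input columns are my additions; they are meant to keep the original row index so the new column can be put straight back onto dat.

```python
import numpy as np
import pandas as pd

def cumtrapz0(y, x):
    """Cumulative trapezoidal integral of y over x, starting at 0."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    steps = np.diff(x) * (y[:-1] + y[1:]) / 2
    return np.concatenate([[0.0], np.cumsum(steps)])

# Made-up flux measurements for two tanks.
dat = pd.DataFrame({
    "tank": ["A", "A", "A", "B", "B", "B"],
    "time": [0.0, 1.0, 2.0, 0.0, 2.0, 4.0],
    "flux": [2.0, 2.0, 2.0, 1.0, 3.0, 5.0],
})

# Within each tank, assign adds an 'emis' column of the same length as the
# group. group_keys=False leaves the row index as it was in dat.
out = dat.groupby("tank", group_keys=False)[["time", "flux"]].apply(
    lambda g: g.assign(emis=cumtrapz0(g["flux"], g["time"]))
)

# Index alignment puts each value back on the row it was computed from.
dat["emis"] = out["emis"]
print(dat)
```

Any function that returns an array with one value per row of the group should fit in place of cumtrapz0.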
The Pandas module has this functionality, but it is strange. Here is an example that I spent a lot of time on.
So fintegrate is a function that expects two arguments; otherwise the apply bit could be replaced with a method. The oddest bit is that the groupby method doesn't return a data frame. Why??? Instead it returns a series or Series or both or whatever. And it is the indices in that output that cause problems when trying to add it back to a data frame. It is like the developers thought users would want to display the results in the console and not save them. Strange.
The backslash bit was from a Stack Overflow answer without explanation. It seems to allow splitting lines at the dot operator; a trailing backslash is Python's line-continuation character.
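The example this note refers to is not reproduced here, so the following is only a sketch of the situation it describes, with made-up data and a hypothetical fintegrate_demo standing in for fintegrate. It shows the Series with grouping keys in its index, one way to get the values back onto the original rows (a merge on the grouping columns), and parentheses used instead of backslashes to break a chain at the dots.

```python
import numpy as np
import pandas as pd

def fintegrate_demo(x, y):
    """Hypothetical two-argument function: trapezoidal integral of y over x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.diff(x) * (y[:-1] + y[1:]) / 2))

df = pd.DataFrame({
    "reactor": [1, 1, 2, 2],
    "temp": [20, 20, 30, 30],
    "day": [0, 2, 0, 2],
    "rate": [1.0, 3.0, 2.0, 4.0],
})

# One number per group comes back as a Series whose index is built from
# the grouping columns (a MultiIndex here), not as a data frame.
res = df.groupby(["reactor", "temp"])[["day", "rate"]].apply(
    lambda g: fintegrate_demo(g["day"], g["rate"])
)
print(type(res).__name__, list(res.index.names))

# Wrapping the chain in parentheses allows a line break before each dot,
# which does the same job as trailing backslashes.
totals = (
    res
    .rename("total")
    .reset_index()  # grouping keys become ordinary columns
)

# Merge on the grouping columns to add the per-group value to every row.
df = df.merge(totals, on=["reactor", "temp"], how="left")
print(df)
```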
When simple methods can be used, I think it is simpler.
Here I have applied mean to two columns in a data frame grouped by two other columns.
There is also an aggregate function or method in Pandas for this stuff.
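Since the code for this case is not shown, here is a minimal sketch with made-up data and column names. It applies mean to two columns grouped by two others, then does something similar with agg using named aggregation, which also sets the output column names.

```python
import pandas as pd

# Made-up data: two measured columns and two grouping columns.
df = pd.DataFrame({
    "reactor": [1, 1, 2, 2],
    "temp": [20, 20, 20, 20],
    "qch4": [1.0, 3.0, 2.0, 6.0],
    "qco2": [0.5, 1.5, 1.0, 1.0],
})

# A built-in method applied to two columns, grouped by two others.
means = df.groupby(["reactor", "temp"])[["qch4", "qco2"]].mean().reset_index()
print(means)

# Named aggregation: each keyword sets the output column name, and the
# tuple gives the input column and the function to apply to it.
summ = (
    df.groupby(["reactor", "temp"])
    .agg(qch4_mean=("qch4", "mean"), qco2_max=("qco2", "max"))
    .reset_index()
)
print(summ)
```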