Consider the case of a DataFrame with a large number of distinct groups:

In [17]: %time result = df.groupby('group').apply(lambda g: len(g))
CPU times: user 6.45 s, sys: 68 ms, total: 6.52 s
Wall time: 6.51 s
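The setup that produced this frame isn't shown here; a minimal sketch along these lines would build something similar (the group count of 1 million and the 10 rows per group are assumptions, chosen to line up with the ~6 microseconds of per-group overhead discussed below):

import numpy as np
import pandas as pd

# Assumed setup -- the exact shape of the example frame is not given.
n_groups = 1_000_000
rows_per_group = 10

df = pd.DataFrame({
    'group': np.repeat(np.arange(n_groups), rows_per_group),
    'value': np.random.randn(n_groups * rows_per_group),
})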
The per-group overhead is fairly fixed -- with 5 million groups we have:
In [22]: %time result = df.groupby('group').apply(lambda g: len(g))
CPU times: user 31 s, sys: 108 ms, total: 31.1 s
Wall time: 31.1 s
It would be interesting to see whether, by pushing the disassembly-reassembly of DataFrame objects down into C++, we could take the per-group overhead from the current ~6 microseconds (roughly 31 s spread across 5 million groups) down to under a microsecond, or even less.
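A quick way to see how much of the wall time is per-group bookkeeping rather than useful work is to compare the apply path against a fully cythonized aggregation that computes the same answer. A minimal sketch, using the assumed df from above:

import time

# Slow path: a fresh DataFrame is assembled for every group before the
# Python callable sees it.
t0 = time.time()
by_apply = df.groupby('group').apply(lambda g: len(g))
print('apply:', time.time() - t0)

# Fast path: groupby(...).size() returns the same counts without
# materializing per-group DataFrames.
t0 = time.time()
by_size = df.groupby('group').size()
print('size: ', time.time() - t0)

The gap between the two timings is essentially the disassembly-reassembly cost that could, in principle, move into C++.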
Note that the effects of bad memory locality are also a factor. We could look into tricks like using a background thread that "prepares" groups (up to a certain size / buffer threshold) while the user's apply function is executing, to at least mitigate the wall-clock cost of the groupby evaluation.
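One hedged way to prototype that idea in pure Python is a producer thread that walks the groupby iterator and fills a bounded queue while the consumer runs the user function; whether this helps in practice depends heavily on the GIL and on how much of the group slicing releases it. A rough sketch (the function name and buffer size are made up for illustration, not a pandas API):

import queue
import threading

def buffered_group_apply(grouped, func, buffer_size=64):
    # A producer thread materializes (key, group DataFrame) pairs ahead of
    # the consumer, up to buffer_size items, so group preparation can
    # overlap with the user's function. Purely illustrative.
    buf = queue.Queue(maxsize=buffer_size)
    done = object()

    def producer():
        for key, group in grouped:   # group construction happens here
            buf.put((key, group))
        buf.put(done)

    threading.Thread(target=producer, daemon=True).start()

    results = {}
    while True:
        item = buf.get()
        if item is done:
            break
        key, group = item
        results[key] = func(group)
    return results

# e.g. result = buffered_group_apply(df.groupby('group'), len)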