**Open** · mahsheed opened this issue 3 years ago
Does using `df.groupby('id').agg({'num': 'mode'})` achieve your desired result? The `TaskHistogram` is in some legacy code I'm not familiar with, but the groupby/agg method may do the trick.
Hi @kmcentush,

Running that works with `'num': 'mean'`, but it does not work with `'num': 'mode'`. `df.groupby('id').agg({'num': 'mode'})` produces the following error:
```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-410-cef13c7bbe73> in <module>
      4 df = vaex.from_pandas(df)
      5 #df.mode(expression="num", binby=['id'])
----> 6 df.groupby('id').agg({'num': 'mode'})

~/workspace/facets-venv/lib64/python3.7/site-packages/vaex/groupby.py in agg(self, actions)
    434         # TODO: this basically forms a cartesian product, we can do better, use a
    435         # 'multistage' hashmap
--> 436         arrays = super(GroupBy, self)._agg(actions)
    437         # we don't want non-existing pairs (e.g. Amsterdam in France does not exist)
    438         counts = self.counts

~/workspace/facets-venv/lib64/python3.7/site-packages/vaex/groupby.py in _agg(self, actions)
    338         else:
    339             if isinstance(aggregate, six.string_types):
--> 340                 aggregate = vaex.agg.aggregates[aggregate]
    341             if callable(aggregate):
    342                 if name is None:

KeyError: 'mode'
```
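Judging from the traceback, string aggregator names are resolved through a registry dict (`vaex.agg.aggregates`), so a name with no registered entry raises `KeyError`. A minimal sketch of that dispatch pattern, with purely illustrative aggregators rather than vaex's real internals:

```python
# Illustrative registry mapping aggregator names to callables.
# 'mode' is deliberately absent, mirroring the KeyError above.
aggregates = {
    "mean": lambda values: sum(values) / len(values),
    "sum": sum,
}

def resolve(aggregate):
    """Resolve a string name to a callable, as vaex's _agg does."""
    if isinstance(aggregate, str):
        aggregate = aggregates[aggregate]  # raises KeyError for 'mode'
    return aggregate

print(resolve("mean")([1, 2, 3]))  # 2.0
try:
    resolve("mode")
except KeyError as e:
    print("unsupported aggregator:", e)
```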
Hi @mahsheed. I'll dig into this more tomorrow. It definitely seems like a bug based on your stack trace!
Looks like `agg` doesn't support mode yet. I'm digging into the `df.mode()` call, and it looks like it's the only legacy task in Vaex that doesn't have the proper helper methods used by all of the other tasks.

@maartenbreddels @JovanVeljanoski, is the ideal fix to make an updated `TaskHistogram` that is supported by the delayed executor? Or is a better solution to build something out for `agg` and then have the dataframe just call that and group by the `binby` arg?
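Until `agg` grows a `'mode'` aggregator, one workaround is to compute the per-group mode in pandas (possibly after exporting the vaex DataFrame with `to_pandas_df()`). A minimal sketch with hypothetical sample data:

```python
import pandas as pd

# Hypothetical data standing in for the df from the issue.
df = pd.DataFrame({
    "id":  [1, 1, 1, 2, 2],
    "num": [3, 3, 5, 7, 7],
})

# Series.mode() returns all tied modes sorted ascending;
# take the first one to get a single value per group.
modes = df.groupby("id")["num"].agg(lambda s: s.mode().iloc[0])
print(modes.to_dict())  # {1: 3, 2: 7}
```

This round-trips the data through pandas, so it is only practical when the grouped data fits in memory.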
**Description**

I was not able to get the `mode()` feature to work, and I could not find examples of it being used. Does anyone know what might be the issue?

One approach I tried was the following:

**Software information**

**Additional information**

Here is the error message I am getting: