rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] Support more dtypes in JIT GroupBy `apply` #12608

Open brandon-b-miller opened 1 year ago

brandon-b-miller commented 1 year ago

Is your feature request related to a problem? Please describe. When https://github.com/rapidsai/cudf/pull/11452 lands, we'll get JIT `GroupBy.apply` for a subset of UDFs and, importantly, dtypes. However, over the summer we only got as far as writing overloads for `float64` and `int64` dtypes in the user's source data. It'd be nice if we could support more dtypes, starting at least with the rest of the numeric types.

Describe the solution you'd like Extend the existing `groupby.apply(engine='jit')` framework to support the following additional dtypes:

A lot of the machinery in the original PR is fairly general and should make adding many of these easy; however, there will undoubtedly be edge cases. As such, it makes for a pretty good first issue for anyone jumping into the numba extension piece of the codebase.
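For reference, the computation being JIT-compiled here is a simple group-wise reduction over a column. A CPU stand-in in plain NumPy (the data is illustrative; cudf itself compiles the UDF with Numba and runs it on the GPU) shows the semantics, using the `int64`/`float64` dtypes the original PR supports:

```python
import numpy as np

# CPU sketch of the group-wise reduction that GroupBy.apply(engine="jit")
# performs on the GPU. int64 keys and float64 values are the dtypes the
# original PR supported; this issue asks for the remaining numeric types.
keys = np.array([1, 1, 2, 2, 3, 3], dtype=np.int64)
vals = np.array([2.0, 4.0, 0.0, 12.0, 30.0, 20.0], dtype=np.float64)

def groupwise_sum(keys, vals):
    """Sum values within each distinct key."""
    sums = {}
    for k, v in zip(keys.tolist(), vals.tolist()):
        sums[k] = sums.get(k, 0.0) + v
    return sums

groupwise_sum(keys, vals)  # {1: 6.0, 2: 12.0, 3: 50.0}
```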

dlee992 commented 1 year ago

Hi Brandon, I'm interested in this part of cuDF, so I can give this a try in a few weeks.

brandon-b-miller commented 1 year ago

Hi @dlee992 , thanks so much for your interest in taking this on! I thought I'd share a little about how I'd approach this just to get you off the ground.

The main tests I recommend trying to pass first would be these ones which test simple reductions. This test is parameterized over a list of supported types we hardcode in `groupby_typing`. I tried adding `int32` locally and it looks like a lot works already when running the test:

error   : Undefined reference to 'BlockMin_int32' in '<cudapy-ptx>'

This hopefully means that adding a couple of lines around here in the C++ library could pass a bunch of tests right off the bat, if we're lucky :)
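On the Python side, the change described above amounts to extending a hardcoded supported-types list so the parameterized tests pick the new dtype up. A hypothetical sketch of that shape (the names below are illustrative, not cudf's actual identifiers in `groupby_typing`):

```python
import numpy as np

# Illustrative stand-in for the hardcoded list of dtypes the JIT groupby
# engine accepts; the real list lives in cudf's groupby_typing module and
# may use different names.
SUPPORTED_GROUPBY_NUMPY_TYPES = [np.int64, np.float64]

def enable_dtype(dtype):
    """Register an additional dtype with the (sketched) supported list."""
    if dtype not in SUPPORTED_GROUPBY_NUMPY_TYPES:
        SUPPORTED_GROUPBY_NUMPY_TYPES.append(dtype)

# The case tried in the comment above: turning on int32 support, which
# then requires matching device-side definitions (e.g. BlockMin_int32).
enable_dtype(np.int32)
```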

Happy to help at any time on the RAPIDS goAI Slack by the way, just ping me with any questions. Good luck!

dlee992 commented 1 year ago

Thanks for your kind comments! I have a little experience with Numba code, but I'm a newbie in cuDF and GPU testing. Anyway, since I think this is an interesting issue, I will give it a try as a side project.

In fact, I am now trying to get some GPU devices for local testing.

======

Update: got access to several GPU cards in an internal cluster.

vyasr commented 6 months ago

@brandon-b-miller is this issue still valid? I see that this works now, for instance:

In [96]: cudf.DataFrame({
    ...:     "a": cudf.Series([1, 1, 2, 2, 3, 3], dtype="float32"),
    ...:     "b": cudf.Series([2, 4, 0, 12, 30, 20], dtype="float32"),
    ...: }).groupby("a").apply(
    ...:     lambda gp: gp['b'].sum(),
    ...:     engine="jit",
    ...: )
Out[96]:
a
1.0     6.0
2.0    12.0
3.0    50.0
dtype: float32

I'm going to close as resolved for the moment, but please reopen if I missed something.

Note that I'm only concerned with the numeric types mentioned above; I don't think this issue is the right place to track e.g. string or struct UDFs.

brandon-b-miller commented 6 months ago

I believe we only have definitions for 4- and 8-byte int and float types for now. It'd be useful if we had implementations for the `int8`, `int16` types, etc. But IIRC the problem was missing features on the CUDA side. It might have been a lack of atomics for those widths.

https://github.com/rapidsai/cudf/blob/branch-24.06/python/cudf/udf_cpp/shim.cu#L652
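The usual workaround when hardware atomics are missing for 1- and 2-byte types is to emulate them with a compare-and-swap loop on the enclosing 4-byte word. A host-side Python sketch of that technique (the real fix would be CUDA code in `shim.cu` using `atomicCAS`; this just illustrates the bit manipulation):

```python
def atomic_min_int8(words, byte_index, value):
    """Emulate an 8-bit atomic min via a CAS loop on the containing
    32-bit word -- the standard workaround when the hardware lacks
    sub-word atomics. `words` is a list of 32-bit ints holding packed
    signed int8 lanes; `value` is a signed int8."""
    word_idx, lane = divmod(byte_index, 4)
    shift = lane * 8

    def signed8(b):  # reinterpret a raw byte as a signed int8
        return b - 256 if b >= 128 else b

    while True:
        old = words[word_idx]
        cur = signed8((old >> shift) & 0xFF)
        new = (old & ~(0xFF << shift)) | ((min(cur, value) & 0xFF) << shift)
        # On the GPU, this read-compare-store would be a single atomicCAS;
        # single-threaded here, so the check always succeeds first try.
        if words[word_idx] == old:
            words[word_idx] = new & 0xFFFFFFFF
            return cur  # previous lane value, as atomicMin returns

words = [0x7F7F7F7F]                    # four int8 lanes, each set to 127
prev = atomic_min_int8(words, 2, 5)     # lane 2 becomes 5; prev == 127
```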

vyasr commented 6 months ago

Ah OK. Perhaps we should keep this issue open for the moment, then.