rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] Support if/else in AST #11163

Open revans2 opened 2 years ago

revans2 commented 2 years ago

Is your feature request related to a problem? Please describe.
As part of https://github.com/rapidsai/cudf/issues/11162, we really could use the ability to do if/else statements in AST.

Describe the solution you'd like
Just as the title says: it would be great to have a way to do if/else statements in AST. The operator would take a boolean predicate, a true expression, and a false expression as input. If the predicate is true, the true expression is evaluated and returned; if it is false, the false expression is evaluated and returned. For performance/consistency reasons I am fine if we evaluate both paths and just pick the one needed at the end.
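To make the requested semantics concrete, here is a minimal Python sketch of a toy expression tree with a ternary node. The node classes (`ColumnRef`, `Literal`, `Greater`, `IfElse`) and `evaluate` are hypothetical illustrations, not libcudf's AST API; columns are plain lists with `None` standing in for null. As the request allows, both branches are evaluated for every row and the result is picked per row at the end. Treating a null predicate as false (so the false branch is chosen) is an assumption borrowed from SQL `CASE WHEN` behavior; the issue does not specify null handling.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

# Hypothetical column type: a list of values where None represents null.
Column = List[Optional[object]]


@dataclass
class ColumnRef:
    name: str


@dataclass
class Literal:
    value: object


@dataclass
class Greater:
    left: "Expr"
    right: "Expr"


@dataclass
class IfElse:
    # Hypothetical ternary node: predicate ? if_true : if_false
    predicate: "Expr"
    if_true: "Expr"
    if_false: "Expr"


Expr = Union[ColumnRef, Literal, Greater, IfElse]


def evaluate(expr: Expr, table: Dict[str, Column], num_rows: int) -> Column:
    if isinstance(expr, ColumnRef):
        return table[expr.name]
    if isinstance(expr, Literal):
        return [expr.value] * num_rows
    if isinstance(expr, Greater):
        lhs = evaluate(expr.left, table, num_rows)
        rhs = evaluate(expr.right, table, num_rows)
        # A comparison involving a null yields null.
        return [None if a is None or b is None else a > b for a, b in zip(lhs, rhs)]
    if isinstance(expr, IfElse):
        pred = evaluate(expr.predicate, table, num_rows)
        # Both branches are evaluated eagerly for all rows ...
        when_true = evaluate(expr.if_true, table, num_rows)
        when_false = evaluate(expr.if_false, table, num_rows)
        # ... and the needed value is picked per row. A null (None) predicate
        # is falsy, so it selects the false branch (assumed semantics).
        return [t if p else f for p, t, f in zip(pred, when_true, when_false)]
    raise TypeError(f"unsupported node: {expr!r}")


table = {"a": [1, 5, None, 7], "b": [3, 2, 4, None]}
row_max = IfElse(Greater(ColumnRef("a"), ColumnRef("b")), ColumnRef("a"), ColumnRef("b"))
print(evaluate(row_max, table, num_rows=4))  # [3, 5, 4, None]
```

Evaluating both sides unconditionally keeps every row on the same code path, which is the performance/consistency trade-off the request mentions; the cost is that work is done for the branch that gets discarded.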

Describe alternatives you've considered
We really have no good way to simulate this with existing operators, except in some very rare corner cases.
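As an illustration of why emulation only covers corner cases, one arithmetic trick (our example; the issue does not say which corner cases the author had in mind) blends the two values using a 0/1 predicate. The helper `blend_select` is hypothetical. It gives the right answer only for finite, non-null numbers: an infinite value in the unselected branch turns the result into NaN, and in SQL-style arithmetic a null in the unselected branch would propagate into the result.

```python
import math


def blend_select(pred: bool, x: float, y: float) -> float:
    """Emulate `x if pred else y` with only multiply and add.

    Correct only when x and y are finite numbers; shown here to illustrate
    why arithmetic emulation of if/else is limited to corner cases.
    """
    p = 1.0 if pred else 0.0
    return p * x + (1.0 - p) * y


print(blend_select(True, 2.0, 9.0))       # 2.0
print(blend_select(False, 2.0, 9.0))      # 9.0
# 0.0 * inf is NaN, so the unselected branch poisons the result.
print(blend_select(True, 1.0, math.inf))  # nan
```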

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

GregoryKimball commented 2 years ago

Thanks @revans2 for sharing this issue. Do you think there is enough value in expanding our AST implementation to warrant a repo milestone?

revans2 commented 2 years ago

I'm not sure what you mean by a repo milestone. Is this just for tracking purposes? If so, @sameerz, how high a priority is this?

sameerz commented 2 years ago

I think what @GregoryKimball is referring to as a milestone is effectively an epic of work for AST.

In terms of priority, AST helps the RAPIDS Spark plugin and is necessary for some queries. It is worth tracking as a milestone.

GregoryKimball commented 1 year ago

Hello @jlowe, yesterday you mentioned a few uses that the Spark-RAPIDS plugin would have for the ternary operator in libcudf ASTs. Would you please list some of those uses and link to open Spark-RAPIDS issues on this topic? (FYI @karthikeyann)

jlowe commented 1 year ago

One key use case is described in #11162, where a higher-order function is used to perform the operation and we need the ability to translate it. AST isn't the only way to solve that particular issue, but it's a more general solution. See NVIDIA/spark-rapids#5227 for more context.

Another case would be optimizing complex projections. Take one-hot encodings, for example, where queries currently perform a potentially large series of comparisons in a CASE WHEN that turns into a long chain of copy_if_else operations. Using a chain of IF/ELSE in AST instead would avoid materializing the intermediates.
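The difference can be sketched in plain Python, assuming a CASE WHEN over a categorical column of the kind that shows up in one-hot style encodings. `copy_if_else` below is a hypothetical stand-in modeled loosely on a columnar select (scalar and column in, new column out); it is not the libcudf binding. The chained version allocates a boolean mask and a new result column for every WHEN branch, while the fused version walks each row through nested if/else checks and allocates only the output, which is roughly what evaluating the whole CASE WHEN inside a single AST would save.

```python
from typing import List, Sequence, Tuple


def copy_if_else(lhs: object, rhs: List, mask: List[bool]) -> List:
    # Hypothetical stand-in: lhs is a scalar, rhs and mask are full-length
    # columns. Every call allocates a brand-new output column.
    return [lhs if m else r for r, m in zip(rhs, mask)]


def case_when_chained(col: Sequence, branches: Sequence[Tuple[object, object]], default):
    """CASE WHEN col = k1 THEN v1 ... ELSE default, as a copy_if_else chain.

    Returns the result and the number of full-length columns allocated.
    """
    result = [default] * len(col)
    allocated = 1
    # Apply branches in reverse so the first matching WHEN wins.
    for key, value in reversed(branches):
        mask = [x == key for x in col]              # one boolean column per WHEN
        result = copy_if_else(value, result, mask)  # one new result column per WHEN
        allocated += 2
    return result, allocated


def case_when_fused(col: Sequence, branches: Sequence[Tuple[object, object]], default) -> List:
    """Same CASE WHEN evaluated row by row as nested if/else; only the output is allocated."""
    out = []
    for x in col:
        chosen = default
        for key, value in branches:
            if x == key:
                chosen = value
                break
        out.append(chosen)
    return out


col = ["red", "green", "blue", "green", "teal"]
branches = [("red", 0), ("green", 1), ("blue", 2)]
print(case_when_chained(col, branches, -1))  # ([0, 1, 2, 1, -1], 7)
print(case_when_fused(col, branches, -1))    # [0, 1, 2, 1, -1]
```

With three branches the chain already allocates seven full-length columns; the count grows linearly with the number of WHEN clauses, while the fused form stays at one output column.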