dagardner-nv closed 1 month ago
Is there an issue# for this change? I would like to understand why it is being done.
The purpose of this stage is to ensure that each downstream stage in the pipeline receives one message per unique value in the grouped-by column.
The idea is that this would improve performance. In addition, once we know that every row in a message has the same value in column 'A', we can drop that column and store the value in the ControlMessage.metadata field.
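To illustrate the idea, here is a minimal sketch in plain Python. It is not the stage's actual implementation: `group_by_column` is a hypothetical helper, a list of dicts stands in for the DataFrame, and a plain dict with `metadata` and `payload` keys stands in for a ControlMessage.

```python
from collections import defaultdict


def group_by_column(rows, column):
    """Split `rows` (a list of dicts) into one message per unique value of `column`."""
    groups = defaultdict(list)
    for row in rows:
        # The grouped column is constant within a group, so leave it out of the payload.
        payload_row = {key: val for key, val in row.items() if key != column}
        groups[row[column]].append(payload_row)

    # Hypothetical message shape: the shared value is stored once in "metadata"
    # instead of being repeated on every row.
    return [
        {"metadata": {column: value}, "payload": payload}
        for value, payload in groups.items()
    ]


rows = [
    {"A": "x", "B": 1},
    {"A": "y", "B": 2},
    {"A": "x", "B": 3},
]
messages = group_by_column(rows, "A")
```

Each resulting message carries only the rows for one value of 'A', so downstream stages no longer need to filter or re-group on that column.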
I opened issue #1709 for this, but unfortunately I didn't create an issue prior to working on this feature.
LGTM, some minor suggestions.
/merge
test
/merge
Description
GroupByColumnStage
Closes #1709
By Submitting this PR I confirm: