nv-morpheus / Morpheus

Morpheus SDK
Apache License 2.0

Add group by column stage #1699

Closed dagardner-nv closed 1 month ago

dagardner-nv commented 1 month ago

Description

By Submitting this PR I confirm:

dagardner-nv commented 1 month ago

> Is there an issue# for this change? I would like to understand why it is being done.

The purpose of this stage is to ensure that each downstream stage in the pipeline receives a separate message for each unique value in the group-by column.

The idea is that this would improve performance. In addition, once we know that every row in a message has the same value in column 'A', we can drop that column and store the value in the ControlMessage.metadata field.
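
As a rough illustration of the splitting described above, here is a minimal standard-library sketch. It is not the code from this PR and does not use the Morpheus API: `SketchMessage` and `split_by_column` are hypothetical names, rows are plain dicts rather than a DataFrame, and the `metadata` dict only stands in for the role that ControlMessage.metadata would play.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SketchMessage:
    """Hypothetical stand-in for a pipeline message (not the Morpheus ControlMessage)."""
    rows: list
    metadata: dict = field(default_factory=dict)


def split_by_column(rows, column):
    """Return one SketchMessage per unique value of `column`, in first-seen order."""
    groups = defaultdict(list)
    for row in rows:
        # The grouped column is constant within a group, so drop it from each row.
        remaining = {key: value for key, value in row.items() if key != column}
        groups[row[column]].append(remaining)
    # Keep the shared value once per message, in the metadata, instead of per row.
    return [
        SketchMessage(rows=grouped_rows, metadata={column: value})
        for value, grouped_rows in groups.items()
    ]


rows = [
    {"A": "x", "n": 1},
    {"A": "y", "n": 2},
    {"A": "x", "n": 3},
]
messages = split_by_column(rows, "A")
```

Each resulting message carries only the rows for one value of 'A', and that value is stored once in the message metadata rather than repeated in every row.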

I opened issue #1709 for this, but unfortunately I didn't create an issue prior to working on this feature.

AnuradhaKaruppiah commented 1 month ago

LGTM, some minor suggestions.

AnuradhaKaruppiah commented 1 month ago

/merge

dagardner-nv commented 1 month ago

test

AnuradhaKaruppiah commented 1 month ago

/merge