Sometimes we might want to apply different curation workflows to different subsets of the dataset, based on the value of a label or non-label column (e.g. protein target class).
Right now you'd need to run a different set of curation steps for each group manually. I think there should be a way to handle this with a GroupBy curation workflow that leverages a new parameter in the DataIO step to define a "group" column. Both the Loader and Dataset classes would need adjusting to allow it.
Then you can specify a CurationWorkflow with a "groupby=True" call, passing a dict that maps each group to its list of steps.
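To make the idea concrete, here is a rough sketch of the behaviour I have in mind, in plain Python. None of these names exist in the library: `run_grouped_curation`, `drop_missing_label`, `clip_label`, and the `target_class` column are all hypothetical stand-ins, and the rows are made-up toy data. It only illustrates the "dict of step lists per group" idea, not how CurationWorkflow, DataIO, Loader, or Dataset would actually implement it.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical types: a row is a plain dict, a step maps rows to rows.
Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


def drop_missing_label(rows: List[Row]) -> List[Row]:
    """Hypothetical step: remove rows whose label is missing."""
    return [r for r in rows if r.get("label") is not None]


def clip_label(rows: List[Row], low: float = 0.0, high: float = 10.0) -> List[Row]:
    """Hypothetical step: clamp labels into the range [low, high]."""
    return [{**r, "label": min(max(r["label"], low), high)} for r in rows]


def run_grouped_curation(
    rows: List[Row],
    group_column: str,
    steps_by_group: Dict[object, List[Step]],
) -> List[Row]:
    """Split rows by group_column and run each group's own list of steps."""
    # Partition the rows by the value found in the group column.
    groups: Dict[object, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[group_column]].append(row)

    curated: List[Row] = []
    for key, group_rows in groups.items():
        # Assumption: groups without an entry in the dict pass through unchanged.
        for step in steps_by_group.get(key, []):
            group_rows = step(group_rows)
        curated.extend(group_rows)
    return curated


# Toy data, invented for illustration only.
rows = [
    {"smiles": "CCO", "target_class": "kinase", "label": 12.5},
    {"smiles": "CCN", "target_class": "kinase", "label": None},
    {"smiles": "CCC", "target_class": "gpcr", "label": None},
]

# One list of steps per group value, as in the proposed dict argument.
steps_by_group = {
    "kinase": [drop_missing_label, clip_label],
    "gpcr": [],
}

curated = run_grouped_curation(rows, "target_class", steps_by_group)
```

Whether groups that are missing from the dict should pass through untouched, raise an error, or fall back to a default list of steps is an open design question. The sketch picks pass-through only because it is the simplest option.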