kalininalab / DataSAIL

DataSAIL is a tool to split datasets while reducing information leakage.
https://datasail.readthedocs.io
MIT License

Multiple Classes at Stratification #28

Closed Old-Shatterhand closed 3 months ago

Old-Shatterhand commented 4 months ago

Is your feature request related to a problem? Please describe.
Kinda both. Currently, each sample can be a member of only one class for stratification, so balancing multiple classes across the splits at the same time is not possible.

Describe the solution you'd like
Open the stratification argument to take a dictionary that maps strings to lists of classes, and adjust the problem formulation accordingly. From a theoretical perspective, this may not even require a change.
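To make the request concrete, here is a minimal sketch of what such a dictionary could look like and how it could be turned into a multi-hot matrix with one column per class. Nothing here is DataSAIL's API: the sample names, class labels, and the helpers `multi_hot` and `class_counts_per_split` are hypothetical and only illustrate the data shape.

```python
from collections import Counter

# Hypothetical input: sample name -> list of classes the sample belongs to.
stratification = {
    "s1": ["kinase", "human"],
    "s2": ["kinase", "mouse"],
    "s3": ["protease", "human"],
    "s4": ["protease", "mouse"],
}


def multi_hot(strat):
    """Return the sorted class list and one 0/1 row per sample."""
    classes = sorted({c for labels in strat.values() for c in labels})
    index = {c: i for i, c in enumerate(classes)}
    matrix = {}
    for name, labels in strat.items():
        row = [0] * len(classes)
        for c in labels:
            row[index[c]] = 1
        matrix[name] = row
    return classes, matrix


def class_counts_per_split(strat, assignment):
    """Count how often each class occurs in each split."""
    counts = {}
    for name, split in assignment.items():
        counts.setdefault(split, Counter()).update(strat[name])
    return counts


classes, matrix = multi_hot(stratification)

# A hand-picked assignment, only to show what "balanced per class" means here.
assignment = {"s1": "train", "s4": "train", "s2": "test", "s3": "test"}
counts = class_counts_per_split(stratification, assignment)
for split, per_class in counts.items():
    print(split, dict(per_class))
```

In this toy assignment every class occurs once in each split, which is the kind of per-class balance the request asks for when a sample carries several labels.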

Describe alternatives you've considered
A possible workaround is to collate all class labels of a sample into one combined class and to balance that new class instead. However, this leads to a different problem, i.e., its solution differs from that of the original problem.
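The difference between the two problems can be seen on a toy example. The sketch below uses made-up samples and a hypothetical helper `counts_per_split`, and it does not call DataSAIL. It compares per-class counts with counts of the collated classes for the same split.

```python
from collections import Counter

# Hypothetical multi-label data: sample name -> list of classes.
labels = {
    "s1": ["A", "B"],
    "s2": ["A"],
    "s3": ["B"],
    "s4": [],
}

# Workaround: collate each sample's labels into a single combined class.
collated = {
    name: ["+".join(sorted(ls)) or "none"] for name, ls in labels.items()
}


def counts_per_split(labels_by_sample, assignment):
    """Count how often each label occurs in each split."""
    counts = {}
    for name, split in assignment.items():
        counts.setdefault(split, Counter()).update(labels_by_sample[name])
    return counts


assignment = {"s1": "train", "s4": "train", "s2": "test", "s3": "test"}

per_class = counts_per_split(labels, assignment)
per_combined = counts_per_split(collated, assignment)

print("per class:   ", {s: dict(c) for s, c in per_class.items()})
print("per combined:", {s: dict(c) for s, c in per_combined.items()})
```

Here the split is balanced with respect to the individual classes A and B, yet each combined class appears in only one split. Because every combined class has a single sample in this example, no assignment can balance the collated classes, so optimizing for them would not recover the per-class balance.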

Additional context
N.A.