mathysgrapotte / stimulus

Stochastic Testing and Input Manipulation for Unbiased Learning Systems

downsampling block in stimulus #84

Open alessiovignoli opened 2 months ago

alessiovignoli commented 2 months ago

Add a function that downsamples the given data.

To be decided: whether the downsampling happens on a given set (based on the split value or column, etc.) or whether it is always applied to all the data given, and that is it.

Its position in the pipeline hierarchy still needs to be figured out.
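To make the two options above concrete, here is a minimal sketch in plain Python. It is not stimulus code: the function name `downsample`, its parameters (`fraction`, `split_value`, `split_key`, `seed`) and the row layout (a list of dicts with a `split` column) are all assumptions made for illustration. With `split_value=None` every row is eligible for removal; with a value set, only rows of that split are thinned and the rest pass through untouched.

```python
import random


def downsample(rows, fraction, split_value=None, split_key="split", seed=0):
    """Keep about `fraction` of the eligible rows (hypothetical helper).

    If `split_value` is None, all rows are eligible. Otherwise only rows whose
    `split_key` column equals `split_value` are thinned; other rows are kept.
    """
    rng = random.Random(seed)  # local RNG so results are reproducible
    if split_value is None:
        eligible = list(range(len(rows)))
    else:
        eligible = [i for i, row in enumerate(rows) if row[split_key] == split_value]

    n_keep = round(len(eligible) * fraction)
    kept = set(rng.sample(eligible, n_keep))
    eligible_set = set(eligible)

    # Preserve the original row order; drop only eligible rows that were not sampled.
    return [row for i, row in enumerate(rows) if i not in eligible_set or i in kept]


# Toy data: 6 rows in split 0 and 4 rows in split 1.
rows = [{"value": i, "split": 0 if i < 6 else 1} for i in range(10)]

everything = downsample(rows, fraction=0.5)                 # applied to all rows
train_only = downsample(rows, fraction=0.5, split_value=0)  # applied to split 0 only
```

In the second call the four rows of split 1 are left intact, which is the difference between the two options being discussed.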

alessiovignoli commented 2 months ago

Pipeline should connect the following processes logically in this hierarchy:

split -> downsample -> data augment -> noise

So the JSON that Nextflow passes through the above pipeline can have some keywords set to none. The workflow goes into split, then into downsample, which is none in the JSON, so it proceeds to data augment, which is also none, and finally reaches noise, which executes according to the noise arguments.
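The pass-through idea described above can be sketched in Python as follows. This is only an illustration, not the actual Nextflow workflow: the key names (`split`, `downsample`, `data_augment`, `noise`), the `run_pipeline` function and the toy step functions are assumptions, and the steps are trivial stand-ins for the real processes.

```python
import json

# Fixed order of the steps, as proposed: split -> downsample -> data augment -> noise.
STEP_ORDER = ("split", "downsample", "data_augment", "noise")

# Toy stand-ins for the real processes (hypothetical, for illustration only).
STEPS = {
    "split": lambda data: data,                      # placeholder: leaves data as-is
    "downsample": lambda data, every: data[::every],
    "data_augment": lambda data: data + data[::-1],
    "noise": lambda data, offset: [x + offset for x in data],
}


def run_pipeline(data, config):
    """Run each step in order, skipping any step whose config entry is None."""
    executed = []
    for name in STEP_ORDER:
        params = config.get(name)  # a missing key is treated like None
        if params is None:
            continue               # step disabled: pass the data through unchanged
        data = STEPS[name](data, **params)
        executed.append(name)
    return data, executed


# JSON null becomes Python None, so disabled steps are skipped.
config = json.loads(
    '{"split": {}, "downsample": null, "data_augment": null, "noise": {"offset": 1}}'
)
result, executed = run_pipeline([1, 2, 3], config)
```

Here only split and noise run; downsample and data augment are traversed but do nothing.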

This behaviour is good for two reasons:

  1. The interpretation of the user JSON in the case of summarized (non-custom) instructions stays simple, because it only creates possibilities that have split and noise, split and data augment, or split and downsample. The effect of each of those behaviours is therefore observed in isolation.

  2. Through the custom key in the user JSON, users can write special cases themselves, such as split + downsample + noise, all of the steps at once, or any combination of three of the four steps. Generating all of these combinations automatically would otherwise be almost impossible (there are too many).
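A rough sketch of the first point: in summarized mode each generated configuration enables split plus exactly one of the other steps. The function name `summarized_configs`, the key names and the parameter values are hypothetical and do not reflect the actual user JSON schema.

```python
from itertools import combinations

OPTIONAL_STEPS = ("downsample", "data_augment", "noise")


def summarized_configs(split_params, options):
    """Build one config per (optional step, parameter set), all other steps None."""
    configs = []
    for name in OPTIONAL_STEPS:
        for params in options.get(name, []):
            config = {"split": split_params}
            config.update({step: None for step in OPTIONAL_STEPS})
            config[name] = params  # enable exactly one optional step
            configs.append(config)
    return configs


def optional_step_subsets():
    """Every subset of the optional steps, as a custom key could express them."""
    return [
        subset
        for size in range(len(OPTIONAL_STEPS) + 1)
        for subset in combinations(OPTIONAL_STEPS, size)
    ]


configs = summarized_configs(
    {"ratio": 0.8},
    {"downsample": [{"fraction": 0.5}], "noise": [{"std": 0.1}, {"std": 0.2}]},
)
subsets = optional_step_subsets()
```

With these toy inputs the summarized mode yields three configurations, growing additively with the number of parameter sets. Covering every subset of steps would instead multiply the parameter sets of all enabled steps together, which is why those cases are left to the custom key.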