molgenis / capice

GNU Lesser General Public License v3.0

Ensure columns that are removed for training do not depend on the imputing json file #121

Open svandenhoek opened 2 years ago

svandenhoek commented 2 years ago

Is your feature request related to a problem? Please describe. This does not currently cause an issue. However, if CAPICE were to deprecate imputing, training would fail, because training uses the imputing file to exclude certain fields. Additionally, if such a field (e.g. gene_name) were included in the imputing json, it would be used for training as well.

Describe the solution you'd like Ensure certain fields are excluded by default, rather than relying on the imputing json to exclude them.
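As a rough illustration of the difference, here is a minimal sketch, not CAPICE's actual code. Every name in it (`DEFAULT_EXCLUDED_COLUMNS`, both functions, and all column names other than gene_name) is hypothetical, and the "current" behaviour is only my reading of the problem described above.

```python
# Hypothetical sketch: none of these names are taken from the CAPICE code base.

# Columns that should never be used for training, no matter what the
# imputing json contains. Apart from gene_name these are placeholder names.
DEFAULT_EXCLUDED_COLUMNS = frozenset({"gene_name", "label_column", "weight_column"})


def features_from_impute_json(columns, impute_values):
    """Assumed current behaviour: a column is a feature iff it has an impute value."""
    return [column for column in columns if column in impute_values]


def features_with_default_exclusions(columns, excluded=DEFAULT_EXCLUDED_COLUMNS):
    """Proposed behaviour: exclusion does not depend on the imputing json at all."""
    return [column for column in columns if column not in excluded]


columns = ["gene_name", "feature_a", "feature_b", "label_column", "weight_column"]

# An imputing json that (accidentally) contains gene_name.
impute_values = {"feature_a": 0.0, "feature_b": 0.0, "gene_name": "unknown"}

leaky = features_from_impute_json(columns, impute_values)
safe = features_with_default_exclusions(columns)
```

In this sketch `leaky` includes gene_name because it appears in the imputing json, while `safe` does not, and `features_with_default_exclusions` would keep working if the imputing json were removed entirely.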

Describe alternatives you've considered Leaving it as it is. This is a low-priority issue, so if it is deemed unnecessary, it can be closed as won't fix.

svandenhoek commented 2 years ago

Perhaps investigate the current implementation of the exclusion list:
https://github.com/molgenis/capice/blob/master/src/molgenis/capice/main_capice.py#L38 (capice generic)
https://github.com/molgenis/capice/blob/master/src/molgenis/capice/main_train.py#L35 (train-specific)
https://github.com/molgenis/capice/blob/master/src/molgenis/capice/utilities/preprocessor.py#L29 (capice generic, not overridden by Main subclasses)