This PR combines several earlier PRs. Since I did a lot of refactoring of earlier changes that were not yet merged into main, I think it makes sense to review them all at once instead of as separate PRs.
After this PR, there are still two PRs that could be separated more clearly, namely #157 and #156.
The main goal of this PR was to create an automatic pipeline that:
Splits positive/negative mode
Splits test/train/val sets
Trains a model
Validates the model
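For reviewers, the first two steps can be sketched roughly as below. This is a minimal illustration, not the code in this PR: it assumes spectra are plain dicts with hypothetical "ionmode" and "inchikey" keys, and the function names and default fractions are made up for the example. Splitting per compound (first 14 characters of the InChIKey) is an assumption made here so that no compound ends up in more than one set.

```python
import random
from collections import defaultdict


def split_by_ionmode(spectra):
    """Separate spectra into positive and negative mode (others are dropped)."""
    positive = [s for s in spectra if s.get("ionmode") == "positive"]
    negative = [s for s in spectra if s.get("ionmode") == "negative"]
    return positive, negative


def split_train_val_test(spectra, val_fraction=0.1, test_fraction=0.1, seed=42):
    """Split spectra per compound into train, validation and test sets."""
    # Group spectra by the first 14 characters of the InChIKey (assumed key).
    by_compound = defaultdict(list)
    for spectrum in spectra:
        by_compound[spectrum["inchikey"][:14]].append(spectrum)

    # Shuffle the compounds reproducibly, then slice them into three groups.
    compounds = sorted(by_compound)
    random.Random(seed).shuffle(compounds)
    n_val = int(len(compounds) * val_fraction)
    n_test = int(len(compounds) * test_fraction)
    val_keys = compounds[:n_val]
    test_keys = compounds[n_val:n_val + n_test]
    train_keys = compounds[n_val + n_test:]

    def collect(keys):
        return [s for key in keys for s in by_compound[key]]

    return collect(train_keys), collect(val_keys), collect(test_keys)
```

Training and validation would then run once per ion mode on the resulting sets.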
In addition, I made many smaller changes to things that were not intuitive to me.
Changes in #148
[x] Make pipeline that splits and stores positive and negative mode
[x] Make pipeline that splits the pos and neg into val, train, test sets
[x] Fix issue with assert difference == 0 in bias (the training still crashes)
[x] Improve docstring
Changes in #149
[x] Add a SettingsMS2Deepscore class storing the settings
Changes in #152
[x] Refactoring of the automatic benchmarking pipeline.
[x] Create Tanimoto scores on the spot (since they are not created during training anymore)
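As a reminder of what is being computed on the spot, here is a minimal sketch of a Tanimoto score. It assumes fingerprints are equal-length sequences of 0/1 values; the function name is hypothetical and the code in this PR may compute the scores differently.

```python
def tanimoto_score(fingerprint_a, fingerprint_b):
    """Tanimoto (Jaccard) similarity of two equal-length binary fingerprints."""
    if len(fingerprint_a) != len(fingerprint_b):
        raise ValueError("Fingerprints must have the same length.")
    # Bits set in both fingerprints, and bits set in at least one of them.
    both = sum(1 for a, b in zip(fingerprint_a, fingerprint_b) if a and b)
    either = sum(1 for a, b in zip(fingerprint_a, fingerprint_b) if a or b)
    # Two all-zero fingerprints share no set bits; return 0.0 by convention here.
    return both / either if either else 0.0
```

For example, `tanimoto_score([1, 1, 0, 0], [1, 0, 1, 0])` has one shared bit out of three set bits in total.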
Since then:
[x] Refactor the training_wrapper_function.
Still to do:
[ ] Make a class out of the GeneratorSettings (done in #157)
[ ] Integrate DataGeneratorCherryPicked with DataGeneratorBase. To do this, DataGeneratorBase should also use SelectedCompoundPairs.