Mokapot is a workflow that broadly consists of the following steps:
data preprocessing: optionally subsetting the input data and then doing a 3-fold split to generate the training data
model training: using the semi-supervised Percolator algorithm to train a classification model on the training data
confidence assignment: the model's scores are used to calculate q-values and assign confidence values to the input
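To make the confidence assignment step concrete, here is a minimal sketch of how q-values can be derived from scores with target-decoy competition. It is an illustration only, not mokapot's actual implementation: the function name `tdc_qvalues`, the `(decoys + 1) / targets` FDR estimate, and the lack of special handling for tied scores are all assumptions made for this example.

```python
from typing import List, Sequence


def tdc_qvalues(scores: Sequence[float], is_target: Sequence[bool]) -> List[float]:
    """Hypothetical helper: q-values from target-decoy competition.

    Assumes higher scores are better and does not treat ties specially.
    """
    # Indices of the items, ordered from best to worst score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Walk down the ranking and estimate the FDR at each score threshold.
    fdr = []
    targets = 0
    decoys = 0
    for i in order:
        if is_target[i]:
            targets += 1
        else:
            decoys += 1
        # Conservative estimate (assumption): (decoys + 1) / targets, capped at 1.
        fdr.append(min(1.0, (decoys + 1) / max(targets, 1)))

    # A q-value is the smallest FDR at which an item is still accepted,
    # so take a running minimum from the worst score upwards.
    qvalues = [0.0] * len(scores)
    running_min = 1.0
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdr[rank])
        qvalues[order[rank]] = running_min
    return qvalues


# Toy input: five scored items, two of which are decoys.
example = tdc_qvalues(
    [0.9, 0.8, 0.7, 0.6, 0.5],
    [True, True, False, True, False],
)
print(example)
```

The running minimum is what makes the q-values non-decreasing as the score gets worse, which is the property that lets a user pick a single confidence threshold.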
As a user it is convenient to have the workflow in a single CLI that executes all steps end-to-end.
As a developer it would be convenient to have the option to run the three steps separately.
This would allow for a cleaner separation in integration tests and make it easier to evaluate ideas such as: How well does this pretrained model perform on a different dataset?
One way would be to introduce sub-commands:
mokapot preprocess /input/data.x /preprocessed/training/data [--max_subset]
mokapot train /preprocessed/training/data /trained/model [--max_iterations]
mokapot confidence /trained/model /output/results.x
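The three commands above could be wired up with `argparse` sub-parsers, roughly as sketched below. This is a sketch under assumptions rather than a proposal for the final interface: the function name `build_parser`, the positional argument names, and the `int` type given to `--max_subset` and `--max_iterations` are hypothetical and not taken from the existing mokapot CLI.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the three workflow steps as sub-commands."""
    parser = argparse.ArgumentParser(prog="mokapot")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # mokapot preprocess /input/data.x /preprocessed/training/data [--max_subset]
    preprocess = subparsers.add_parser("preprocess", help="subset and split the input data")
    preprocess.add_argument("input_data")
    preprocess.add_argument("training_data")
    preprocess.add_argument("--max_subset", type=int, default=None)  # type is an assumption

    # mokapot train /preprocessed/training/data /trained/model [--max_iterations]
    train = subparsers.add_parser("train", help="train a model on preprocessed data")
    train.add_argument("training_data")
    train.add_argument("model")
    train.add_argument("--max_iterations", type=int, default=None)  # type is an assumption

    # mokapot confidence /trained/model /output/results.x
    confidence = subparsers.add_parser("confidence", help="assign confidence with a trained model")
    confidence.add_argument("model")
    confidence.add_argument("output")

    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["train", "/preprocessed/training/data", "/trained/model"])
print(args.command, args.training_data, args.model)
```

Keeping each step behind its own sub-parser means an integration test can invoke one step in isolation, and a pretrained model can be passed straight to the `confidence` step.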
Tasks
Refactor mokapot.brew into the three steps
Refactor mokapot.brew to add them as separate sub-commands, respectively