Current limitation: mokapot currently reads the entire .pin file into memory, trains the model on a subset of it, and then uses that model to score all of the data.
Suggestion: Do a first pass over the file to determine what 'subset ratio' needs to be read, load only that subset into memory, and train the model on it.
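A minimal sketch of the two-pass idea, using only the standard library. This is not mokapot code: `count_rows`, `read_subset`, and `max_rows` are hypothetical names, and the sketch assumes a tab-delimited file with a single header line followed by one PSM per line.

```python
import random


def count_rows(path):
    """First pass: count data rows without keeping them in memory."""
    with open(path) as handle:
        next(handle, None)  # skip the header line (assumed to be present)
        return sum(1 for line in handle if line.strip())


def read_subset(path, max_rows, seed=1):
    """Second pass: load the header plus a random sample of at most max_rows rows.

    Returns (header, rows, ratio), where ratio is the fraction of rows kept.
    """
    n_rows = count_rows(path)
    n_keep = min(max_rows, n_rows)
    ratio = n_keep / n_rows if n_rows else 1.0

    # Choose which row indices to keep before reading the data again.
    rng = random.Random(seed)
    keep = set(rng.sample(range(n_rows), n_keep))

    with open(path) as handle:
        header = next(handle, "").rstrip("\n").split("\t")
        data_lines = (line for line in handle if line.strip())
        rows = [
            line.rstrip("\n").split("\t")
            for idx, line in enumerate(data_lines)
            if idx in keep
        ]
    return header, rows, ratio
```

Only the sampled rows are ever held in memory, at the cost of reading the file twice. Scoring the full data with the trained model would still need a separate streaming pass.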
Expected complications: Assigning the peptide- and protein-level confidences (as well as calculating the q-values) requires the whole dataset to be loaded into memory at once.
Possible solutions: ... polars ...