wfondrie / mokapot

Fast and flexible semi-supervised learning for peptide detection in Python
https://mokapot.readthedocs.io
Apache License 2.0

[Pitch] Lazy data loading #92

Closed: jspaezp closed this issue 1 year ago

jspaezp commented 1 year ago

Current limitation: Right now mokapot reads the entire .pin file into memory and then uses only a subset of it to train the model, which is later used to score the data.

Suggestion: Do a first pass to determine what 'subset ratio' of the data needs to be read, store only that subset in memory, and train the model on it.
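
To make the suggestion concrete, here is a rough sketch of the two-pass idea using only the standard library. It is not mokapot code: the function names, the `max_rows` parameter, and the assumption that the file is plain tab-delimited text with a single header line are all hypothetical.

```python
import random


def count_data_rows(path):
    """First pass: count non-empty data rows without keeping them in memory."""
    with open(path, newline="") as handle:
        next(handle, None)  # skip the header line, if there is one
        return sum(1 for line in handle if line.strip())


def read_training_subset(path, max_rows, seed=0):
    """Second pass: keep roughly `max_rows` randomly chosen rows in memory.

    Returns the header, the sampled rows, and the subset ratio that was used.
    """
    n_rows = count_data_rows(path)
    # The 'subset ratio': the fraction of rows needed to reach max_rows.
    ratio = min(1.0, max_rows / n_rows) if n_rows else 0.0
    rng = random.Random(seed)
    with open(path, newline="") as handle:
        header = next(handle, "").rstrip("\r\n").split("\t")
        rows = [
            line.rstrip("\r\n").split("\t")
            for line in handle
            # Each row is kept independently with probability `ratio`.
            if line.strip() and rng.random() < ratio
        ]
    return header, rows, ratio
```

Because each row is kept independently, the number of rows returned is only approximately `max_rows`; reservoir sampling would give an exact count in a single pass, at the cost of slightly more bookkeeping.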

Expected complications: Adding the peptide- and protein-level confidences (as well as calculating the q-values) requires the whole dataset to be loaded into memory at once.
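
To illustrate why the q-values are the hard part, here is a simplified target-decoy q-value sketch. It is an assumption-laden toy, not mokapot's actual implementation: the `qvalues` function is hypothetical, tied scores are not handled specially, and the FDR estimate is the plain decoys/targets ratio. The point is that every score has to be ranked against every other score, so at least the score and the target/decoy label of each PSM must be available together.

```python
def qvalues(scores, is_target):
    """Simplified target-decoy q-values (illustrative only).

    scores    : list of floats, higher is better
    is_target : list of bools, True for targets and False for decoys
    """
    # Global ranking: this is the step that needs all scores at once.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Estimated FDR at each rank: decoys seen so far / targets seen so far.
    fdr = []
    targets = decoys = 0
    for i in order:
        if is_target[i]:
            targets += 1
        else:
            decoys += 1
        fdr.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this rank or any lower-scoring rank.
    qvals = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdr[rank])
        qvals[order[rank]] = running_min
    return qvals
```

Note that only two values per PSM are needed for this step, so a lazy approach could in principle keep just the scores and labels in memory rather than the full feature table.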

Possible solutions: ... polars ...

wfondrie commented 1 year ago

Indeed #89 would solve this 🤔