[Pitch] Lazy data loading

Current limitation: Right now mokapot reads the entire .pin file into memory and then uses only a subset of it to train the model, which is later used to score the data.

Suggestion: Do a first pass over the file to work out what 'subset ratio' of the data is needed for training, then read only that subset into memory and train the model on it.
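A rough sketch of what the two-pass read could look like, using only the standard library. This is not mokapot's actual reader: the function name `read_training_subset` and the `max_train_rows` parameter are hypothetical, and it assumes the .pin file is plain tab-delimited text with a single header row.

```python
import csv
import random


def read_training_subset(pin_path, max_train_rows, seed=0):
    """Hypothetical helper: count rows first, then keep only a random subset."""
    # Pass 1: count the data rows without storing any of them.
    with open(pin_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        n_rows = sum(1 for _ in reader)

    # The 'subset ratio' is the fraction of rows needed for training.
    subset_ratio = min(1.0, max_train_rows / n_rows) if n_rows else 0.0

    # Choose which row indices to keep, reproducibly.
    rng = random.Random(seed)
    keep = set(rng.sample(range(n_rows), min(n_rows, max_train_rows)))

    # Pass 2: store only the selected rows in memory.
    with open(pin_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader)  # skip the header
        rows = [row for i, row in enumerate(reader) if i in keep]

    return header, rows, subset_ratio
```

Memory use here scales with `max_train_rows` plus the set of kept indices, not with the size of the file. Scoring the full data afterwards would still need another streaming pass.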

Expected complications: Adding the peptide- and protein-level confidences (as well as calculating the q-values) requires the whole dataset to be loaded into memory at once.

Possible solutions: ... polars ...
