wilhelm-lab / oktoberfest

Rescoring and spectral library generation pipeline for proteomics.
MIT License

Additional percolator/mokapot features for custom file input #233

Closed: ChrisMcGann closed this issue 1 month ago

ChrisMcGann commented 3 months ago

The current internal file format specification defines ten column headings. I'd like the option to add further columns that would then be used in the percolator/mokapot rescoring. I normally use Comet as my initial search tool, and the .pin file it outputs contains not just the main Xcorr but also dCn, lnExpect, etc., which usually make a big difference in post-processing. Thanks for your help, it was a pleasure meeting you out in Anaheim last week!

picciama commented 3 months ago

Hi Chris, I just came back from my post-ASMS holiday and remember discussing this with you! My proposed solution is to specify a set of mandatory columns and forward every additional column to the percolator pin file, but only if this is explicitly requested in the config file, i.e.:

"use_feature_cols": "all",

and to also support a list of columns:

"use_feature_cols": "col_x,col_y,col_z",

with the default being

"use_feature_cols": "none",

to avoid changing the default behaviour we have right now.
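To make the three proposed forms of the option concrete, here is a minimal sketch of how they could be resolved into the list of columns forwarded to the pin file. `resolve_feature_cols` is a hypothetical helper, not Oktoberfest's implementation, and the mandatory column names in the usage part are illustrative stand-ins rather than the actual internal format specification.

```python
from __future__ import annotations

# Hypothetical sketch (not Oktoberfest's code): resolve the proposed
# "use_feature_cols" config value into the extra columns to forward
# to the percolator .pin file.


def resolve_feature_cols(option: str, input_cols: list[str],
                         mandatory_cols: list[str]) -> list[str]:
    """Return the additional (non-mandatory) columns selected by `option`."""
    # Any column that is not one of the mandatory ones is a candidate feature.
    extra = [c for c in input_cols if c not in mandatory_cols]
    value = option.strip()

    if value == "none":
        # Default: forward nothing, i.e. keep the current behaviour.
        return []
    if value == "all":
        # Forward every additional column, preserving input order.
        return extra

    # Otherwise interpret the value as a comma-separated list of names.
    requested = [c.strip() for c in value.split(",") if c.strip()]
    missing = [c for c in requested if c not in extra]
    if missing:
        # Fail early instead of silently ignoring a typo in the config.
        raise ValueError(f"not found among additional columns: {missing}")
    return requested


if __name__ == "__main__":
    # Illustrative column names only; the mandatory ones are stand-ins,
    # not the real internal format specification.
    mandatory = ["RAW_FILE", "SCAN_NUMBER", "SCORE"]
    cols = mandatory + ["dCn", "lnExpect"]
    print(resolve_feature_cols("all", cols, mandatory))       # ['dCn', 'lnExpect']
    print(resolve_feature_cols("lnExpect", cols, mandatory))  # ['lnExpect']
```

Raising on unknown names in the list form is a design choice in this sketch: a misspelled column in the config then surfaces immediately instead of quietly dropping a feature from rescoring.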

ChrisMcGann commented 3 months ago

Sounds like a great solution, would be exactly what I need. Thanks for getting back to me!

picciama commented 2 months ago

Sorry, this was closed by accident. The feature is close to finished and will be part of the next major oktoberfest release, so it may still take a while until it is released, but it is mostly ready (though untested) on this branch: https://github.com/wilhelm-lab/oktoberfest/tree/feature/additional_columns.

picciama commented 1 month ago

I have now merged this into development, and the documentation is online. The new option in the config file is called "add_feature_cols". We added some basic unit tests, but reading values that are not automatically converted to float may still cause problems. So please make sure you only add columns that contain float values and no NaNs, otherwise percolator won't like you very much. See the documentation for the current development branch here: https://oktoberfest.readthedocs.io/en/latest/jobs.html#c-rescoring and the explanation of how to use this config option here: https://oktoberfest.readthedocs.io/en/latest/config.html#applicable-to-rescoring
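A pre-flight check along these lines, run on each extra column before starting rescoring, can catch values percolator would choke on. This is a minimal plain-Python sketch; `check_feature_column` is a hypothetical helper, not part of Oktoberfest, and it only tests the two conditions mentioned above (convertible to float, not NaN).

```python
from __future__ import annotations

import math


def check_feature_column(name: str, values: list) -> list[float]:
    """Hypothetical pre-flight check: return `values` as floats or raise.

    Rejects entries that float() cannot convert (e.g. "abc", "", None)
    and entries that convert to NaN (e.g. "NaN" or float("nan")).
    """
    converted = []
    for i, raw in enumerate(values):
        try:
            x = float(raw)
        except (TypeError, ValueError):
            raise ValueError(
                f"{name}[{i}] = {raw!r} cannot be converted to float") from None
        if math.isnan(x):
            raise ValueError(f"{name}[{i}] is NaN")
        converted.append(x)
    return converted
```

For example, `check_feature_column("dCn", ["0.12", "0.05"])` returns `[0.12, 0.05]`, while a column containing the string `"NaN"` raises a `ValueError` naming the offending row.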

picciama commented 1 month ago

I have now also added this information to the internal format specification: https://oktoberfest.readthedocs.io/en/latest/internal_format.html