lazear / sage

Proteomics search & quantification so fast that it feels like magic
https://sage-docs.vercel.app
MIT License

[FEATURE] Export matched fragment ions for rescoring & spectral library generation #101

Closed: grosenberger-bruker closed this 7 months ago

grosenberger-bruker commented 7 months ago

Hi @lazear,

Some downstream applications, such as spectral library generation or rescoring, need access to the scored spectra again. Most often this is done by re-reading the raw data and re-annotating the fragment ions from the PSMs, which adds considerable overhead.

We therefore think it would be great if Sage had a native option to export matched fragment ions directly from the PSMs. This PR introduces an additional parameter that exports a parquet file containing all matched fragment ions for each PSM. Downstream applications like MS2Rescore or EasyPQP can then use the Sage PSM parquet together with this new matched fragments parquet for rescoring or spectral library generation.
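To illustrate how a downstream tool might combine the two outputs, here is a minimal sketch in plain Python. It is not how MS2Rescore or EasyPQP work, and it does not read real parquet files: the rows are in-memory stand-ins, and every column name (`psm_id`, `peptide`, `charge`, `ion_type`, `ordinal`, `mz`, `intensity`), the function `attach_fragments`, and the m/z and intensity values are assumptions made up for the example, not the schema this PR produces.

```python
from collections import defaultdict

# Hypothetical stand-ins for rows read from the PSM parquet and the
# matched-fragments parquet. Column names and values are illustrative
# assumptions, not Sage's confirmed output schema.
psms = [
    {"psm_id": 1, "peptide": "PEPTIDEK", "charge": 2},
    {"psm_id": 2, "peptide": "SAMPLER", "charge": 3},
]
fragments = [
    {"psm_id": 1, "ion_type": "b", "ordinal": 2, "mz": 227.10, "intensity": 1500.0},
    {"psm_id": 1, "ion_type": "y", "ordinal": 3, "mz": 391.22, "intensity": 4200.0},
    {"psm_id": 2, "ion_type": "y", "ordinal": 1, "mz": 175.12, "intensity": 800.0},
]


def attach_fragments(psms, fragments):
    """Group fragment rows by PSM and attach them as library-style ion lists."""
    # Index the fragment rows by the PSM they belong to.
    by_psm = defaultdict(list)
    for frag in fragments:
        by_psm[frag["psm_id"]].append(frag)

    entries = []
    for psm in psms:
        matched = by_psm.get(psm["psm_id"], [])
        # Normalize intensities to the most intense matched fragment.
        max_intensity = max((f["intensity"] for f in matched), default=0.0)
        ions = [
            {
                "label": f"{f['ion_type']}{f['ordinal']}",
                "mz": f["mz"],
                "relative_intensity": (
                    f["intensity"] / max_intensity if max_intensity > 0 else 0.0
                ),
            }
            for f in matched
        ]
        entries.append({**psm, "ions": ions})
    return entries


library = attach_fragments(psms, fragments)
for entry in library:
    print(entry["peptide"], [ion["label"] for ion in entry["ions"]])
```

The point of the sketch is that, with matched fragments already exported per PSM, the downstream step reduces to a join on a shared PSM identifier, with no raw data access or re-annotation needed.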

This is a draft PR, where we hope to receive feedback of any kind (style, implementation, algorithm, variable naming, etc.) to eventually make this feature as native as possible to Sage.

Thanks for your feedback!

vijay-gnanasambandan-bruker commented 7 months ago

Thank you for providing valuable feedback. I have made the necessary updates to the code based on your suggestions. Please review it again and let me know if there are any further adjustments that need to be addressed. Thank you.

lazear commented 7 months ago

Thanks for making changes - I will start reviewing and testing this week!

vijay-gnanasambandan-bruker commented 7 months ago

@lazear Thank you for approving the modifications.

lazear commented 7 months ago

Thank you for the excellent contribution!