lazear / sage

Proteomics search & quantification so fast that it feels like magic
https://sage-docs.vercel.app
MIT License

Global retention time alignment #52

Closed lazear closed 1 year ago

lazear commented 1 year ago

This PR improves retention time (RT) prediction and increases the stability of the linear equation solver.

RT

Retention times are now globally aligned across files:

RT prediction is then performed on all files at once (on the aligned RTs), rather than one file at a time. Previously, there were many instances where RTs could not be predicted for some files in a search, which decreased the effectiveness of delta_rt as a feature for LDA.
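As a rough illustration of putting several files on one RT scale before a single prediction step, the sketch below fits a least-squares line that maps one file's RTs onto a reference file's scale, using peptides observed in both. This is not taken from the PR: the linear mapping, the choice of a reference file, and the names `fit_line` and `align_to_reference` are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Least-squares fit of y = slope * x + intercept.
/// Returns None for fewer than two points or when all x values are identical.
fn fit_line(xs: &[f64], ys: &[f64]) -> Option<(f64, f64)> {
    if xs.len() != ys.len() || xs.len() < 2 {
        return None;
    }
    let n = xs.len() as f64;
    let mean_x = xs.iter().sum::<f64>() / n;
    let mean_y = ys.iter().sum::<f64>() / n;
    let mut sxx = 0.0;
    let mut sxy = 0.0;
    for (x, y) in xs.iter().zip(ys) {
        sxx += (x - mean_x) * (x - mean_x);
        sxy += (x - mean_x) * (y - mean_y);
    }
    if sxx == 0.0 {
        return None;
    }
    let slope = sxy / sxx;
    Some((slope, mean_y - slope * mean_x))
}

/// Hypothetical helper: map the RTs of `file` onto the scale of `reference`,
/// using the peptides that occur in both maps as anchor points.
/// BTreeMap is used so that iteration (and summation) order is fixed.
fn align_to_reference(
    reference: &BTreeMap<String, f64>,
    file: &BTreeMap<String, f64>,
) -> Option<BTreeMap<String, f64>> {
    let mut xs = Vec::new();
    let mut ys = Vec::new();
    for (peptide, rt) in file {
        if let Some(ref_rt) = reference.get(peptide) {
            xs.push(*rt);
            ys.push(*ref_rt);
        }
    }
    let (slope, intercept) = fit_line(&xs, &ys)?;
    Some(
        file.iter()
            .map(|(peptide, rt)| (peptide.clone(), slope * rt + intercept))
            .collect(),
    )
}
```

Once every file has been mapped this way, a single RT model can be trained on the pooled, aligned values. A file with fewer than two shared peptides returns `None` here and would need separate handling.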

Sequence Deduplication

Peptide sequences within a protein are now deduplicated. Previously, a peptide repeated within a protein would be counted multiple times for that protein (e.g. num_proteins > 1 even if the peptide was unique to one protein).
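The effect of deduplication on the protein count can be sketched as follows. The function name `count_proteins_per_peptide` and the input layout (a protein name paired with its digested peptides) are hypothetical and not taken from Sage's code.

```rust
use std::collections::BTreeMap;

/// Collapse repeated peptide sequences within each protein, then count how
/// many distinct proteins each peptide maps to.
fn count_proteins_per_peptide(digests: &[(&str, Vec<&str>)]) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for (_protein, peptides) in digests {
        // Sort, then drop adjacent duplicates, so each peptide is seen
        // at most once per protein.
        let mut unique: Vec<&str> = peptides.clone();
        unique.sort_unstable();
        unique.dedup();
        for peptide in unique {
            *counts.entry(peptide.to_string()).or_insert(0) += 1;
        }
    }
    counts
}
```

Without the `dedup` step, a peptide that occurs twice in one protein would be counted twice, which is the `num_proteins > 1` symptom described above.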

Gaussian solver
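The PR text does not describe the solver change itself. One common way to make Gaussian elimination more numerically stable is partial pivoting, sketched below. Whether Sage uses this approach is an assumption, and the function `solve` and its `1e-12` singularity threshold are hypothetical.

```rust
/// Solve A x = b by Gaussian elimination with partial pivoting.
/// Returns None if A is not square or is (numerically) singular.
fn solve(mut a: Vec<Vec<f64>>, mut b: Vec<f64>) -> Option<Vec<f64>> {
    let n = b.len();
    if a.len() != n || a.iter().any(|row| row.len() != n) {
        return None;
    }
    for col in 0..n {
        // Choose the row with the largest absolute entry in this column.
        let mut pivot = col;
        for row in (col + 1)..n {
            if a[row][col].abs() > a[pivot][col].abs() {
                pivot = row;
            }
        }
        if a[pivot][col].abs() < 1e-12 {
            return None;
        }
        a.swap(col, pivot);
        b.swap(col, pivot);
        // Eliminate the entries below the pivot.
        for row in (col + 1)..n {
            let factor = a[row][col] / a[col][col];
            for k in col..n {
                let upper = a[col][k];
                a[row][k] -= factor * upper;
            }
            let upper_b = b[col];
            b[row] -= factor * upper_b;
        }
    }
    // Back substitution on the upper-triangular system.
    let mut x = vec![0.0; n];
    for row in (0..n).rev() {
        let mut sum = b[row];
        for k in (row + 1)..n {
            sum -= a[row][k] * x[k];
        }
        x[row] = sum / a[row][row];
    }
    Some(x)
}
```

Swapping in the largest available pivot avoids dividing by very small numbers, which is where naive elimination tends to lose precision or fail outright (for example, when the leading entry is zero).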

lazear commented 1 year ago

Passes idempotency/determinism check for b1906 file with:

There is some non-deterministic behavior on larger sets, which seems to arise from RT alignment.

lazear commented 1 year ago

Using SeqCst atomic ordering restores deterministic behavior for RT alignment and for the final results of searches with multiple runs.
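For readers unfamiliar with the term, the sketch below shows what sequentially consistent atomic operations look like in Rust. It does not reproduce the code changed in this PR: the function `parallel_histogram` and its binning scheme are hypothetical, and where Sage's RT alignment actually uses atomics is not shown in this thread.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Count, across worker threads, how many values fall into each bin
/// (bin index = value % bins). `bins` must be non-zero.
fn parallel_histogram(values: &[usize], bins: usize, workers: usize) -> Vec<usize> {
    let counts: Vec<AtomicUsize> = (0..bins).map(|_| AtomicUsize::new(0)).collect();
    let workers = workers.max(1);
    // Ceiling division, with a minimum chunk size of 1 for `chunks`.
    let chunk = ((values.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        for part in values.chunks(chunk) {
            let counts = &counts;
            s.spawn(move || {
                for &v in part {
                    // SeqCst: every thread observes all SeqCst operations
                    // in one single total order.
                    counts[v % bins].fetch_add(1, Ordering::SeqCst);
                }
            });
        }
    });
    counts.iter().map(|c| c.load(Ordering::SeqCst)).collect()
}
```

`Ordering::SeqCst` is the strongest ordering Rust offers. It matters most when several atomics are read and written together and their relative order affects the outcome. A single integer counter like the one above gives the same total under any ordering.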