lazear / sage

Proteomics search & quantification so fast that it feels like magic
https://sage-docs.vercel.app
MIT License

Benchmarking datasets #59

Closed lgatto closed 1 year ago

lgatto commented 1 year ago

On the sage web page, you mention several experiments that were used for various benchmarks.

I think it would be really helpful to provide the parameters (and possibly even the outputs) of these benchmarks.

The reason I am asking is that I wanted to create a test dataset for some of my sage-based testing and development, and I wanted to re-use some of the above. Instead of finding the right search parameters and re-running everything myself, at the risk of introducing mistakes or diverging from your results, it would be beneficial to be able to reproduce exactly what you did, to harmonise outputs and get more consistent results.

lazear commented 1 year ago

Many of the benchmarking parameters (outdated by now) can be found here: https://github.com/lazear/sage/tree/master/figures/benchmark_params

I am actively working on getting a better benchmarking harness set up (to test LFQ performance as well) to prepare for eventual publication (and ideally continuous regression testing against a few datasets), so stay tuned.

lgatto commented 1 year ago

Fantastic, thank you very much. Great work!

guoci commented 8 months ago

@lazear Do you have the benchmarking files you used in the publication available for download?

lazear commented 8 months ago

All of the PXD accessions are listed in the publication.

guoci commented 8 months ago

@lazear Are the parameter files in https://github.com/lazear/sage/tree/master/figures/benchmark_params up to date?

lazear commented 8 months ago

No, as mentioned above they are out of date (from the original blog post announcement) - and should likely be deleted.

You can find an up-to-date version of parameters and their descriptions in the documentation, e.g. https://sage-docs.vercel.app/docs/configuration/example_PXD003881
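For readers landing here from a search: the linked documentation describes Sage's JSON configuration format. The fragment below is a minimal illustrative sketch of that structure, not the actual benchmark settings from the publication; the key names follow the documented schema, but every value here (FASTA path, tolerances, mzML paths) is a placeholder you would replace with your own.

```json
{
  "database": {
    "fasta": "human.fasta",
    "enzyme": {
      "missed_cleavages": 2,
      "min_len": 7,
      "max_len": 50,
      "cleave_at": "KR",
      "restrict": "P"
    },
    "static_mods": { "C": 57.0215 },
    "variable_mods": { "M": [15.9949] },
    "generate_decoys": true,
    "decoy_tag": "rev_"
  },
  "precursor_tol": { "ppm": [-50, 50] },
  "fragment_tol": { "ppm": [-10, 10] },
  "deisotope": true,
  "mzml_paths": ["sample_01.mzML"]
}
```

Saved as e.g. `config.json`, this would be passed to the `sage` binary as its argument; consult the documentation linked above for the authoritative, current set of parameters and their defaults.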