lazear / sage

Proteomics search & quantification so fast that it feels like magic
https://sage-docs.vercel.app
MIT License

Benchmarking datasets #59

Closed lgatto closed 1 year ago

lgatto commented 1 year ago

On the sage web page, you mention several experiments that were used for various benchmarks.

I think it would be really helpful to provide the parameters (and possibly even the outputs) of these benchmarks.

The reason I am asking is that I wanted to create a test dataset for some of my sage-based testing and development, and I wanted to re-use some of the above. Instead of finding the right search parameters and re-running everything myself, at the risk of introducing mistakes or diverging from your results, it would be beneficial to be able to reproduce exactly what you did, to harmonise outputs and get more consistent results.

lazear commented 1 year ago

Many of the benchmarking parameters (outdated by now) can be found here: https://github.com/lazear/sage/tree/master/figures/benchmark_params

I am actively working on getting a better benchmarking harness set up (to test LFQ performance as well) to prepare for eventual publication (and ideally continuous regression testing against a few datasets), so stay tuned.

lgatto commented 1 year ago

Fantastic, thank you very much. Great work!

guoci commented 8 months ago

@lazear Do you have the benchmarking files you used in the publication available for download?

lazear commented 8 months ago

All of the PXD accessions are listed in the publication.

guoci commented 8 months ago

@lazear Are the parameter files in https://github.com/lazear/sage/tree/master/figures/benchmark_params up to date?

lazear commented 8 months ago

No, as mentioned above they are out of date (from the original blog post announcement) - and should likely be deleted.

You can find an up-to-date version of parameters and their descriptions in the documentation, e.g. https://sage-docs.vercel.app/docs/configuration/example_PXD003881
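For readers landing here from a search: the linked documentation describes Sage's JSON configuration format. The fragment below is a minimal illustrative sketch of that structure, not the actual benchmark settings from the publication; the key names follow the documented schema, but every value here (FASTA path, tolerances, mzML paths) is a placeholder you would replace with your own.

```json
{
  "database": {
    "fasta": "human.fasta",
    "enzyme": {
      "missed_cleavages": 2,
      "min_len": 7,
      "max_len": 50,
      "cleave_at": "KR",
      "restrict": "P"
    },
    "static_mods": { "C": 57.0215 },
    "variable_mods": { "M": [15.9949] },
    "generate_decoys": true,
    "decoy_tag": "rev_"
  },
  "precursor_tol": { "ppm": [-50, 50] },
  "fragment_tol": { "ppm": [-10, 10] },
  "deisotope": true,
  "mzml_paths": ["sample_01.mzML"]
}
```

Saved as e.g. `config.json`, this would be passed to the `sage` binary as its argument; consult the documentation linked above for the authoritative, current set of parameters and their defaults.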