vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

Premature exit DIA-NN Astral #1076

Closed ds2268 closed 4 months ago

ds2268 commented 4 months ago

DIA-NN run on Astral data (dilution series, 18 files) exited after 1st pass (after loading spec lib). No errors were given and no log was written. The system had 120GB of memory and 32 threads.

report-100pct-first-pass.tsv was 5.8GB and report-lib-100pct.tsv was 13GB


Another run exited after the 2nd pass, at quantification (yeast KO, 312 files). The report-100pct-first-pass.tsv was 34GB and report-lib-100pct.tsv was 6.4GB.


Any idea what is going on? All the RAW Astral files were converted to mzML. One other Astral run succeeded without problems.

vdemichev commented 4 months ago

Thank you for letting me know. One thing that stands out here is the size of the empirical spectral library, which is caused by the FDR setting being set to 100%. As you can see in the log, DIA-NN prints a warning indicating that this is strongly not recommended. It leads to DIA-NN saving everything in the spectral library, as opposed to only 'good' peptides. We have in fact validated DIA-NN in MBR mode with q-value settings up to 50%, but there are no known scenarios in which it is beneficial to go above 5%. At 100% it becomes significantly worse even than 50%, so it is definitely a bad idea. Yes, it is possible that the run simply ran out of RAM; I am not sure what else could cause this.

Were the raw files converted to .mzML using MSConvert GUI and the recommended settings https://github.com/vdemichev/DiaNN?tab=readme-ov-file#raw-data-formats? In general, DIA-NN should work fine with .raw files directly, no conversion necessary.

ds2268 commented 4 months ago

This particular Astral file did not work from .raw, so I needed to convert it. It could indeed be a memory issue, as I was running 2 jobs on the same VM and saw that memory consumption increased at some steps. Re-running it now with just 1 job.

Yes, I saw that the number of precursors does not increase drastically when going from 1% to 20% to 100% FDR (roughly 500k, 700k, and 900k respectively). The majority of precursor IDs are already within 1% FDR, which was surprising to me.

Is there any other downside of running at > 5% FDR and filtering it down with Q.Value so that there is no need to run DIA-NN x-times to get results at different q-value thresholds?

vdemichev commented 4 months ago

Yes, it might be the error that got reported in another thread; it will be fixed in 1.9.1, to be released soon.

"Is there any other downside of running at > 5% FDR" - just worse results, up to 50% FDR (so long as you filter the output afterwards). If you set it to 100%, I am not sure the output will be reliable - it's never been tested. In the update, I will make DIA-NN always filter the library at 50% at least. Importantly: with MBR, running at X% FDR and then filtering at Y% FDR is not the same as running at Y% FDR.
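To illustrate the post-hoc filtering being discussed (which, as noted above, is not equivalent to re-running the search at the stricter FDR when MBR is on), here is a minimal sketch that filters a DIA-NN main report TSV by its Q.Value column. The file paths and the 1% threshold are placeholders; the column name Q.Value follows DIA-NN's report format, but check your report header before relying on it.

```python
import csv

def filter_report(in_path, out_path, q_threshold=0.01, q_col="Q.Value"):
    """Copy rows from a DIA-NN report TSV whose q-value passes the threshold.

    Returns the number of rows kept. Note: with MBR, filtering a run
    performed at a looser FDR down to q_threshold this way is NOT the
    same as running DIA-NN at q_threshold directly.
    """
    with open(in_path, newline="") as fin, open(out_path, "w", newline="") as fout:
        reader = csv.DictReader(fin, delimiter="\t")
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames, delimiter="\t")
        writer.writeheader()
        kept = 0
        for row in reader:
            # Keep only precursors at or below the chosen q-value cutoff.
            if float(row[q_col]) <= q_threshold:
                writer.writerow(row)
                kept += 1
        return kept
```

This streams the report row by row, so it stays memory-friendly even for the multi-GB reports mentioned earlier in this thread.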