vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

DIANN CLI settings for peptidomics #1085

Open ako81818 opened 2 weeks ago

ako81818 commented 2 weeks ago

Thank you for DIA-NN, it is an amazing contribution. I am trying out the recently released diaTracer through FragPipe and working from its included peptidomics workflow, which builds a single library from individual diaPASEF runs and passes it to DIA-NN for quantification. However, I'm wondering whether the DIA-NN parameters being passed are ideal.

Since this is peptidomics, where often only one peptide matches a given protein, I'm particularly looking to turn off any functionality that assumes peptide-protein relationships when filtering and reporting the data. In other words, I want to control FDR at the precursor level and not filter spectral matches based on how they relate to other peptides of the same protein. For reporting, the --no-prot-inf flag is turned on. For FDR, the default workflow passes --matrix-spec-q and --qvalue 0.05, aiming to control the precursor FDR at 5%, though from the log file it isn't clear that it is doing what I want.

Per file, the Number of IDs is being controlled at a 1% FDR (I can't seem to set this to 5%), which is followed by "calculating protein q-values" and reporting protein IDs at 1% FDR. At the end, when it writes the precursor matrix, it says that precursors are being filtered at a 1% FDR. I'm not seeing where --qvalue 0.05 is being applied or how to change the precursor-level FDR threshold directly. I would greatly appreciate any recommendations on flags to use to control FDR at the precursor level (and not filter at the protein level), if possible.

Regards, Andrew

vdemichev commented 2 weeks ago

Hi Andrew,

In DIA-NN, protein information never affects precursor-level q-values, so there is no issue here.

> Per file, the Number of IDs is being controlled at a 1% FDR (I can't seem to set this to 5%)

That's just what is printed in the log. Please use the main .parquet report; it will reflect the q-value thresholds you set in the settings.

Best, Vadim
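For readers who want to check the precursor-level threshold in the main report themselves, here is a minimal sketch of filtering report rows by run-specific precursor q-value. It assumes the report has `Run`, `Precursor.Id` and `Q.Value` columns; `filter_precursors` and `read_report_tsv` are hypothetical helper names, and the example rows are made up for illustration. It uses only the standard library and reads the .tsv report. The .parquet report could be loaded with a parquet-capable library such as pandas or pyarrow and filtered the same way.

```python
import csv


def filter_precursors(rows, max_q=0.05, q_column="Q.Value"):
    """Keep rows whose precursor q-value is at or below max_q.

    rows is an iterable of dicts, one per report row. Rows with a
    missing or non-numeric q-value are skipped.
    """
    kept = []
    for row in rows:
        try:
            q = float(row[q_column])
        except (KeyError, ValueError):
            continue
        if q <= max_q:
            kept.append(row)
    return kept


def read_report_tsv(path):
    """Read a tab-separated main report into a list of dicts (hypothetical helper)."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Made-up rows standing in for a real report; column names are assumptions.
example_rows = [
    {"Run": "run1", "Precursor.Id": "PEPTIDEA2", "Q.Value": "0.004"},
    {"Run": "run1", "Precursor.Id": "PEPTIDEB2", "Q.Value": "0.03"},
    {"Run": "run1", "Precursor.Id": "PEPTIDEC3", "Q.Value": "0.2"},
]

kept = filter_precursors(example_rows, max_q=0.05)
print([row["Precursor.Id"] for row in kept])
```

With a real report, `filter_precursors(read_report_tsv("report.tsv"), max_q=0.05)` would apply the same filter; the file name here is only a placeholder.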

ako81818 commented 2 weeks ago

Thank you, Vadim. To clarify, when using --qvalue 0.05, will the pr-matrix output be filtered at 5% FDR, or do I need to set --matrix-qvalue 0.05 too? Or will the matrix outputs always be filtered at 0.01?

Sorry, but how do I generate the .parquet report? I'm getting the report.tsv, stats.tsv, and matrix reports. Looking over the CLI reference, I'm not seeing a flag that would turn that report on or off (--no-main-report seems to turn off all the .tsv files), and the run isn't generating a file with the .parquet extension.

Best, Andrew

vdemichev commented 2 weeks ago

For pr_matrix please use --matrix-qvalue. The .parquet report is always generated automatically by DIA-NN 1.9 (not the previous versions) along with the main .tsv report, in the same folder.
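To make the distinction between the two thresholds concrete, the following is a rough sketch that assembles a DIA-NN argument list with both `--qvalue` (main report) and `--matrix-qvalue` (matrices) set to 5%. The `build_diann_args` helper, the executable name and all file paths are hypothetical. The flags `--f`, `--lib`, `--out` and `--matrices` are not mentioned in this thread and are taken from my reading of the CLI reference, so please check them against your DIA-NN version. The sketch only builds the list and does not launch DIA-NN.

```python
def build_diann_args(executable, raw_files, library, out_report,
                     qvalue=0.05, matrix_qvalue=0.05):
    """Build an argument list for a DIA-NN run (hypothetical helper).

    qvalue sets the threshold for the main report; matrix_qvalue sets
    the threshold for the matrix outputs such as pr_matrix.
    """
    args = [executable]
    for raw_file in raw_files:
        args += ["--f", raw_file]
    args += [
        "--lib", library,
        "--out", out_report,
        "--qvalue", str(qvalue),
        "--matrices",
        "--matrix-qvalue", str(matrix_qvalue),
        "--matrix-spec-q",   # as passed by the workflow discussed above
        "--no-prot-inf",     # as passed by the workflow discussed above
    ]
    return args


# Placeholder executable name and paths, for illustration only.
args = build_diann_args(
    "diann",
    ["run1.d", "run2.d"],
    "library.tsv",
    "report.tsv",
)
print(" ".join(args))
```

The resulting list could be passed to `subprocess.run(args, check=True)` once the executable name and paths match the local installation.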