vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

Is it better to use a cohort-specific library or a sample-specific library? #783

Open LindseyOlsen opened 1 year ago

LindseyOlsen commented 1 year ago

My primary interest is to use the peptide quantification from DIA-NN for downstream analysis. Would it be better to use a cohort-specific library generated with --gen-spec-lib, or a sample-specific library? Using a sample-specific library would allow us to process each sample individually and reduce the amount of disk space needed, whereas the cohort-specific library requires all of the raw data to be downloaded and processed together. Are peptide abundances obtained with a sample-specific library comparable across samples?

vdemichev commented 1 year ago

I'm not sure what you mean by a sample-specific library. Peptide quantities are only comparable if they are obtained using the same spectral library and either (i) they come from the same DIA-NN analysis, which can be an analysis that just aggregates .quant files, or (ii) special steps are taken - see the docs on incremental analysis - but option (ii) is slightly detrimental to the analysis quality.
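
For illustration, an aggregating analysis of the kind described in (i) could look like the sketch below; the file names, directories and flag values are placeholders rather than anything from this thread, and it assumes the .quant files were previously produced with the same library and settings:

# Hypothetical aggregation run: the raw files are listed again with --f,
# but --use-quant makes DIA-NN reuse the .quant files already present in
# the --temp directory instead of re-processing the raw data.
diann.exe --f sample1.raw --f sample2.raw --f sample3.raw \
  --lib library.predicted.speclib --use-quant --temp ./quant_dir \
  --out combined_report.tsv --qvalue 0.01 --threads 8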

LindseyOlsen commented 1 year ago

By sample-specific library I mean an in silico library that is filtered down using the DIA data from just one sample. If possible, I would like to run each sample separately and then merge the quantification. I am trying to avoid having to download all of the raw files onto our server at the same time. Perhaps the best way would be to run DIA-NN for each sample using the in silico library and save the .quant files, then filter the in silico library using only the .quant files, and then reanalyze the .quant files with the cohort-specific library. Would this be possible?

vdemichev commented 1 year ago

I don't think it makes any sense to create such sample-specific libraries.

Indeed, you can run samples separately anyway, with absolutely any library. DIA-NN produces a .quant file from each sample, and then you just need to aggregate those .quant files into a single experiment - but this step is quick.

The suggested algorithm:
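
A minimal sketch of this kind of workflow, with placeholder file names and a deliberately reduced flag set (the full set of options should follow the usual DIA-NN documentation, as in the commands further down the thread):

# Step 1 - analyse each sample on its own against the predicted library;
# the per-sample .quant file is written to the --temp directory.
diann.exe --f sampleA.raw --lib insilico.predicted.speclib \
  --temp ./quant --out sampleA_report.tsv --qvalue 0.01 --threads 8

# Step 2 - once all .quant files exist, generate the cohort (empirical)
# library in a single run that reuses them via --use-quant.
diann.exe --f sampleA.raw --f sampleB.raw --lib insilico.predicted.speclib \
  --use-quant --temp ./quant --gen-spec-lib --out-lib cohort_lib.tsv \
  --out step2_report.tsv --qvalue 0.01 --threads 8

# Step 3 - reanalyse all samples against cohort_lib.tsv in one DIA-NN
# analysis (same aggregation pattern as the sketch above), so that the
# final peptide quantities are directly comparable across samples.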

LindseyOlsen commented 1 year ago

Ok thank you. I just want to make sure I understand the commands we would use to execute this pipeline.

First, we would get the .quant files for each raw file using the in silico predicted library. For each file we would run a command like:

diann.exe --f "$file" --lib ${LOCAL_DIR}/insilico.predicted.speclib --threads 23 \
  --min-pr-charge 2 --max-pr-charge 4 --mass-acc-ms1 40 --mass-acc 40 --pg-level 1 --window 9 \
  --verbose 3 --out ${LOCAL_DIR}/step1.tsv --qvalue 0.01 --temp ${LOCAL_DIR} \
  --min-fr-mz 100 --max-fr-mz 2000 --cut K,R --missed-cleavages 1 --min-pep-len 7 --max-pep-len 30 \
  --min-pr-mz 400 --max-pr-mz 1250 --unimod4 --smart-profiling --peak-center --int-removal 1

Then, we would use the .quant files to generate the cohort-specific library:

diann.exe --lib ${LOCAL_DIR}/gencodev42.predicted.speclib --threads 18 --verbose 3 --window 9 \
  --mass-acc-ms1 40 --pg-level 1 --mass-acc 40 --min-pr-charge 2 --max-pr-charge 4 \
  --out ${LOCAL_DIR}/step2-out.tsv --qvalue 0.01 --temp ${LOCAL_DIR} \
  --gen-spec-lib --out-lib ${LOCAL_DIR}/cohort_specific_lib.tsv --predictor \
  --min-fr-mz 100 --max-fr-mz 2000 --cut K,R --missed-cleavages 1 --min-pep-len 7 --max-pep-len 30 \
  --min-pr-mz 400 --max-pr-mz 1250 --unimod4 --smart-profiling --int-removal 1 --peak-center --use-quant

However, I am not sure how to combine the .quant files on a single machine. The --dir flag is only for raw data, and a command such as the one below doesn't load any files:

diann.exe --lib ${LOCAL_DIR}/cohort_specific_lib.speclib --threads 92 --verbose 3 --report-lib-info \
  --out ${LOCAL_DIR}/step3-out.tsv --qvalue 0.01 --pg-level 1 --mass-acc-ms1 40 --mass-acc 40 \
  --window 9 --int-removal 1 --matrices --temp ${LOCAL_DIR} --smart-profiling --peak-center --use-quant

Is there a flag in addition to --use-quant that I need to add in order to combine all of the .quant files?