Open LindseyOlsen opened 1 year ago
Not sure what you mean by sample specific library. Peptide quantities are only comparable if they are obtained using the same spectral library and either (i) they are obtained in the same DIA-NN analysis, which might be an analysis that just aggregates .quant files, or (ii) special steps are taken (see the docs on incremental analysis), but option (ii) is slightly detrimental to the analysis quality.
By sample specific library I mean an in silico library that is filtered down using the DIA data from just one sample. If possible, I would like to run each sample separately and then merge the quantification. I am trying to avoid needing to download all of the raw files on our server at the same time. Perhaps the best way would be to run DIANN for each sample using the in silico library and save the quant files, then filter the in silico library using only the quant files, and then reanalyze the quant files with the cohort specific library. Would this be possible?
I don't think it makes any sense to create such sample-specific libraries.
Indeed, you can run samples separately anyway, with absolutely any library. DIA-NN produces a .quant file from each sample, and then you just need to aggregate those .quant files in a single experiment - but this is quick.
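A minimal sketch of that aggregation step, written as a dry run so the command can be inspected before anything is executed. It uses only flags that appear in this thread (`--f`, `--lib`, `--temp`, `--use-quant`, `--mass-acc`, `--mass-acc-ms1`, `--window`, `--qvalue`, `--out`). The names `QUANT_DIR`, `DRY_RUN`, `aggregate_quant` and `aggregate.tsv` are hypothetical. Two assumptions are worth checking against the DIA-NN documentation: that `--temp` should point at the folder holding the collected .quant files, and that each run has to be listed with `--f` using the same path as in its per-sample run so the matching .quant file is found.

```shell
#!/bin/sh
# Sketch only: builds one DIA-NN command that lists every run with --f
# and reuses existing .quant files via --use-quant.
set -eu

DIANN="${DIANN:-diann.exe}"               # executable name as used in this thread
LIB="${LIB:-insilico.predicted.speclib}"  # same library as in the per-sample runs
QUANT_DIR="${QUANT_DIR:-./quant}"         # hypothetical folder with the .quant files
DRY_RUN="${DRY_RUN:-1}"                   # 1 = print the command, anything else = run it

aggregate_quant() {
  # Turn "a.raw b.raw" into "--f a.raw --f b.raw" using the positional parameters.
  n=$#
  for f in "$@"; do
    set -- "$@" --f "$f"
  done
  shift "$n"

  # Prepend the executable and the fixed settings (values copied from this thread).
  set -- "$DIANN" --lib "$LIB" --temp "$QUANT_DIR" --use-quant \
         --mass-acc 40 --mass-acc-ms1 40 --window 9 --qvalue 0.01 \
         --out "$QUANT_DIR/aggregate.tsv" "$@"

  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$@"   # one argument per line, for inspection
  else
    "$@"
  fi
}

# Only act when run names are given on the command line.
if [ "$#" -gt 0 ]; then
  aggregate_quant "$@"
fi
```

Saved as, say, `aggregate.sh` (a hypothetical name), `sh aggregate.sh /data/s1.raw /data/s2.raw` prints the command one argument per line, and `DRY_RUN=0` runs it. The mass accuracies and scan window are fixed to the same values as in the per-sample commands so that every run is treated with identical settings.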
The suggested algorithm:
Ok thank you. I just want to make sure I understand the commands we would use to execute this pipeline.
First, we would get the quant files for each raw file using the in silico predicted library. Example of the command we would run for each file:

diann.exe --f "$file" --lib ${LOCAL_DIR}/insilico.predicted.speclib --threads 23 --min-pr-charge 2 --max-pr-charge 4 --mass-acc-ms1 40 --mass-acc 40 --pg-level 1 --window 9 --verbose 3 --out ${LOCAL_DIR}/step1.tsv --qvalue 0.01 --temp ${LOCAL_DIR} --min-fr-mz 100 --max-fr-mz 2000 --cut K,R --missed-cleavages 1 --min-pep-len 7 --max-pep-len 30 --min-pr-mz 400 --max-pr-mz 1250 --unimod4 --smart-profiling --peak-center --int-removal 1
Then, we would use the quant files to generate the cohort specific library:

diann.exe --lib ${LOCAL_DIR}/gencodev42.predicted.speclib --threads 18 --verbose 3 --window 9 --mass-acc-ms1 40 --pg-level 1 --mass-acc 40 --min-pr-charge 2 --max-pr-charge 4 --out ${LOCAL_DIR}/step2-out.tsv --qvalue 0.01 --temp ${LOCAL_DIR} --gen-spec-lib --out-lib ${LOCAL_DIR}/cohort_specific_lib.tsv --predictor --min-fr-mz 100 --max-fr-mz 2000 --cut K,R --missed-cleavages 1 --min-pep-len 7 --max-pep-len 30 --min-pr-mz 400 --max-pr-mz 1250 --unimod4 --smart-profiling --int-removal 1 --peak-center --use-quant
However, I am not sure how to combine the .quant files on a single machine. The --dir flag is only for raw data, and a command such as the one below doesn't load any files:

diann.exe --lib ${LOCAL_DIR}/cohort_specific_lib.speclib --threads 92 --verbose 3 --report-lib-info --out ${LOCAL_DIR}/step3-out.tsv --qvalue 0.01 --pg-level 1 --mass-acc-ms1 40 --mass-acc 40 --window 9 --int-removal 1 --matrices --temp ${LOCAL_DIR} --smart-profiling --peak-center --use-quant
Is there a flag in addition to --use-quant that I need to add in order to combine all of the .quant files?
My primary interest is to use the peptide quantification from DIANN for downstream analysis. Would it be better to use a cohort specific library generated with --gen-spec-lib or a sample specific library? Using a sample specific library would allow us to process each sample individually and reduce the amount of disk space needed, whereas the cohort specific library requires all of the raw data to be downloaded and processed together. Are peptide abundances obtained with a sample specific library comparable across samples?