vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

no results written #1200

Closed animesh closed 2 weeks ago

animesh commented 1 month ago

I tried running the latest version over some timsTOF-pro data containing phospho-enriched peptides

diann.exe --f "F:\promec\TIMSTOF\LARS\2024\241002_zrimac\zr_IMAC_100ug_zoom20_1dia_25pepsep_S1-A2_1_8449.d" --f "F:\promec\TIMSTOF\LARS\2024\241002_zrimac\zr_IMAC_200ug_zoom20_1dia_25pepsep_S1-A5_1_8450.d" --f "F:\promec\TIMSTOF\LARS\2024\241002_zrimac\zr_IMAC_300ug_zoom20_1dia_25pepsep_S1-A8_1_8451.d" --lib "" --threads 6 --verbose 5 --out "F:\promec\TIMSTOF\LARS\2024\241002_zrimac\DIANN1p9p1\report.tsv" --qvalue 0.01 --matrices  --out-lib "F:\promec\TIMSTOF\LARS\2024\241002_zrimac\DIANN1p9p1\reportlib.parquet" --gen-spec-lib --predictor --fasta camprotR_240512_cRAP_20190401_full_tags.fasta --cont-quant-exclude cRAP- --fasta "F:\promec\FastaDB\uniprotkb_proteome_UP000005640_2024_04_18.fasta" --fasta-search --min-fr-mz 200 --max-fr-mz 1800 --met-excision --min-pep-len 7 --max-pep-len 30 --min-pr-mz 300 --max-pr-mz 1800 --min-pr-charge 1 --max-pr-charge 4 --cut K*,R* --missed-cleavages 1 --unimod4 --var-mods 2 --var-mod UniMod:35,15.994915,M --var-mod UniMod:1,42.010565,*n --var-mod UniMod:21,79.966331,STY --mass-acc 20 --mass-acc-ms1 20 --individual-mass-acc --individual-windows --peptidoforms --relaxed-prot-inf --rt-profiling

which seems to have finished processing without errors (phos.log.txt), but I could not find the results in the folder it says it wrote them to. The spectral library was generated and written, though 🫡

vdemichev commented 1 month ago

Hi Ani,

DIA-NN ran out of RAM. There's a 1.9.2 release soon, will have ~2x lower memory consumption. For now, please see suggestions for phospho here, this should help: https://github.com/vdemichev/DiaNN?tab=readme-ov-file#ptms-and-peptidoforms.

Also, most importantly, please generate an in silico library in a separate pipeline step, i.e. without any raw files specified.

Best, Vadim
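The two-step approach suggested above (predict the library first with no raw files, then search the raw files against it) can be sketched as two argument lists. This is a minimal Python sketch, not an official recipe: it only reuses flags quoted in this thread (`--gen-spec-lib`, `--predictor`, `--fasta-search`, `--out-lib`, `--fasta`, `--f`, `--lib`, `--out`), while the function names and file names are hypothetical. All other options should be taken from the command formed by the DIA-NN GUI.

```python
# Minimal sketch: build the two DIA-NN invocations as argument lists.
# Only flags quoted in this thread are used; all file names are hypothetical.

def library_step(fasta_files, out_lib, exe="diann.exe"):
    """Step 1: predict an in silico library. No --f (raw file) arguments."""
    cmd = [exe, "--gen-spec-lib", "--predictor", "--fasta-search", "--out-lib", out_lib]
    for fasta in fasta_files:
        cmd += ["--fasta", fasta]
    return cmd


def search_step(raw_files, lib, fasta_files, out_report, exe="diann.exe"):
    """Step 2: search the raw files against the predicted library."""
    cmd = [exe, "--lib", lib, "--out", out_report]
    for raw in raw_files:
        cmd += ["--f", raw]
    # Keep every FASTA here too, not only the contaminants file.
    for fasta in fasta_files:
        cmd += ["--fasta", fasta]
    return cmd


if __name__ == "__main__":
    fastas = ["contaminants.fasta", "human.fasta"]
    print(library_step(fastas, "phospho_lib.parquet"))
    print(search_step(["run1.d", "run2.d"], "phospho_lib.predicted.speclib",
                      fastas, "report.tsv"))
```

Each list could be passed to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids quoting problems with paths that contain spaces.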

animesh commented 1 month ago

Thanks @vdemichev 💯 I can try to run it on a machine with more RAM, but that would be Linux. Does 1.9.1 work well on Linux, and are there any specific things to consider?

vdemichev commented 1 month ago

Hi Ani,

On Linux please don't use --matrices, otherwise all the same. But you can also just reduce RAM usage by using the recommended settings for phospho.

Best, Vadim

animesh commented 1 month ago

Dear Vadim,

I am trying to create the library following the guidelines:

diann.exe --threads 32 --verbose 5 --out "F:\promec\FastaDB\uniprot-human-iso-jan24.report.tsv" --qvalue 0.01 --out-lib "F:\peomec\FastaDB\uniprot-human-iso-jan24.report-lib.parquet" --gen-spec-lib --predictor --fasta "F:\promec\FastaDB\uniprot-human-iso-jan24.fasta" --fasta-search --min-fr-mz 200 --max-fr-mz 1800 --met-excision --min-pep-len 7 --max-pep-len 30 --min-pr-mz 300 --max-pr-mz 1800 --min-pr-charge 2 --max-pr-charge 3 --cut K*,R* --missed-cleavages 1 --unimod4 --var-mods 3 --var-mod UniMod:35,15.994915,M --var-mod UniMod:1,42.010565,*n --var-mod UniMod:21,79.966331,STY --peptidoforms --relaxed-prot-inf --rt-profiling
DIA-NN 1.9.1 (Data-Independent Acquisition by Neural Networks)
Compiled on Jul 15 2024 15:40:36
Current date and time: Sat Oct 12 09:56:58 2024
CPU: GenuineIntel Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
SIMD instructions: AVX SSE4.1 SSE4.2
Logical CPU cores: 32
Thread number set to 32
Output will be filtered at 0.01 FDR
A spectral library will be generated
Deep learning will be used to generate a new in silico spectral library from peptides provided
DIA-NN will carry out FASTA digest for in silico lib generation
Min fragment m/z set to 200
Max fragment m/z set to 1800
N-terminal methionine excision enabled
Min peptide length set to 7
Max peptide length set to 30
Min precursor m/z set to 300
Max precursor m/z set to 1800
Min precursor charge set to 2
Max precursor charge set to 3
In silico digest will involve cuts at K*,R*
Maximum number of missed cleavages set to 1
Cysteine carbamidomethylation enabled as a fixed modification
Maximum number of variable modifications set to 3
Modification UniMod:35 with mass delta 15.9949 at M will be considered as variable
Modification UniMod:1 with mass delta 42.0106 at *n will be considered as variable
Modification UniMod:21 with mass delta 79.9663 at STY will be considered as variable
Peptidoform scoring enabled
Heuristic protein grouping will be used, to reduce the number of protein groups obtained; this mode is recommended for benchmarking protein ID numbers, GO/pathway and system-scale analyses
The spectral library (if generated) will retain the original spectra but will include empirically-aligned RTs
The following variable modifications will be scored: UniMod:35 UniMod:1 UniMod:21

0 files will be processed
[0:00] Loading FASTA F:\promec\FastaDB\uniprot-human-iso-jan24.fasta
[0:47] Processing FASTA
[6:13] Assembling elution groups
[9:07] 47672221 precursors generated
[9:07] Gene names missing for some isoforms
[9:07] Library contains 82078 proteins, and 20536 genes
...

Does this look right, or am I missing some switches? 🫡

BTW, is 1.9.2 already out there to be tested? 🤓

vdemichev commented 1 month ago

Min precursor m/z set to 300 Max precursor m/z set to 1800

Is this the experiment range?

Modification UniMod:35 with mass delta 15.9949 at M will be considered as variable

Increases search space.
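As a rough illustration of why each extra variable modification increases the search space, the sketch below counts the modified forms of a single peptide, assuming every candidate residue is either modified or not and at most `max_var_mods` residues are modified at once. This is back-of-the-envelope arithmetic only: how DIA-NN enumerates forms internally is not described in this thread, the peptide is hypothetical, and N-terminal acetylation is ignored.

```python
from math import comb


def candidate_sites(peptide: str, residues: str) -> int:
    """Number of residues in the peptide that could carry a variable modification."""
    return sum(1 for aa in peptide if aa in residues)


def count_forms(n_sites: int, max_var_mods: int) -> int:
    """Ways to modify between 0 and max_var_mods of n_sites candidate sites."""
    return sum(comb(n_sites, k) for k in range(min(n_sites, max_var_mods) + 1))


peptide = "MSTYSPEMK"  # hypothetical peptide, for illustration only
phospho_only = candidate_sites(peptide, "STY")       # 4 sites
phospho_and_ox = candidate_sites(peptide, "STYM")    # 6 sites

print(count_forms(phospho_only, 3))    # 1 + 4 + 6 + 4 = 15
print(count_forms(phospho_and_ox, 3))  # 1 + 6 + 15 + 20 = 42
```

In this toy case, allowing methionine oxidation on top of phosphorylation nearly triples the number of forms for one peptide, which is the effect the comment above points at.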

animesh commented 3 weeks ago

Thanks @vdemichev for pointing that out, it turns out to be 100 and 1700 respectively, so I recreated the phospho-lib:

diann.exe --lib "" --threads 32 --verbose 1 --out "F:\promec\FastaDB\phoslibMC1V3mz100to1700c2to3humanreport.tsv" --qvalue 0.01 --matrices  --out-lib "F:\promec\FastaDB\phoslibMC1V3mz100to1700c2to3human.parquet" --gen-spec-lib --predictor --fasta camprotR_240512_cRAP_20190401_full_tags.fasta --cont-quant-exclude cRAP- --fasta "F:\promec\FastaDB\UP000005640_9606.fasta" --fasta-search --min-fr-mz 100 --max-fr-mz 1700 --met-excision --min-pep-len 7 --max-pep-len 30 --min-pr-mz 100 --max-pr-mz 1700 --min-pr-charge 2 --max-pr-charge 3 --cut K*,R* --missed-cleavages 1 --unimod4 --var-mods 3 --var-mod UniMod:35,15.994915,M --var-mod UniMod:1,42.010565,*n --var-mod UniMod:21,79.966331,STY --peptidoforms --reanalyse --relaxed-prot-inf --rt-profiling 
DIA-NN 1.9.2 (Data-Independent Acquisition by Neural Networks)
Compiled on Oct 17 2024 21:58:43
Current date and time: Tue Oct 22 12:26:08 2024
CPU: GenuineIntel Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
SIMD instructions: AVX AVX2 FMA SSE4.1 SSE4.2 
Logical CPU cores: 64
Thread number set to 32
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
A spectral library will be generated
Deep learning will be used to generate a new in silico spectral library from peptides provided
Peptides corresponding to protein sequence IDs tagged with cRAP- will be excluded from normalisation as well as quantification of protein groups that do not include proteins bearing the tag
DIA-NN will carry out FASTA digest for in silico lib generation
Min fragment m/z set to 100
Max fragment m/z set to 1700
N-terminal methionine excision enabled
Min peptide length set to 7
Max peptide length set to 30
Min precursor m/z set to 100
Max precursor m/z set to 1700
Min precursor charge set to 2
Max precursor charge set to 3
In silico digest will involve cuts at K*,R*
Maximum number of missed cleavages set to 1
Cysteine carbamidomethylation enabled as a fixed modification
Maximum number of variable modifications set to 3
Modification UniMod:35 with mass delta 15.9949 at M will be considered as variable
Modification UniMod:1 with mass delta 42.0106 at *n will be considered as variable
Modification UniMod:21 with mass delta 79.9663 at STY will be considered as variable
Peptidoform scoring enabled
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
Heuristic protein grouping will be used, to reduce the number of protein groups obtained; this mode is recommended for benchmarking protein ID numbers, GO/pathway and system-scale analyses
The spectral library (if generated) will retain the original spectra but will include empirically-aligned RTs
The following variable modifications will be scored: UniMod:35 UniMod:1 UniMod:21 
WARNING: MBR turned off, two or more raw files are required

0 files will be processed
[0:00] Loading FASTA camprotR_240512_cRAP_20190401_full_tags.fasta
[0:00] Loading FASTA F:\promec\FastaDB\UP000005640_9606.fasta
[0:19] Processing FASTA
[2:53] Assembling elution groups
[4:04] 36710394 precursors generated
[4:04] Gene names missing for some isoforms
[4:04] Library contains 20695 proteins, and 20458 genes
[5:07] Encoding peptides for spectra and RTs prediction
[6:59] Predicting spectra and IMs
[116:02] Predicting RTs
[136:19] Decoding predicted spectra and IMs
[137:05] Decoding RTs
[137:23] Saving the library to F:\promec\FastaDB\phoslibMC1V3mz100to1700c2to3human.predicted.speclib
[142:12] Initialising library

The following warnings or errors (in alphabetic order) were detected at least the indicated number of times:
WARNING: MBR turned off, two or more raw files are required : 1
Finished

How to cite:
using DIA-NN: Demichev et al, Nature Methods, 2020, https://www.nature.com/articles/s41592-019-0638-x
analysing Scanning SWATH: Messner et al, Nature Biotechnology, 2021, https://www.nature.com/articles/s41587-021-00860-4
analysing PTMs: Steger et al, Nature Communications, 2021, https://www.nature.com/articles/s41467-021-25454-1
analysing dia-PASEF: Demichev et al, Nature Communications, 2022, https://www.nature.com/articles/s41467-022-31492-0
analysing Slice-PASEF: Szyrwiel et al, biorxiv, 2022, https://doi.org/10.1101/2022.10.31.514544
plexDIA / multiplexed DIA: Derks et al, Nature Biotechnology, 2023, https://www.nature.com/articles/s41587-022-01389-w
CysQuant: Huang et al, Redox Biology, 2023, https://doi.org/10.1016/j.redox.2023.102908
using QuantUMS: Kistner at al, biorxiv, 2023, https://doi.org/10.1101/2023.06.20.545604

DIA-NN exited 

I then had to move it to a Linux machine with more RAM, and it seems to have worked:

./diann-linux --threads 20 --f "/cluster/projects/nn9036k/scripts/phoSTY/dia/zr_IMAC_100ug_zoom20_1dia_25pepsep_S1-A2_1_8449.d" --f "/cluster/projects/nn9036k/scripts/phoSTY/dia/zr_IMAC_200ug_zoom20_1dia_25pepsep_S1-A5_1_8450.d" --f "/cluster/projects/nn9036k/scripts/phoSTY/dia/zr_IMAC_300ug_zoom20_1dia_25pepsep_S1-A8_1_8451.d" --lib "/cluster/projects/nn9036k/FastaDB/phoslibMC1V3mz100to1700c2to3human.predicted.speclib"  --verbose 1 --out "/cluster/projects/nn9036k/FastaDB/phoslibMC1V3mz100to1700c2to3human.predicted.speclibreport.tsv" --qvalue 0.01 --matrices  --min-corr 2.0 --corr-diff 1.0 --time-corr-only --extracted-ms1 --predictor --fasta camprotR_240512_cRAP_20190401_full_tags.fasta --cont-quant-exclude cRAP- --unimod4 --var-mods 3 --var-mod UniMod:35,15.994915,M --var-mod UniMod:1,42.010565,*n --var-mod UniMod:21,79.966331,STY --mass-acc 20.0 --mass-acc-ms1 20.0 --peptidoforms
DIA-NN 1.9.2 (Data-Independent Acquisition by Neural Networks)
Compiled on Oct 20 2024 02:59:53
Current date and time: Sat Oct 26 09:58:54 2024
Logical CPU cores: 80
Thread number set to 20
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
Only peaks with correlation sum exceeding 2 will be considered
Peaks with correlation sum below 1 from maximum will not be considered
A single score will be used until RT alignment to save memory; this can potentially lead to slower search
Fast algorithm based on MS1 feature extraction for quicker library-free search will be applied; this significantly worsens the identification performance
Deep learning will be used to generate a new in silico spectral library from peptides provided
Peptides corresponding to protein sequence IDs tagged with cRAP- will be excluded from normalisation as well as quantification of protein groups that do not include proteins bearing the tag
Cysteine carbamidomethylation enabled as a fixed modification
Maximum number of variable modifications set to 3
Modification UniMod:35 with mass delta 15.9949 at M will be considered as variable
Modification UniMod:1 with mass delta 42.0106 at *n will be considered as variable
Modification UniMod:21 with mass delta 79.9663 at STY will be considered as variable
Peptidoform scoring enabled
Mass accuracy will be fixed to 2e-05 (MS2) and 2e-05 (MS1)
The following variable modifications will be scored: UniMod:35 UniMod:1 UniMod:21
Unless the spectral library specified was created by this version of DIA-NN, it's strongly recommended to specify a FASTA database and use the 'Reannotate' function to allow DIA-NN to identify peptides which can originate from the N/C terminus of the protein: otherwise site localisation might not work properly for modifications of the protein N-terminus or for modifications which do not allow enzymatic cleavage after the modified residue

3 files will be processed
[0:00] Loading spectral library /cluster/projects/nn9036k/FastaDB/phoslibMC1V3mz100to1700c2to3human.predicted.speclib
[2:27] Library annotated with sequence database(s): camprotR_240512_cRAP_20190401_full_tags.fasta; F:\promec\FastaDB\UP000005640_9606.fasta
[2:54] Spectral library loaded: 20695 protein isoforms, 29501 protein groups and 36710394 precursors in 19244792 elution groups.
[2:54] Loading protein annotations from FASTA camprotR_240512_cRAP_20190401_full_tags.fasta
[2:55] Annotating library proteins with information from the FASTA database
[2:55] Gene names missing for some isoforms
[2:55] Library contains 20695 proteins, and 20458 genes
[4:58] Encoding peptides for spectra and RTs prediction
[6:29] Predicting spectra and IMs
[62:07] Predicting RTs
[70:31] Decoding predicted spectra and IMs
[71:17] Decoding RTs
[71:37] Saving the library to /cluster/projects/nn9036k/FastaDB/phoslibMC1V3mz100to1700c2to3human.predicted.speclibreport-lib.predicted.speclib
[76:09] Initialising library
WARNING: it is strongly recommended to enable MBR when analysing with a large library, if this is a quantitative analysis

[81:16] File #1/3
[81:16] Loading run /cluster/projects/nn9036k/scripts/phoSTY/dia/zr_IMAC_100ug_zoom20_1dia_25pepsep_S1-A2_1_8449.d
WARNING: for most Slice/DIA-PASEF datasets it is better to manually fix both the MS1 and MS2 mass accuracies to values in the range 10-15 ppm
[83:08] 32961870 library precursors are potentially detectable
[83:18] Calibrating with mass accuracies 20 (MS1), 20 (MS2)
[130:39] RT window set to 4.98428
[130:39] Ion mobility window set to 0.0572864
[130:39] Peak width: 9.48
[130:39] Scan window radius set to 20
[130:40] Recommended MS1 mass accuracy setting: 14.1767 ppm
[174:53] Removing low confidence identifications
[188:58] Precursors at 1% peptidoform FDR: 16188
[189:01] Removing interfering precursors
[189:36] Training neural networks on 45590 PSMs
[189:50] Number of IDs at 0.01 FDR: 20504
[189:53] Precursors at 1% peptidoform FDR: 18134
[190:04] Calculating protein q-values
[190:05] Number of genes identified at 1% FDR: 4219 (precursor-level), 4015 (protein-level) (inference performed using proteotypic peptides only)
[190:06] Quantification
[190:08] Precursors with monitored PTMs at 1% FDR: 3332 out of 17784 considered
[190:08] Unmodified precursors with monitored PTM sites at 1% FDR: 12901
[190:08] Precursors with PTMs localised (when required) with > 90% confidence: 2677 out of 3332
[190:13] Quantification information saved to /cluster/projects/nn9036k/scripts/phoSTY/dia/zr_IMAC_100ug_zoom20_1dia_25pepsep_S1-A2_1_8449.d.quant

[190:13] File #2/3
[190:13] Loading run /cluster/projects/nn9036k/scripts/phoSTY/dia/zr_IMAC_200ug_zoom20_1dia_25pepsep_S1-A5_1_8450.d
[191:56] 32961870 library precursors are potentially detectable
[192:06] Calibrating with mass accuracies 20 (MS1), 20 (MS2)
[246:43] RT window set to 6.17049
[246:43] Ion mobility window set to 0.0549881
[246:43] Recommended MS1 mass accuracy setting: 13.375 ppm
[304:35] Removing low confidence identifications
[323:54] Precursors at 1% peptidoform FDR: 21114
[323:58] Removing interfering precursors
[324:34] Training neural networks on 59320 PSMs
[324:48] Number of IDs at 0.01 FDR: 20957
[324:51] Precursors at 1% peptidoform FDR: 18289
[325:02] Calculating protein q-values
[325:04] Number of genes identified at 1% FDR: 4325 (precursor-level), 4064 (protein-level) (inference performed using proteotypic peptides only)
[325:04] Quantification
[325:06] Precursors with monitored PTMs at 1% FDR: 5725 out of 18333 considered
[325:06] Unmodified precursors with monitored PTM sites at 1% FDR: 10744
[325:06] Precursors with PTMs localised (when required) with > 90% confidence: 3919 out of 5725
[325:11] Quantification information saved to /cluster/projects/nn9036k/scripts/phoSTY/dia/zr_IMAC_200ug_zoom20_1dia_25pepsep_S1-A5_1_8450.d.quant

[325:12] File #3/3
[325:12] Loading run /cluster/projects/nn9036k/scripts/phoSTY/dia/zr_IMAC_300ug_zoom20_1dia_25pepsep_S1-A8_1_8451.d
[330:24] 32961870 library precursors are potentially detectable
[330:34] Calibrating with mass accuracies 20 (MS1), 20 (MS2)
[365:26] RT window set to 4.98172
[365:26] Ion mobility window set to 0.0572921
[365:27] Recommended MS1 mass accuracy setting: 12.8901 ppm
[401:57] Removing low confidence identifications
[413:30] Precursors at 1% peptidoform FDR: 14966
[413:32] Removing interfering precursors
[414:08] Training neural networks on 37928 PSMs
[414:22] Number of IDs at 0.01 FDR: 19274
[414:25] Precursors at 1% peptidoform FDR: 17318
[414:36] Calculating protein q-values
[414:38] Number of genes identified at 1% FDR: 4328 (precursor-level), 4169 (protein-level) (inference performed using proteotypic peptides only)
[414:38] Quantification
[414:40] Precursors with monitored PTMs at 1% FDR: 1709 out of 16880 considered
[414:40] Unmodified precursors with monitored PTM sites at 1% FDR: 13775
[414:40] Precursors with PTMs localised (when required) with > 90% confidence: 1436 out of 1709
[414:46] Quantification information saved to /cluster/projects/nn9036k/scripts/phoSTY/dia/zr_IMAC_300ug_zoom20_1dia_25pepsep_S1-A8_1_8451.d.quant

[414:46] Cross-run analysis
[414:46] Reading quantification information: 3 files
[415:00] Quantifying peptides
WARNING: QuantUMS requires 6 or more runs for the optimisation of its hyperparameters to perform best.
[415:16] Quantification parameters: 0.385514, 0.00433721, 0.00417173, 0.0121374, 0.0121517, 0.0120126, 0.0750558, 0.1541, 0.118304, 0.0138328, 0.0179242, 0.0147071, 0.408266, 0.0681673, 0.11726, 0.0126748
[415:22] Assembling protein groups
[415:26] Quantifying proteins
[415:27] Calculating q-values for protein and gene groups
[415:30] Calculating global q-values for protein and gene groups
[415:31] Protein groups with global q-value <= 0.01: 5851
[415:33] Compressed report saved to /cluster/projects/nn9036k/FastaDB/phoslibMC1V3mz100to1700c2to3human.predicted.speclibreport.parquet. Use R 'arrow' or Python 'PyArrow' package to process
[415:33] Writing report
[415:37] Report saved to /cluster/projects/nn9036k/FastaDB/phoslibMC1V3mz100to1700c2to3human.predicted.speclibreport.tsv.
[415:37] Saving precursor levels matrix
[415:38] Precursor levels matrix (1% precursor and protein group FDR) saved to /cluster/projects/nn9036k/FastaDB/phoslibMC1V3mz100to1700c2to3human.predicted.speclibreport.pr_matrix.tsv.
[415:38] Saving protein group levels matrix
[415:38] Protein group levels matrix (1% precursor FDR and protein group FDR) saved to /cluster/projects/nn9036k/FastaDB/phoslibMC1V3mz100to1700c2to3human.predicted.speclibreport.pg_matrix.tsv.
[415:38] Saving gene group levels matrix
[415:38] Gene groups levels matrix (1% precursor FDR and protein group FDR) saved to /cluster/projects/nn9036k/FastaDB/phoslibMC1V3mz100to1700c2to3human.predicted.speclibreport.gg_matrix.tsv.
[415:38] Saving unique genes levels matrix
[415:38] Unique genes levels matrix (1% precursor FDR and protein group FDR) saved to /cluster/projects/nn9036k/FastaDB/phoslibMC1V3mz100to1700c2to3human.predicted.speclibreport.unique_genes_matrix.tsv.
[415:38] Manifest saved to /cluster/projects/nn9036k/FastaDB/phoslibMC1V3mz100to1700c2to3human.predicted.speclibreport.manifest.txt
[415:38] Stats report saved to /cluster/projects/nn9036k/FastaDB/phoslibMC1V3mz100to1700c2to3human.predicted.speclibreport.stats.tsv

The following warnings or errors (in alphabetic order) were detected at least the indicated number of times:
WARNING: QuantUMS requires 6 or more runs for the optimisation of its hyperparameters to perform best. : 1
WARNING: for most Slice/DIA-PASEF datasets it is better to manually fix both the MS1 and MS2 mass accuracies to values in the range 10-15 ppm : 1
WARNING: it is strongly recommended to enable MBR when analysing with a large library, if this is a quantitative analysis : 1
Finished

How to cite:
using DIA-NN: Demichev et al, Nature Methods, 2020, https://www.nature.com/articles/s41592-019-0638-x
analysing Scanning SWATH: Messner et al, Nature Biotechnology, 2021, https://www.nature.com/articles/s41587-021-00860-4
analysing PTMs: Steger et al, Nature Communications, 2021, https://www.nature.com/articles/s41467-021-25454-1
analysing dia-PASEF: Demichev et al, Nature Communications, 2022, https://www.nature.com/articles/s41467-022-31492-0
analysing Slice-PASEF: Szyrwiel et al, biorxiv, 2022, https://doi.org/10.1101/2022.10.31.514544
plexDIA / multiplexed DIA: Derks et al, Nature Biotechnology, 2023, https://www.nature.com/articles/s41587-022-01389-w
CysQuant: Huang et al, Redox Biology, 2023, https://doi.org/10.1016/j.redox.2023.102908
using QuantUMS: Kistner at al, biorxiv, 2023, https://doi.org/10.1101/2023.06.20.545604
[415:39] Log saved to /cluster/projects/nn9036k/FastaDB/phoslibMC1V3mz100to1700c2to3human.predicted.speclibreport.log.txt

but looking at the phospho-tables, it is just full of cRAP?

cat /cluster/projects/nn9036k/FastaDB/phoslibMC1V3mz100to1700c2to3human.predicted.speclibreport.phosphosites*.tsv
Protein Protein.Names   Gene.Names      Residue Site    Sequence        /cluster/projects/nn9036k/scripts/phoSTY/dia/zr_IMAC_100ug_zoom20_1dia_25pepsep_S1-A2_1_8449.d  /cluster/projects/nn9036k/scripts/phoSTY/dia/zr_IMAC_200ug_zoom20_1dia_25pepsep_S1-A5_1_8450.d  /cluster/projects/nn9036k/scripts/phoSTY/dia/zr_IMAC_300ug_zoom20_1dia_25pepsep_S1-A8_1_8451.d
cRAP-P02662     cRAP-CASA1_BOVIN        cRAP-CSN1S1     S       130     QLEIVPNSAEERLHS 183863  426221  0
cRAP-P02666     cRAP-CASB_BOVIN cRAP-CSN2       S       50      KKIEKFQSEEQQQTE 97618.9 483020  0
cRAP-P12763     cRAP-FETUA_BOVIN        cRAP-AHSG       S       323     SGVASVESSSGEAFH 815582  0       0
cRAP-P12763     cRAP-FETUA_BOVIN        cRAP-AHSG       S       325     VASVESSSGEAFHVG 815582  0       0
Protein Protein.Names   Gene.Names      Residue Site    Sequence        /cluster/projects/nn9036k/scripts/phoSTY/dia/zr_IMAC_100ug_zoom20_1dia_25pepsep_S1-A2_1_8449.d  /cluster/projects/nn9036k/scripts/phoSTY/dia/zr_IMAC_200ug_zoom20_1dia_25pepsep_S1-A5_1_8450.d  /cluster/projects/nn9036k/scripts/phoSTY/dia/zr_IMAC_300ug_zoom20_1dia_25pepsep_S1-A8_1_8451.d
cRAP-P02662     cRAP-CASA1_BOVIN        cRAP-CSN1S1     S       130     QLEIVPNSAEERLHS 183863  426221  0
cRAP-P02666     cRAP-CASB_BOVIN cRAP-CSN2       S       50      KKIEKFQSEEQQQTE 97618.9 483020  0

I am getting about 2-3000 phospho-peptides quantified with MaxQuant from same samples run in DDA-mode

Intensity 100ug_zoom20_1dda_25pepsep_S1-A1_1_8445   2342
Intensity 200ug_zoom20_1dda_25pepsep_S1-A4_1_8446   1732
Intensity 300ug_zoom20_1dda_25pepsep_S1-A7_1_8447   3491

Probably I am not using DIA-NN properly, or DIA generation with enrichment from timsTOF-pro is itself flawed? Wondering if there is a way to check that?

vdemichev commented 3 weeks ago

Hi Ani,

There's some mistake about the m/z range, it should be different for precursors and fragments. For the fragments you normally don't want to adjust it. For precursors - it's the MS/MS precursor range, i.e. what is covered by isolation windows.
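To make the precursor-range point concrete: if the acquisition method's isolation windows are known, the precursor limits are simply the lowest window start and the highest window end. The sketch below assumes the windows are available as `(lower, upper)` m/z pairs; the window values are made up for illustration and are not taken from the method used in this thread.

```python
def precursor_range(windows):
    """Return (min, max) precursor m/z covered by a list of (lower, upper) windows."""
    windows = list(windows)
    if not windows:
        raise ValueError("at least one isolation window is required")
    lowest = min(lower for lower, _ in windows)
    highest = max(upper for _, upper in windows)
    return lowest, highest


# Hypothetical isolation windows, for illustration only.
windows = [(400.0, 425.0), (425.0, 450.0), (450.0, 475.0)]
low, high = precursor_range(windows)
print(f"--min-pr-mz {low:g} --max-pr-mz {high:g}")
```

The fragment m/z limits are a separate setting and, per the comment above, normally do not need adjusting.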

but looking at the phospho-tables, it is just full of cRAP?

Which makes sense, common contaminants have phosphates too :) What is the number of non-cRAP?

I am getting about 2-3000 phospho-peptides quantified with MaxQuant from same samples run in DDA-mode

Non-contaminant?

or DIA generation with enrichment from timsTOF-pro is itself flawed?

Runs look fine. Degree of enrichment is clearly not the best (most detected peptides are unmodified), but the numbers of phosphorylated precursors are still OK. Note that you are not using MBR (need to enable, please see the respective warning).
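The enrichment remark can be checked directly against the per-run counts in the log above. The following is a small sketch that pulls the two relevant lines out of a DIA-NN log and computes the share of modified precursors per run. It assumes the log wording shown in this thread for 1.9.2, and note that "monitored PTMs" covers every scored modification here (UniMod:35, UniMod:1 and UniMod:21), not phosphorylation alone.

```python
import re

MOD_RE = re.compile(r"Precursors with monitored PTMs at 1% FDR: (\d+) out of (\d+) considered")
UNMOD_RE = re.compile(r"Unmodified precursors with monitored PTM sites at 1% FDR: (\d+)")


def modified_fractions(log_text: str) -> list:
    """Per-run fraction: modified / (modified + unmodified) precursors at 1% FDR."""
    modified = [int(m.group(1)) for m in MOD_RE.finditer(log_text)]
    unmodified = [int(m.group(1)) for m in UNMOD_RE.finditer(log_text)]
    return [m / (m + u) if (m + u) else 0.0 for m, u in zip(modified, unmodified)]


# Lines copied from the log posted earlier in this thread.
log_text = """
[190:08] Precursors with monitored PTMs at 1% FDR: 3332 out of 17784 considered
[190:08] Unmodified precursors with monitored PTM sites at 1% FDR: 12901
[325:06] Precursors with monitored PTMs at 1% FDR: 5725 out of 18333 considered
[325:06] Unmodified precursors with monitored PTM sites at 1% FDR: 10744
[414:40] Precursors with monitored PTMs at 1% FDR: 1709 out of 16880 considered
[414:40] Unmodified precursors with monitored PTM sites at 1% FDR: 13775
"""

for run, fraction in enumerate(modified_fractions(log_text), start=1):
    print(f"run {run}: {fraction:.1%} of these precursors carry a monitored PTM")
```

For the three runs this gives roughly 21%, 35% and 11%, consistent with the observation that most detected peptides are unmodified.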

Wondering if there is a way to check that?

cRAP is a good control actually, if it pops up but the proteins you are looking for do not, this means they are just not in the sample (in phosphorylated form).

Best, Vadim

animesh commented 3 weeks ago

Thanks @vdemichev for the blitz response! Well, I got the m/z from the guy who runs the instrument, but I can of course double-check 💯

" What is the number of non-cRAP?" well none actually... nothing from the sample?

"Non-contaminant?"

Yes, MaxQuant without MBR gets about 2-3000 non-contaminant, sample-specific peptides per sample, so I guess there are phospho-peptides in the sample that the DIA data doesn't capture?

vdemichev commented 3 weeks ago

well none actually... nothing from the sample?

Strange. So if you take, say, the main (.parquet) report and check the 5k+ precursors detected there confidently as phosphorylated, they all come from cRAP?

animesh commented 3 weeks ago

Oops @vdemichev, I didn't dig into the parquet file, I was just looking at the phosphosites.tsv files! I have now started with the parquet (https://colab.research.google.com/drive/15JwUgvH3j8892FgZGkP5kPEonefldTum?usp=sharing) and it looks like about 9K rows have phospho-sites; not sure why only the cRAP ones made it into the phosphosites.tsv table? How should one go from parquet rows to sample columns: pivot on sample names and sum intensity?
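One way to do the pivot asked about above is sketched below in plain Python: keep the rows whose modified sequence carries UniMod:21, then sum the quantity per sequence and run. The column names `Run`, `Modified.Sequence` and `Precursor.Quantity` are assumptions about the main report and should be checked against the actual parquet columns; the rows are toy values, and summing precursor quantities is only a crude stand-in for DIA-NN's own phosphosite matrices.

```python
from collections import defaultdict


def pivot_sum(rows, index="Modified.Sequence", column="Run", value="Precursor.Quantity"):
    """Pivot long-format rows to {index: {run: summed value}}; missing cells become 0.0."""
    rows = list(rows)
    runs = sorted({row[column] for row in rows})
    sums = defaultdict(lambda: defaultdict(float))
    for row in rows:
        sums[row[index]][row[column]] += float(row[value])
    return {key: {run: per_run.get(run, 0.0) for run in runs} for key, per_run in sums.items()}


# Toy rows standing in for the main report (values are made up).
rows = [
    {"Run": "run1", "Modified.Sequence": "AS(UniMod:21)PEK", "Precursor.Quantity": 100.0},
    {"Run": "run1", "Modified.Sequence": "AS(UniMod:21)PEK", "Precursor.Quantity": 50.0},
    {"Run": "run2", "Modified.Sequence": "AS(UniMod:21)PEK", "Precursor.Quantity": 70.0},
    {"Run": "run2", "Modified.Sequence": "LVNELTEFAK", "Precursor.Quantity": 900.0},
]

phospho_rows = [row for row in rows if "UniMod:21" in row["Modified.Sequence"]]
for sequence, per_run in pivot_sum(phospho_rows).items():
    print(sequence, per_run)
```

With pandas installed, the rows could come from `pandas.read_parquet(path).to_dict("records")`, or the same pivot could be written as `df.pivot_table(index="Modified.Sequence", columns="Run", values="Precursor.Quantity", aggfunc="sum")`, under the same column-name assumptions.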

vdemichev commented 3 weeks ago

Could you please maybe share the (i) log (as saved by DIA-NN), (ii) phosphosites_90.tsv and (iii) the main .parquet report?

animesh commented 3 weeks ago

Sure @vdemichev, uploading phoslibMC1V3mz100to1700c2to3human.predicted.speclibreport.phosphosites_90.tsv at https://server-drive.promec.sigma2.no/Data/phoslibMC1V3mz100to1700c2to3human.predicted.speclibreport.phosphosites_90.tsv?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=promecshare%2F20241026%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20241026T181251Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=23a87e36622f17a03b785245f88f5f2129b6ccf0be9ebeaae1e48f32cd8d0754 , phoslibMC1V3mz100to1700c2to3human.predicted.speclibreport.log.txt https://server-drive.promec.sigma2.no/Data/phoslibMC1V3mz100to1700c2to3human.predicted.speclibreport.log.txt?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=promecshare%2F20241026%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20241026T181433Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=d6c6808bc34df5cb4695733b5496e65151316c5d4e0c911e9c7015231e187194 and parquet is already at https://server-drive.promec.sigma2.no/Data/phoslibMC1V3mz100to1700c2to3human.predicted.speclibreport.parquet?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=promecshare%2F20241026%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20241026T162932Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=fa6f46083767594ed1b93e4a9cc78c3491c044b07fb3152d8f6edf24d86333e5 🤞

vdemichev commented 2 weeks ago

Hi Ani,

Figured it out, many thanks for sharing. You only have 'Contaminants' specified but not the main FASTA database. The phosphosite matrices are quantification matrices where each row corresponds to protein + site position. So since there's no protein annotation for non-cRAP proteins, they cannot be included in those matrices.

Btw, also

Best, Vadim

animesh commented 2 weeks ago

Awesome @vdemichev, so this is because I used --fasta "F:\promec\FastaDB\UP000005640_9606.fasta" instead of copying UP000005640_9606.fasta to the DIA-NN folder and changing it to --fasta "UP000005640_9606.fasta"? I should also add --reanalyse for MBR, something like:

./diann-linux --threads 20 --f "/cluster/projects/nn9036k/scripts/phoSTY/dia/zr_IMAC_100ug_zoom20_1dia_25pepsep_S1-A2_1_8449.d" --f "/cluster/projects/nn9036k/scripts/phoSTY/dia/zr_IMAC_200ug_zoom20_1dia_25pepsep_S1-A5_1_8450.d" --f "/cluster/projects/nn9036k/scripts/phoSTY/dia/zr_IMAC_300ug_zoom20_1dia_25pepsep_S1-A8_1_8451.d" --lib "/cluster/projects/nn9036k/FastaDB/phoslibMC1V3mz100to1700c2to3human.predicted.speclib"  --verbose 1 --out "/cluster/projects/nn9036k/FastaDB/phoslibMC1V3mz100to1700c2to3human.predicted.speclibreport.tsv" --qvalue 0.01 --matrices  --min-corr 2.0 --corr-diff 1.0 --time-corr-only --extracted-ms1 --predictor --fasta camprotR_240512_cRAP_20190401_full_tags.fasta --fasta UP000005640_9606.fasta --cont-quant-exclude cRAP- --unimod4 --var-mods 3 --var-mod UniMod:35,15.994915,M --var-mod UniMod:1,42.010565,*n --var-mod UniMod:21,79.966331,STY --mass-acc 20.0 --mass-acc-ms1 20.0 --peptidoforms  --reanalyse

is that all or something is missing?

vdemichev commented 2 weeks ago

In the log that you have shared there's only --fasta camprotR_240512_cRAP_20190401_full_tags.fasta, i.e. no user-provided FASTA was specified.
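A quick way to catch this kind of omission before a long run is to list every `--fasta` value in the command that will actually be executed. The helper below is a sketch using Python's standard `shlex` module; the helper name is hypothetical and the command string is a shortened version of the Linux command posted earlier in this thread. `shlex.split` follows POSIX quoting rules, so unquoted Windows paths with backslashes would need different handling.

```python
import shlex


def option_values(command: str, option: str = "--fasta") -> list:
    """Return every value that follows `option` in a shell-style command string."""
    tokens = shlex.split(command)
    return [tokens[i + 1] for i, token in enumerate(tokens[:-1]) if token == option]


# Shortened from the Linux command earlier in this thread.
command = (
    './diann-linux --threads 20 --f "run1.d" '
    '--lib "phoslibMC1V3mz100to1700c2to3human.predicted.speclib" --predictor '
    "--fasta camprotR_240512_cRAP_20190401_full_tags.fasta "
    "--cont-quant-exclude cRAP- --unimod4 --peptidoforms"
)

fastas = option_values(command)
print(fastas)
if len(fastas) < 2:
    print("Only one FASTA is passed; is the main proteome FASTA missing?")
```

Here the only value found is the cRAP contaminants file, which matches what the shared log showed.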

is that all or something is missing?

Please just check that it matches the command formed by the GUI

animesh commented 2 weeks ago

Thanks @vdemichev for the amazing patience 🙏 Yes, it seems that was the issue, as I now get about 7000 phospho-sites in the 99.tsv 💯