single protein-phosphorylation

haiyan-MS commented 1 year ago

Hi

I am just starting to use DIA-NN. It has produced great results for us with cell lysates. I recently did a project on in vitro phosphorylation with 4 samples. A and B were mostly a human protein with a minute amount of a human kinase and some residual E. coli proteins. C and D were mostly E. coli proteins with very little human protein and kinase (minute amount). I acquired DIA data and searched it with DIA-NN 1.8.1, using a library-free search against an E. coli database plus a small database with two sequences (the human substrate protein and the human kinase), with dismal results. In the DDA data and the base peaks, I can see this human protein clearly, and coverage should be >90%. But in DIA-NN, A and B gave about 10% coverage of the human substrate. Manual inspection of the raw files looks fine. I would like to find out what I did wrong.

vdemichev commented 1 year ago

The settings of DIA-NN do not follow the guideline at https://github.com/vdemichev/DiaNN#changing-default-settings; however, this would not affect the results significantly in this case. Can you please share the full log and screenshots demonstrating what you would expect to see but don't see in DIA-NN output?

haiyan-MS commented 1 year ago

Dear Professor Demichev:

Thank you so much for this wonderful software. We have tried it on many proteomics projects, and DIA-NN always produces good results. But for this in vitro experiment, it was disappointing. I am concerned about your first comment. I did not change anything except checking phosphorylation. Why do you say it did not follow the guideline? I attached the pipeline (I am not sure where the pipeline was saved, so I copy-pasted it into Notepad) and the human protein sequences I used for searching. I am also sending a screenshot of the base peak comparison between DIA and DDA for one of the samples, and the MS/MS of several peptides that were not identified in DIA, compared to DDA (all identified). In DDA, the human substrate OPA1 is identified and coverage is >90%. But in DIA I would say only about 15 to 20%.

Let me know what else you need to find out why I can't identify any phosphorylation and why the coverage for OPA1 is so low in the DIA run.

To reiterate, this is an in vitro experiment: a human substrate overexpressed in E. coli and purified, plus a commercial human kinase, to study in vitro phosphorylation. I searched the E. coli FASTA database plus a text file containing the two human protein sequences, downloaded from UniProt.

Thank you again for taking the time to look into this.

Haiyan


diann.exe --f "S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\raw\ECL05595-DIA.raw " --f "S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\raw\ECL05596-DIA.raw " --f "S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\raw\ECL05597-DIA.raw " --f "S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\raw\ECL05598-DIA.raw " --lib "" --threads 8 --verbose 1 --out "S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\DIA\report-P-2.tsv" --qvalue 0.01 --predictor --fasta "S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\OPA1_BPK.txt" --fasta "S:\General\Genome\E.coli\NCBI_Ecoli_K12_MG1655\GCF_000005845.2_ASM584v2_protein.fasta" --fasta-search --min-fr-mz 200 --max-fr-mz 1800 --met-excision --cut K,R --missed-cleavages 1 --min-pep-len 7 --max-pep-len 30 --min-pr-mz 300 --max-pr-mz 1800 --min-pr-charge 1 --max-pr-charge 4 --unimod4 --var-mods 1 --var-mod UniMod:35,15.994915,M --var-mod UniMod:21,79.966331,STY --monitor-mod UniMod:21 --use-quant --double-search --reanalyse --relaxed-prot-inf --smart-profiling --peak-center --no-ifs-removal

DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)
Compiled on Apr 14 2022 15:31:19
Current date and time: Thu Mar 16 15:24:13 2023
CPU: GenuineIntel Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
SIMD instructions: AVX AVX2 FMA SSE4.1 SSE4.2
Logical CPU cores: 8
Thread number set to 8
Output will be filtered at 0.01 FDR
Deep learning will be used to generate a new in silico spectral library from peptides provided
Library-free search enabled
Min fragment m/z set to 200
Max fragment m/z set to 1800
N-terminal methionine excision enabled
In silico digest will involve cuts at K,R
Maximum number of missed cleavages set to 1
Min peptide length set to 7
Max peptide length set to 30
Min precursor m/z set to 300
Max precursor m/z set to 1800
Min precursor charge set to 1
Max precursor charge set to 4
Cysteine carbamidomethylation enabled as a fixed modification
Maximum number of variable modifications set to 1
Modification UniMod:35 with mass delta 15.9949 at M will be considered as variable
Modification UniMod:21 with mass delta 79.9663 at STY will be considered as variable
Existing .quant files will be used
Neural networks will be used for peak selection
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
Highly heuristic protein grouping will be used, to reduce the number of protein groups obtained; this mode is recommended for benchmarking protein ID numbers; use with caution for anything else
When generating a spectral library, in silico predicted spectra will be retained if deemed more reliable than experimental ones
Fixed-width center of each elution peak will be used for quantification
Interference removal from fragment elution curves disabled
DIA-NN will optimise the mass accuracy automatically using the first run in the experiment. This is useful primarily for quick initial analyses, when it is not yet known which mass accuracy setting works best for a particular acquisition scheme.
Exclusion of fragments shared between heavy and light peptides from quantification is not supported in FASTA digest mode - disabled; to enable, generate an in silico predicted spectral library and analyse with this library
The following variable modifications will be scored: UniMod:21
WARNING: double-pass mode is incompatible with PTM scoring, turned off

4 files will be processed
[0:00] Loading FASTA S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\OPA1_BPK.txt
[0:00] Loading FASTA S:\General\Genome\E.coli\NCBI_Ecoli_K12_MG1655\GCF_000005845.2_ASM584v2_protein.fasta
[0:00] Processing FASTA
[0:03] Assembling elution groups
[0:05] 1714828 precursors generated
[0:05] Gene names missing for some isoforms
[0:05] Library contains 1665 proteins, and 1665 genes
[0:06] Encoding peptides for spectra and RTs prediction
[0:08] Predicting spectra and IMs
[11:12] Predicting RTs
[12:04] Decoding predicted spectra and IMs
[12:10] Decoding RTs
[12:10] Saving the library to lib.predicted.speclib
Could not save lib.predicted.speclib
[12:12] Initialising library

[12:13] First pass: generating a spectral library from DIA data
[12:13] Cross-run analysis
[12:13] Reading quantification information: 4 files
[12:13] Quantifying peptides
[12:13] Assembling protein groups
[12:14] Quantifying proteins
[12:14] Calculating q-values for protein and gene groups
[12:14] Calculating global q-values for protein and gene groups
[12:14] Writing report
[12:15] Report saved to S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\DIA\report-P-2-first-pass.tsv.
[12:15] Stats report saved to S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\DIA\report-P-2-first-pass.stats.tsv
[12:15] Generating spectral library:
[12:15] 1198 precursors passing the FDR threshold are to be extracted
[12:15] Loading run S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\raw\ECL05596-DIA.raw
[12:37] 1235362 library precursors are potentially detectable
[12:37] 59 spectra added to the library
[12:38] Loading run S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\raw\ECL05597-DIA.raw
[13:00] 1235362 library precursors are potentially detectable
[13:00] 124 spectra added to the library
[13:00] Loading run S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\raw\ECL05598-DIA.raw
[13:21] 1235362 library precursors are potentially detectable
[13:22] 158 spectra added to the library
[13:22] Saving spectral library to lib.tsv
ERROR: cannot write to lib.tsv. Check if the destination folder is write-protected or the file is in use
[13:22] Loading the generated library and saving it in the .speclib format
[13:22] Loading spectral library lib.tsv
cannot read the file
[13:22] Loading protein annotations from FASTA S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\OPA1_BPK.txt
[13:22] Loading protein annotations from FASTA S:\General\Genome\E.coli\NCBI_Ecoli_K12_MG1655\GCF_000005845.2_ASM584v2_protein.fasta
[13:22] Library contains 0 proteins, and 0 genes
[13:22] Saving the library to lib.tsv.speclib
Could not save lib.tsv.speclib

[13:23] Second pass: using the newly created spectral library to reanalyse the data
[13:23] File #1/4
[13:23] Loading run S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\raw\ECL05595-DIA.raw
[13:44] 1198 library precursors are potentially detectable
[13:44] Processing...
[13:45] RT window set to 0.756754
[13:45] Peak width: 0
[13:45] Scan window radius set to 5
[13:45] Recommended MS1 mass accuracy setting: 5.09486 ppm
[13:48] Optimised mass accuracy: 2.67272 ppm
[13:49] Removing low confidence identifications
[13:49] Searching PTM decoys
[13:49] Removing interfering precursors
[13:49] Too few confident identifications, neural networks will not be used
[13:49] Number of IDs at 0.01 FDR: 25
[13:49] Number of IDs at 0.01 FDR: 25
[13:49] Calculating protein q-values
[13:49] Number of genes identified at 1% FDR: 2 (precursor-level), 0 (protein-level) (inference performed using proteotypic peptides only)
[13:49] Quantification

[13:49] File #2/4
[13:49] Loading run S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\raw\ECL05596-DIA.raw
[14:13] 1198 library precursors are potentially detectable
[14:13] Processing...
[14:14] RT window set to 0.80931
[14:14] Recommended MS1 mass accuracy setting: 4.04922 ppm
[14:14] Removing low confidence identifications
[14:14] Searching PTM decoys
[14:14] Removing interfering precursors
[14:14] Too few confident identifications, neural networks will not be used
[14:14] Number of IDs at 0.01 FDR: 106
[14:14] Number of IDs at 0.01 FDR: 101
[14:14] Calculating protein q-values
[14:14] Number of genes identified at 1% FDR: 25 (precursor-level), 0 (protein-level) (inference performed using proteotypic peptides only)
[14:14] Quantification

[14:14] File #3/4
[14:14] Loading run S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\raw\ECL05597-DIA.raw
[14:36] 1198 library precursors are potentially detectable
[14:36] Processing...
[14:37] RT window set to 0.919098
[14:37] Recommended MS1 mass accuracy setting: 3.97097 ppm
[14:37] Removing low confidence identifications
[14:37] Searching PTM decoys
[14:37] Removing interfering precursors
[14:37] Training neural networks: 1014 targets, 107 decoys
[14:38] Number of IDs at 0.01 FDR: 1008
[14:38] Calculating protein q-values
[14:38] Number of genes identified at 1% FDR: 322 (precursor-level), 213 (protein-level) (inference performed using proteotypic peptides only)
[14:38] Quantification

[14:38] File #4/4
[14:38] Loading run S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\raw\ECL05598-DIA.raw
[14:59] 1198 library precursors are potentially detectable
[14:59] Processing...
[15:00] RT window set to 0.892467
[15:00] Recommended MS1 mass accuracy setting: 4.18982 ppm
[15:00] Removing low confidence identifications
[15:00] Searching PTM decoys
[15:00] Removing interfering precursors
[15:00] Too few confident identifications, neural networks will not be used
[15:00] Number of IDs at 0.01 FDR: 866
[15:00] Number of IDs at 0.01 FDR: 972
[15:00] Calculating protein q-values
[15:00] Number of genes identified at 1% FDR: 318 (precursor-level), 191 (protein-level) (inference performed using proteotypic peptides only)
[15:00] Quantification

[15:00] Cross-run analysis
[15:00] Reading quantification information: 4 files
[15:00] Quantifying peptides
[15:00] Quantifying proteins
[15:00] Calculating q-values for protein and gene groups
[15:00] Calculating global q-values for protein and gene groups
[15:00] Writing report
[15:00] Report saved to S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\DIA\report-P-2.tsv.
[15:00] Stats report saved to S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\DIA\report-P-2.stats.tsv
[15:00] Log saved to S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\DIA\report-P-2.log.txt
Finished

DIA-NN exited
DIA-NN-plotter.exe "S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\DIA\report-P-2.stats.tsv" "S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\DIA\report-P-2.tsv" "S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\DIA\report-P-2.pdf"
PDF report will be generated in the background

>sp|Q9UHD2|TBK1_HUMAN Serine/threonine-protein kinase TBK1 OS=Homo sapiens OX=9606 GN=TBK1 PE=1 SV=1
MQSTSNHLWLLSDILGQGATANVFRGRHKKTGDLFAIKVFNNISFLRPVDVQMREFEVLK
KLNHKNIVKLFAIEEETTTRHKVLIMEFCPCGSLYTVLEEPSNAYGLPESEFLIVLRDVV
GGMNHLRENGIVHRDIKPGNIMRVIGEDGQSVYKLTDFGAARELEDDEQFVSLYGTEEYL
HPDMYERAVLRKDHQKKYGATVDLWSIGVTFYHAATGSLPFRPFEGPRRNKEVMYKIITG
KPSGAISGVQKAENGPIDWSGDMPVSCSLSRGLQVLLTPVLANILEADQEKCWGFDQFFA
ETSDILHRMVIHVFSLQQMTAHKIYIHSYNTATIFHELVYKQTKIISSNQELIYEGRRLV
LEPGRLAQHFPKTTEENPIFVVSREPLNTIGLIYEKISLPKVHPRYDLDGDASMAKAITG
VVCYACRIASTLLLYQELMRKGIRWLIELIKDDYNETVHKKTEVVITLDFCIRNIEKTVK
VYEKLMKINLEAAELGEISDIHTKLLRLSSSQGTIETSLQDIDSRLSPGGSLADAWAHQE
GTHPKDRNVEKLQVLLNCMTEIYYQFKKDKAERRLAYNEEQIHKFDKQKLYYHATKAMTH
FTDECVKKYEAFLNKSEEWIRKMLHLRKQLLSLTNQCFDIEEEVSKYQEYTNELQETLPQ
KMFTASSGIKHTMTPIYPSSNTLVEMTLGMKKLKEEMEGVVKELAENNHILERFGSLTMD
GGLRNVDCL
>sp|O60313|OPA1_HUMAN Dynamin-like 120 kDa protein, mitochondrial OS=Homo sapiens OX=9606 GN=OPA1 PE=1 SV=3
MWRLRRAAVACEVCQSLVKHSSGIKGSLPLQKLHLVSRSIYHSHHPTLKLQRPQLRTSFQ
QFSSLTNLPLRKLKFSPIKYGYQPRRNFWPARLATRLLKLRYLILGSAVGGGYTAKKTFD
QWKDMIPDLSEYKWIVPDIVWEIDEYIDFEKIRKALPSSEDLVKLAPDFDKIVESLSLLK
DFFTSGSPEETAFRATDRGSESDKHFRKVSDKEKIDQLQEELLHTQLKYQRILERLEKEN
KELRKLVLQKDDKGIHHRKLKKSLIDMYSEVLDVLSDYDASYNTQDHLPRVVVVGDQSAG
KTSVLEMIAQARIFPRGSGEMMTRSPVKVTLSEGPHHVALFKDSSREFDLTKEEDLAALR
HEIELRMRKNVKEGCTVSPETISLNVKGPGLQRMVLVDLPGVINTVTSGMAPDTKETIFS
ISKAYMQNPNAIILCIQDGSVDAERSIVTDLVSQMDPHGRRTIFVLTKVDLAEKNVASPS
RIQQIIEGKLFPMKALGYFAVVTGKGNSSESIEAIREYEEEFFQNSKLLKTSMLKAHQVT
TRNLSLAVSDCFWKMVRESVEQQADSFKATRFNLETEWKNNYPRLRELDRNELFEKAKNE
ILDEVISLSQVTPKHWEEILQQSLWERVSTHVIENIYLPAAQTMNSGTFNTTVDIKLKQW
TDKQLPNKAVEVAWETLQEEFSRFMTEPKGKEHDDIFDKLKEAVKEESIKRHKWNDFAED
SLRVIQHNALEDRSISDKQQWDAAIYFMEEALQARLKDTENAIENMVGPDWKKRWLYWKN
RTQEQCVHNETKNELEKMLKCNEEHPAYLASDEITTVRKNLESRGVEVDPSLIKDTWHQV
YRRHFLKTALNHCNLCRRGFYYYQRHFVDSELECNDVVLFWRIQRMLAITANTLRQQLTN
TEVRRLEKNVKEVLEDFAEDGEKKIKLLTGKRVQLAEDLKKVREIQEKLDAFIEALHQEK

vdemichev commented 1 year ago

Hi Haiyan,

Ox(M) and double-pass mode are not recommended. For most phospho workflows, setting the number of variable modifications to 3 is sensible.

The attached log is not consistent with the screenshot in the first post; that is, they were obtained with different settings. Specifically, the screenshot shows an analysis log obtained without 'Phospho' selected. So when you refer to DIA-NN output, how was it obtained? I would suggest reanalysing everything without reusing .quant files, to make sure all the settings are good.

Best, Vadim

haiyan-MS commented 1 year ago

I did what you suggested. This time, I saved my sequence file containing the 2 human proteins as FASTA instead of txt, and used this FASTA plus the E. coli FASTA. I used single-pass mode, without Ox(M) but with phospho selected. I only analyzed two DIA runs instead of 4 like last time. Those are all the differences from last time. This time I might have more peptides identified, but still far fewer than in the DDA runs, and the examples I showed were still not identified even though they are base peaks. In the DDA runs, about 360 OPA1 peptides were identified, but here only about 70. I used the report as the DIA-NN output and analyzed that.

Please see the attached log file. I also added the report PDF file; not sure if it is useful for you.

Haiyan


DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)
Compiled on Apr 14 2022 15:31:19
Current date and time: Wed Mar 22 09:55:35 2023
CPU: GenuineIntel Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
SIMD instructions: AVX AVX2 FMA SSE4.1 SSE4.2
Logical CPU cores: 8
diann.exe --f S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\raw\ECL05595-DIA.raw --f S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\raw\ECL05596-DIA.raw --lib --threads 8 --verbose 1 --out S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\DIA-2\report-DIA_MS6018.tsv --qvalue 0.01 --matrices --out-lib C:\Program Files (x86)\DIA-NN\report-lib.tsv --gen-spec-lib --predictor --fasta S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\OPA1_BPK.fasta --fasta S:\General\Genome\E.coli\NCBI_Ecoli_K12_MG1655\GCF_000005845.2_ASM584v2_protein.fasta --fasta-search --min-fr-mz 200 --max-fr-mz 1800 --met-excision --cut K,R --missed-cleavages 3 --min-pep-len 7 --max-pep-len 30 --min-pr-mz 300 --max-pr-mz 1800 --min-pr-charge 1 --max-pr-charge 4 --unimod4 --var-mods 1 --var-mod UniMod:21,79.966331,STY --monitor-mod UniMod:21 --reanalyse --relaxed-prot-inf --smart-profiling --peak-center --no-ifs-removal

Thread number set to 8
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
A spectral library will be generated
Deep learning will be used to generate a new in silico spectral library from peptides provided
Library-free search enabled
Min fragment m/z set to 200
Max fragment m/z set to 1800
N-terminal methionine excision enabled
In silico digest will involve cuts at K,R
Maximum number of missed cleavages set to 3
Min peptide length set to 7
Max peptide length set to 30
Min precursor m/z set to 300
Max precursor m/z set to 1800
Min precursor charge set to 1
Max precursor charge set to 4
Cysteine carbamidomethylation enabled as a fixed modification
Maximum number of variable modifications set to 1
Modification UniMod:21 with mass delta 79.9663 at STY will be considered as variable
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
Highly heuristic protein grouping will be used, to reduce the number of protein groups obtained; this mode is recommended for benchmarking protein ID numbers; use with caution for anything else
When generating a spectral library, in silico predicted spectra will be retained if deemed more reliable than experimental ones
Fixed-width center of each elution peak will be used for quantification
Interference removal from fragment elution curves disabled
DIA-NN will optimise the mass accuracy automatically using the first run in the experiment. This is useful primarily for quick initial analyses, when it is not yet known which mass accuracy setting works best for a particular acquisition scheme.
Exclusion of fragments shared between heavy and light peptides from quantification is not supported in FASTA digest mode - disabled; to enable, generate an in silico predicted spectral library and analyse with this library
The following variable modifications will be scored: UniMod:21

2 files will be processed
[0:00] Loading FASTA S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\OPA1_BPK.fasta
[0:00] Loading FASTA S:\General\Genome\E.coli\NCBI_Ecoli_K12_MG1655\GCF_000005845.2_ASM584v2_protein.fasta
[0:00] Processing FASTA
[0:06] Assembling elution groups
[0:10] 2849370 precursors generated
[0:10] Gene names missing for some isoforms
[0:10] Library contains 1665 proteins, and 1665 genes
[0:10]
[0:14]
[18:22]
[20:08]
[20:18]
[20:20] Saving the library to C:\Program Files (x86)\DIA-NN\report-lib.predicted.speclib
Could not save C:\Program Files (x86)\DIA-NN\report-lib.predicted.speclib
[20:23] Initialising library

[20:24] First pass: generating a spectral library from DIA data
[20:24] File #1/2
[20:24] Loading run S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\raw\ECL05595-DIA.raw
[20:44] 2062079 library precursors are potentially detectable
[20:45] Processing...
[34:20] RT window set to 4.94332
[34:20] Peak width: 2.376
[34:20] Scan window radius set to 5
[34:21] Recommended MS1 mass accuracy setting: 4.92978 ppm
[41:55] Optimised mass accuracy: 4.54686 ppm
[43:27] Removing low confidence identifications
[43:28] Searching PTM decoys
[43:46] Removing interfering precursors
[43:48] Training neural networks: 75749 targets, 145448 decoys
[43:59] Number of IDs at 0.01 FDR: 414
[43:59] Calculating protein q-values
[43:59] Number of genes identified at 1% FDR: 39 (precursor-level), 0 (protein-level) (inference performed using proteotypic peptides only)
[43:59] Quantification
[44:00] Precursors with monitored PTMs at 1% FDR: 0 out of 33
[44:00] Unmodified precursors with monitored PTM sites at 1% FDR: 0 out of 267
[44:00] Precursors with PTMs localised (when required) with > 90% confidence: 0 out of 0
[44:00] Quantification information saved to S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\raw\ECL05595-DIA.raw.quant.

[44:00] File #2/2
[44:00] Loading run S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\raw\ECL05596-DIA.raw
[44:23] 2062079 library precursors are potentially detectable
[44:23] Processing...
[57:42] RT window set to 6.0476
[57:42] Recommended MS1 mass accuracy setting: 3.8579 ppm
[60:10] Removing low confidence identifications
[60:10] Searching PTM decoys
[60:35] Removing interfering precursors
[60:36] Training neural networks: 1778 targets, 1139 decoys
[60:36] Number of IDs at 0.01 FDR: 762
[60:36] Calculating protein q-values
[60:37] Number of genes identified at 1% FDR: 76 (precursor-level), 0 (protein-level) (inference performed using proteotypic peptides only)
[60:37] Quantification
[60:37] Precursors with monitored PTMs at 1% FDR: 0 out of 147
[60:37] Unmodified precursors with monitored PTM sites at 1% FDR: 0 out of 357
[60:37] Precursors with PTMs localised (when required) with > 90% confidence: 0 out of 0
[60:37] Quantification information saved to S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\raw\ECL05596-DIA.raw.quant.

[60:37] Cross-run analysis
[60:37] Reading quantification information: 2 files
[60:37] Quantifying peptides
[60:38] Assembling protein groups
[60:39] Quantifying proteins
[60:39] Calculating q-values for protein and gene groups
[60:39] Calculating global q-values for protein and gene groups
[60:39] Writing report
[60:39] Report saved to S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\DIA-2\report-DIA_MS6018-first-pass.tsv.
[60:39] Saving precursor levels matrix
[60:39] Precursor levels matrix (1% precursor and protein group FDR) saved to S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\DIA-2\report-DIA_MS6018-first-pass.pr_matrix.tsv.
[60:39] Saving protein group levels matrix
[60:39] Protein group levels matrix (1% precursor FDR and protein group FDR) saved to S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\DIA-2\report-DIA_MS6018-first-pass.pg_matrix.tsv.
[60:39] Saving gene group levels matrix
[60:39] Gene groups levels matrix (1% precursor FDR and protein group FDR) saved to S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\DIA-2\report-DIA_MS6018-first-pass.gg_matrix.tsv.
[60:39] Saving unique genes levels matrix
[60:39] Unique genes levels matrix (1% precursor FDR and protein group FDR) saved to S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\DIA-2\report-DIA_MS6018-first-pass.unique_genes_matrix.tsv.
[60:39] Stats report saved to S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\DIA-2\report-DIA_MS6018-first-pass.stats.tsv
[60:39] Generating spectral library:
[60:39] 121 precursors passing the FDR threshold are to be extracted
[60:39] Loading run S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\raw\ECL05596-DIA.raw
[61:01] 2062079 library precursors are potentially detectable
[61:02] 61 spectra added to the library
[61:02] Saving spectral library to C:\Program Files (x86)\DIA-NN\report-lib.tsv
ERROR: cannot write to C:\Program Files (x86)\DIA-NN\report-lib.tsv. Check if the destination folder is write-protected or the file is in use
[61:02] Loading the generated library and saving it in the .speclib format
[61:02] Loading spectral library C:\Program Files (x86)\DIA-NN\report-lib.tsv
cannot read the file
[61:02] Loading protein annotations from FASTA S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\OPA1_BPK.fasta
[61:02] Loading protein annotations from FASTA S:\General\Genome\E.coli\NCBI_Ecoli_K12_MG1655\GCF_000005845.2_ASM584v2_protein.fasta
[61:02] Library contains 0 proteins, and 0 genes
[61:02] Saving the library to C:\Program Files (x86)\DIA-NN\report-lib.tsv.speclib
Could not save C:\Program Files (x86)\DIA-NN\report-lib.tsv.speclib

[61:04] Second pass: using the newly created spectral library to reanalyse the data
[61:04] File #1/2
[61:04] Loading run S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\raw\ECL05595-DIA.raw
[61:25] 121 library precursors are potentially detectable
[61:25] Processing...
[61:25] RT window set to 0.959165
[61:25] Recommended MS1 mass accuracy setting: 5.08166 ppm
[61:25] Removing low confidence identifications
[61:25] Searching PTM decoys
[61:25] Removing interfering precursors
[61:25] Too few confident identifications, neural networks will not be used
[61:25] Number of IDs at 0.01 FDR: 0
[61:25] Calculating protein q-values
[61:25] Number of genes identified at 1% FDR: 0 (precursor-level), 0 (protein-level) (inference performed using proteotypic peptides only)
[61:25] Quantification

[61:25] File #2/2
[61:25] Loading run S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\raw\ECL05596-DIA.raw
[61:46] 121 library precursors are potentially detectable
[61:46] Processing...
[61:46] RT window set to 0.857049
[61:46] Recommended MS1 mass accuracy setting: 3.97759 ppm
[61:46] Removing low confidence identifications
[61:46] Searching PTM decoys
[61:46] Removing interfering precursors
[61:46] Too few confident identifications, neural networks will not be used
[61:46] Number of IDs at 0.01 FDR: 116
[61:46] Calculating protein q-values
[61:46] Number of genes identified at 1% FDR: 22 (precursor-level), 0 (protein-level) (inference performed using proteotypic peptides only)
[61:46] Quantification

[61:47] Cross-run analysis
[61:47] Reading quantification information: 2 files
[61:47] Quantifying peptides
[61:47] Quantifying proteins
[61:47] Calculating q-values for protein and gene groups
[61:47] Calculating global q-values for protein and gene groups
[61:47] Writing report
[61:47] Report saved to S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\DIA-2\report-DIA_MS6018.tsv.
[61:47] Saving precursor levels matrix
[61:47] Precursor levels matrix (1% precursor and protein group FDR) saved to S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\DIA-2\report-DIA_MS6018.pr_matrix.tsv.
[61:47] Saving protein group levels matrix
[61:47] Protein group levels matrix (1% precursor FDR and protein group FDR) saved to S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\DIA-2\report-DIA_MS6018.pg_matrix.tsv.
[61:47] Saving gene group levels matrix
[61:47] Gene groups levels matrix (1% precursor FDR and protein group FDR) saved to S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\DIA-2\report-DIA_MS6018.gg_matrix.tsv.
[61:47] Saving unique genes levels matrix
[61:47] Unique genes levels matrix (1% precursor FDR and protein group FDR) saved to S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\DIA-2\report-DIA_MS6018.unique_genes_matrix.tsv.
[61:47] Stats report saved to S:\Projects\finished_projects\2023\March\MS6015_Yubing Lu\DIA-2\report-DIA_MS6018.stats.tsv

Finished
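Both logs above contain write failures ("ERROR: cannot write to ...", "Could not save ...") that are easy to miss in a long paste, and after them the second pass runs against a library reported as containing 0 proteins. A minimal Python sketch for pulling such messages out of a saved log follows. The function name and the marker list are illustrative choices, not part of DIA-NN; the markers are simply the strings that appear in the logs in this thread.

```python
import re
from typing import List

# Substrings seen in the logs above that signal a problem worth reviewing.
PROBLEM_MARKERS = ("ERROR:", "WARNING:", "Could not save", "cannot read the file")


def find_log_problems(log_text: str) -> List[str]:
    """Return the log messages that contain one of the problem markers."""
    # A pasted log may have lost its line breaks, so split on newlines
    # and also just before every [m:ss] timestamp.
    messages = re.split(r"\n|(?=\[\d+:\d{2}\])", log_text)
    return [m.strip() for m in messages if any(k in m for k in PROBLEM_MARKERS)]


if __name__ == "__main__":
    sample = (
        "[13:22] Saving spectral library to lib.tsv ERROR: cannot write to lib.tsv. "
        "[13:22] Loading the generated library and saving it in the .speclib format "
        "[13:22] Saving the library to lib.tsv.speclib Could not save lib.tsv.speclib"
    )
    for message in find_log_problems(sample):
        print(message)
```

Run against the full text of a `.log.txt` file, this lists every message that carries one of the markers, which makes it quick to see whether the libraries were actually written to disk.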

vdemichev commented 1 year ago

Can you please attach on github? Not so convenient via email. How were you getting ~10k IDs on a screenshot in the initial post, i.e. why more IDs then? Based on the log, looks like only few if any peptides are phosphorylated. Does the first-pass report contain the peptides of interest?

haiyan-MS commented 1 year ago

report-DIA_MS6018.log.txt report-DIA_MS6018.pdf

Please let me know if this is better.

There might be a little more ID, but not much more. I guess single pass instead of double? No M oxi? I only searched two runs instead of 4 runs?

vdemichev commented 1 year ago

"There might be a little more ID, but not much more" - but >9k here?

haiyan-MS commented 1 year ago

I just looked at the 1st pass report. The coverage is 89% and there are tons of phosphorylation! I will have a closer look at this file. Do you suggest I use this one instead of the final report?

vdemichev commented 1 year ago

OK, I think I understand, those others are another type of sample?

So what I would suggest:

1. Prepare separate .predicted.speclib libraries for the human proteins and for E.coli. Only use phospho as a variable modification for the human one (as I understand, the E.coli proteins are not supposed to be phosphorylated; even if they were, you would need to declare phosphate on different AAs).
2. Convert both to .tsv.
3. Load them both in DIA-NN using the --lib command twice.
4. Check the 'Phospho' checkbox.
5. Analyse with MBR enabled. If the results are still not satisfactory, (i) check whether you see the peptides you are interested in in the first-pass main report; (ii) if yes, analyse with MBR but with the FDR filter relaxed, e.g. 3% or 5%.

Best, Vadim
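For readers who run DIA-NN from a script rather than the GUI, the suggested analysis could be sketched roughly as follows. This is only an illustration under several assumptions: the executable name and all file names are hypothetical, and the mapping of the 'Phospho' checkbox to the `--var-mod` and `--monitor-mod` arguments is my reading of the DIA-NN command-line options, so it is worth comparing against the command line that the GUI prints in its log. The helper only assembles the argument list; it does not run anything.

```python
def build_diann_command(raw_files, tsv_libraries, out_report,
                        fdr=0.01, exe="DiaNN.exe"):
    """Assemble a DIA-NN argument list for a search with several libraries.

    exe, the file names and the phospho arguments are illustrative
    assumptions, not values taken from this thread's logs.
    """
    for lib in tsv_libraries:
        # When several libraries are given, all of them must be .tsv.
        if not str(lib).lower().endswith(".tsv"):
            raise ValueError(f"convert this library to .tsv first: {lib}")

    cmd = [exe]
    for raw in raw_files:
        cmd += ["--f", str(raw)]        # one --f per raw file
    for lib in tsv_libraries:
        cmd += ["--lib", str(lib)]      # --lib repeated once per library
    cmd += [
        "--out", str(out_report),
        "--qvalue", str(fdr),           # precursor FDR filter, e.g. 0.03 when relaxed
        "--reanalyse",                  # MBR
        # Assumed equivalent of ticking 'Phospho' in the GUI:
        "--var-mod", "UniMod:21,79.966331,STY",
        "--monitor-mod", "UniMod:21",
    ]
    return cmd


if __name__ == "__main__":
    command = build_diann_command(
        raw_files=["sample_A.raw", "sample_B.raw"],   # hypothetical names
        tsv_libraries=["OPA1.tsv", "ecoli.tsv"],      # hypothetical names
        out_report="report.tsv",
        fdr=0.03,
    )
    print(" ".join(command))
```

The early check on the file extension mirrors the requirement that multiple libraries be supplied as .tsv. The resulting list can be passed to `subprocess.run` once the arguments have been checked against your DIA-NN version.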

haiyan-MS commented 1 year ago

Is there a way to just generate a library without searching the raw files?

haiyan-MS commented 1 year ago

I have tried this search with the generated library. But somehow, I don't know why, it only takes one lib and not two. Also, I have a question: when using this library, do I deselect all the modifications in the default? Do I still select phosphorylation or not? Please have a look at my search log, and also at the library I generated with the two human proteins. report-DIA_libsearch.log.txt

vdemichev commented 1 year ago

Predicted libraries need to be converted to .tsv first.

vdemichev commented 1 year ago

Please see above: 'Check the 'Phospho' checkbox.'

haiyan-MS commented 1 year ago

I tried to convert the speclib file to a .txt file, but without success. Also, it seems the software only takes the 1st library although it acknowledges there are two libraries.

vdemichev commented 1 year ago

It always converts :) Log & setting screenshot?

haiyan-MS commented 1 year ago

report-lib-4.log.txt This time there are 0 results

vdemichev commented 1 year ago

Yes, need to convert .predicted.speclib libraries into .tsv, please see above.

vdemichev commented 1 year ago

For this, just specify the .predicted.speclib in the Spectral Library field, and select 'Generate spectral library'. The Raw files field must be empty.

haiyan-MS commented 1 year ago

OPA1.log.txt I don't see .tsv although I said generate OPA1.tsv

vdemichev commented 1 year ago

Yes, now need to 1. Reset all settings. 2. put OPA1.predicted.speclib into the 'Spectral library' field. 3. Check 'Generate spectral library'.

haiyan-MS commented 1 year ago

Still the same thing. Only the 1st lib was used. report-again.log.txt I accidentally loaded all 4 runs. This time, only the last two files give results. But the basic issue is that it only used the 1st lib.

vdemichev commented 1 year ago

Of course, because the libs are not converted to .tsv

vdemichev commented 1 year ago

DIA-NN is very clear about this: WARNING: multiple spectral libraries are specified; this mode is experimental, and DIA-NN does not check if different libraries are in the same format, have consistent modification names or reference RT scales; all libraries must be in .tsv format

haiyan-MS commented 1 year ago

But this time I converted to tsv

vdemichev commented 1 year ago

I see, the reason is you still have the .predicted.speclib in 'Spectral library' field

haiyan-MS commented 1 year ago

Found the mistake. I still have the E.coli lib selected in the window where you can put in lib file if you want single lib. I will try again.

haiyan-MS commented 1 year ago

Professor Demichev:

Thank you for your help. It finally worked and gave me a lot of coverage. This issue is resolved. Of course I need to look at the results more carefully to see if they make sense. I might come back to ask you.

haiyan-MS commented 1 year ago

I have one question: Is there a way to get the observed MS value in the report? Or even the theoretical value?

vdemichev commented 1 year ago

Theoretical - can just match the precursors to the library entries. Empirical - currently not supported.

haiyan-MS commented 1 year ago

Yes. But how do I get it out? Of course I can calculate it individually, but it would be best to have it in the report.
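As a stopgap, the theoretical precursor m/z can be computed outside DIA-NN from a modified sequence and a charge. The sketch below assumes the sequence is written in the UniMod-annotated style, e.g. `AAS(UniMod:21)PEK`, and that the charge is available alongside it (in the main report these would presumably be the `Modified.Sequence` and `Precursor.Charge` columns, which is an assumption to check against your own report). Only four common modifications are included, and the function name is illustrative; it is not part of DIA-NN.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# Monoisotopic mass shifts (Da) for a few common modifications.
UNIMOD_MASS = {
    "UniMod:21": 79.966331,   # phosphorylation
    "UniMod:35": 15.994915,   # oxidation
    "UniMod:4": 57.021464,    # carbamidomethylation
    "UniMod:1": 42.010565,    # acetylation
}

WATER = 18.010565
PROTON = 1.007276


def theoretical_mz(modified_sequence, charge):
    """Return the monoisotopic m/z of a peptide precursor.

    Raises KeyError for residues or modifications missing from the tables.
    """
    mass = WATER
    # Add the mass shift of every annotated modification.
    for mod in re.findall(r"\((UniMod:\d+)\)", modified_sequence):
        mass += UNIMOD_MASS[mod]
    # Strip the annotations and sum the residue masses of the bare sequence.
    bare = re.sub(r"\([^)]*\)", "", modified_sequence)
    mass += sum(RESIDUE_MASS[aa] for aa in bare)
    return (mass + charge * PROTON) / charge


if __name__ == "__main__":
    for seq, z in [("PEPTIDE", 1), ("PEPS(UniMod:21)TIDE", 2)]:
        print(seq, z, round(theoretical_mz(seq, z), 4))
```

Applying this function row by row to an exported report adds a theoretical m/z column without touching the library. Matching report precursors against the library's precursor m/z values, as suggested above, should give the same numbers.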

vdemichev / DiaNN

single protein-phosphorylaation #640