vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

N-terminal acetylation-library-free search #58

Closed msmedus closed 3 years ago

msmedus commented 4 years ago

Dear Vadim,

I would like to allow N-terminal acetylation in my library-free searches. I tried to enter the respective variable modification (Unimod:1) in the additional commands field, using lower-case amino acid letters and listing all 20 amino acids. For some reason, the software stops the calculations after loading the FASTA (before actually processing it) without any error message. Any clue what this could mean? Thank you!

best, Martin

vdemichev commented 4 years ago

Hi Martin,

Please try:

--var-mod UniMod:1,42.010565,gavlifmpwsctyhkrqend

Please note that deep learning spectra/RT prediction will need to be turned off. This seems to work on my machine. If it does not, please share the full log (by clicking "Save log") and the FASTA file, and I will then try to reproduce the error.
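As a cross-check on the mass delta in that argument, here is a minimal Python sketch that recomputes it from the elemental composition of the acetyl group (C2H2O) and formats the argument in the `name,mass,sites` form shown above. The helper names `delta_mass` and `var_mod_arg` are hypothetical and not part of DIA-NN; the site letters are simply copied from the commands in this thread.

```python
# Monoisotopic masses of the relevant elements (standard values).
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

def delta_mass(composition):
    """Monoisotopic mass delta for a composition such as {"C": 2, "H": 2, "O": 1}."""
    return sum(MONO[element] * count for element, count in composition.items())

def var_mod_arg(unimod_id, composition, sites):
    """Format a --var-mod argument as used in this thread (hypothetical helper)."""
    return f"--var-mod UniMod:{unimod_id},{delta_mass(composition):.6f},{sites}"

# Acetylation adds C2H2O; the GlyGly remnant (UniMod:121) adds C4H6N2O2.
acetyl = var_mod_arg(1, {"C": 2, "H": 2, "O": 1}, "gavlifmpwsctyhkrqend")
glygly = var_mod_arg(121, {"C": 4, "H": 6, "N": 2, "O": 2}, "K")
print(acetyl)
print(glygly)
```

Both strings reproduce the mass values that appear in the commands quoted in this thread.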

Best wishes,

Vadim

vdemichev commented 4 years ago

Btw, DIA-NN will try to add N-terminal acetylation on every single peptide, regardless of whether it's at the N-terminus of the protein. So you might need to filter the final results accordingly.

msmedus commented 4 years ago

Dear Vadim,

Thank you very much for your response! I thought I had tried what you suggested. In any case, I can try once more and give you the log file in case it doesn't work. What is of course not ideal is that the software tries to add an N-terminal acetylation to each peptide. I guess the search space would be blown up a lot this way, and this would negatively affect IDs? On the other hand, if one is interested in analysing N-terminal PTMs (which is the case here, with ubiquitinations occurring on the very N-terminus), they are not identified with DIA-NN if acetylation of protein N-termini is disabled. I tried to process the data with Spectronaut and the modified peptide is detected.

best, Martin

vdemichev commented 4 years ago

Hi Martin,

Yes, it's suboptimal that it tries to acetylate all the peptides. This can lead to up to 2x higher FDR at the same number of IDs, in comparison to just considering protein-N-term peptides. There are two partial solutions to this. First, filter out all acetylations identified on non-N-terminal peptides and accept all other peptides (including unmodified ones) at < 2% q-value instead of < 1% q-value. Second (which I would recommend), make a FASTA out of all protein-N-term peptides and let DIA-NN analyse using this FASTA only.
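The first option could be sketched roughly as below. This is only an illustration under stated assumptions: the report rows are taken to carry `Modified.Sequence`, `Stripped.Sequence` and `Q.Value` columns, acetylation is taken to appear as the tag `(UniMod:1)` in the modified sequence, and `nterm_peptides` is a hypothetical set of protein-N-terminal peptide sequences prepared separately from the FASTA.

```python
def filter_report(rows, nterm_peptides, q_cutoff=0.02, acetyl_tag="(UniMod:1)"):
    """Drop acetylated non-N-terminal peptides; keep everything else below q_cutoff."""
    kept = []
    for row in rows:
        if float(row["Q.Value"]) >= q_cutoff:
            continue  # fails the relaxed 2% q-value threshold
        acetylated = acetyl_tag in row["Modified.Sequence"]
        if acetylated and row["Stripped.Sequence"] not in nterm_peptides:
            continue  # acetylation reported on a peptide that is not protein-N-terminal
        kept.append(row)
    return kept

# Made-up rows for illustration only; these are not real results.
rows = [
    {"Modified.Sequence": "(UniMod:1)ASTLPEK", "Stripped.Sequence": "ASTLPEK", "Q.Value": "0.015"},
    {"Modified.Sequence": "(UniMod:1)GGGGRAAK", "Stripped.Sequence": "GGGGRAAK", "Q.Value": "0.001"},
    {"Modified.Sequence": "GGGGRAAK", "Stripped.Sequence": "GGGGRAAK", "Q.Value": "0.019"},
    {"Modified.Sequence": "ASTLPEK", "Stripped.Sequence": "ASTLPEK", "Q.Value": "0.03"},
]
kept = filter_report(rows, nterm_peptides={"ASTLPEK"})
kept_sequences = [row["Modified.Sequence"] for row in kept]
print(kept_sequences)
```

In practice the rows would come from the report TSV (for example via `csv.DictReader` with a tab delimiter), and the column names should be checked against the actual report header.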

Btw, all this is an issue for library-free search only. You can also make a spectral library in Spectronaut (if it seems to work better) or any other tool and then use it with DIA-NN.

Best wishes,

Vadim

msmedus commented 4 years ago

Thanks Vadim! I prefer DIA-NN for spectral library generation as it works better than SN in my hands :-) I think I would go for the second option in this case. As I mostly do lib-free searches, I would probably generate two spectral libraries first (one with only N-terminally modified peptides and another one with the entire FASTA but excluding N-terminal acetylation), combine the spectral libraries and use the new spectral library in a regular library-based search. Let's see.

thanks again and best wishes, Martin

msmedus commented 4 years ago

Dear Vadim,

I am sending you the log, as discussed. I keep encountering the same problem, namely that the software stops all calculations after loading the FASTA. It would be great if you could have a look at it.

Thanks, Martin

msmedus commented 4 years ago

Attaching the file did not work so I copy/paste the entire log..

diann.exe --f "G:\Development validation - DV\CM_M04496B_DV_ubi DDA vs DIA\raw files\CM_M04500Bb.raw.dia" --f "G:\Development validation - DV\CM_M04496B_DV_ubi DDA vs DIA\raw files\CM_M04501Bb.raw.dia" --f "G:\Development validation - DV\CM_M04496B_DV_ubi DDA vs DIA\raw files\CM_M04502Bb.raw.dia" --f "G:\Development validation - DV\CM_M04496B_DV_ubi DDA vs DIA\raw files\CM_M04503Bb.raw.dia" --lib "" --threads 48 --verbose 3 --out "G:\Development validation - DV\CM_M04496B_DV_ubi DDA vs DIA\processings\libFree_training_NtermAcet\CM_M04496Bb_LF_training_NtermAcet_Report.tsv" --out-gene "G:\Development validation - DV\CM_M04496B_DV_ubi DDA vs DIA\processings\libFree_training_NtermAcet\CM_M04496Bb_LF_training_NtermAcet_Report.genes.tsv" --qvalue 0.01 --out-lib "G:\Development validation - DV\CM_M04496B_DV_ubi DDA vs DIA\processings\libFree_training_NtermAcet\CM_M04496Bb_LF_training_NtermAcet_library.tsv" --gen-spec-lib --fasta "L:\Staff\MSt\FASTA\Nterm_trypticPep_7aaMin_swissprot_human_2018_10_uniprot_header.fasta" --fasta-search --min-fr-mz 200 --max-fr-mz 1800 --met-excision --cut-after KR --missed-cleavages 1 --min-pep-len 7 --max-pep-len 30 --min-pr-mz 300 --max-pr-mz 1800 --unimod4 --var-mods 2 --unimod35 --peak-center --no-ifs-removal --mass-acc-cal 30 --var-mod UniMod:121,114.042927,K --learn-lib "G:\DIA-NN settings\Ubiquitinomics\Training_library_ubi.speclib" --var-mod UniMod:1,42.010565,mastcvg
DIA-NN 1.7.10 (Data Independent Acquisition by Neural Networks)
Compiled on Apr 2 2020 20:57:10
Current date and time: Mon Oct 12 11:06:35 2020
CPU: GenuineIntel Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz
SIMD instructions: AVX AVX2 AVX512CD AVX512F FMA SSE4.1 SSE4.2
Logical CPU cores: 48
Thread number set to 48
Output will be filtered at 0.01 FDR
A spectral library will be generated
Library-free search enabled
Min fragment m/z set to 200
Max fragment m/z set to 1800
N-terminal methionine excision enabled
In silico digest will include cuts after amino acids: KR
Maximum number of missed cleavages set to 1
Min peptide length set to 7
Max peptide length set to 30
Min precursor m/z set to 300
Max precursor m/z set to 1800
Cysteine carbamidomethylation enabled as a fixed modification
Maximum number of variable modifications set to 2
Methionine oxidation enabled as a variable modification
Fixed-width center of each elution peak will be used for quantification
Interference removal from fragment elution curves disabled
Calibration mass accuracy set to 3e-05
Modification UniMod:121 with mass delta 114.043 at K will be considered as variable
Modification UniMod:1 with mass delta 42.0106 at mastcvg will be considered as variable
Exclusion of fragments shared between heavy and light peptides from quantification is not supported in library-free mode - disabled

4 files will be processed
[0:00] Loading spectral library G:\DIA-NN settings\Ubiquitinomics\Training_library_ubi.speclib
[0:00] Library annotated with sequence database(s): L:\Staff\MSt\uniprot_taxonomy_Homo_sapiens_Human_9606_filtered_reviewed_with_isoforms.fasta
[0:00] Spectral library loaded: 18568 protein isoforms, 11311 protein groups and 56616 precursors in 47263 elution groups.
[0:00] Learning peptide characteristics
[1:01] y-series fragmentation prediction: ratio SD = 0.611136, Pearson correlation = 0.691942 average, 0.775228 median
[1:01] iRT prediction: median error = 2.23386
[1:01] Loading FASTA L:\Staff\MSt\FASTA\Nterm_trypticPep_7aaMin_swissprot_human_2018_10_uniprot_header.fasta

DIA-NN exited
DIA-NN-plotter.exe "G:\Development validation - DV\CM_M04496B_DV_ubi DDA vs DIA\processings\libFree_training_NtermAcet\CM_M04496Bb_LF_training_NtermAcet_Report.stats.tsv" "G:\Development validation - DV\CM_M04496B_DV_ubi DDA vs DIA\processings\libFree_training_NtermAcet\CM_M04496Bb_LF_training_NtermAcet_Report.tsv" "G:\Development validation - DV\CM_M04496B_DV_ubi DDA vs DIA\processings\libFree_training_NtermAcet\CM_M04496Bb_LF_training_NtermAcet_Report.pdf"
PDF report will be generated in the background

vdemichev commented 4 years ago

Hi Martin,

Can you please email me the FASTA? I will then find out what the problem is; such things are typically very easy to troubleshoot. Most likely the issue is that the FASTA headers are not fully UniProt-style.

Vadim

msmedus commented 4 years ago

Nterm_trypticPep_7aaMin_swissprot_human_2018_10_uniprot_header.zip

Hi Vadim,

Here is the FASTA. It is modified to contain only peptides from protein N-termini (with 1 missed cleavage).
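For readers who want to build a similar file, a FASTA of this kind could be generated roughly as follows. This is a sketch under assumptions, not the script actually used for the attached file: it cuts after K/R, allows up to 1 missed cleavage, keeps peptides of 7 to 30 residues (mirroring the search settings in the log), and optionally also emits the Met-removed form. The function names and the example entry are hypothetical, and each peptide simply reuses the parent protein's header unchanged.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def nterm_peptides(seq, missed=1, min_len=7, max_len=30, cut_after="KR",
                   include_met_removed=True):
    """Peptides that start at the protein N-terminus, up to `missed` missed cleavages."""
    starts = [seq]
    if include_met_removed and seq.startswith("M"):
        starts.append(seq[1:])  # form after N-terminal methionine excision
    peptides = set()
    for s in starts:
        cuts = [i + 1 for i, aa in enumerate(s) if aa in cut_after]
        if not cuts or cuts[-1] != len(s):
            cuts.append(len(s))  # the protein C-terminus also ends a peptide
        for end in cuts[: missed + 1]:
            if min_len <= end <= max_len:
                peptides.add(s[:end])
    return sorted(peptides)

def to_fasta(records):
    """Write one FASTA entry per N-terminal peptide, reusing the protein header."""
    lines = []
    for header, seq in records:
        for peptide in nterm_peptides(seq):
            lines.append(f">{header}")
            lines.append(peptide)
    return "\n".join(lines) + "\n"

# Made-up entry for illustration only.
example = ">sp|P00000|TEST_HUMAN Hypothetical protein\nMASTLPEKGGGG\nRAAAAAAAK\n"
records = parse_fasta(example)
peptides = nterm_peptides(records[0][1])
print(peptides)
```

Since the search command already uses `--met-excision`, emitting the Met-removed form here may be redundant; `include_met_removed=False` turns it off. Whether repeated headers are acceptable should be checked, given the note above that header style can matter.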

thanks, Martin

vdemichev commented 4 years ago

OK, it works fine on my machine using DIA-NN 1.7.12. Can you please try 1.7.12? Maybe it will solve the problem. If it does not, please also share Training_library_ubi.speclib; I will need it to fully reproduce what's happening.

Vadim

msmedus commented 4 years ago

Training_library_ubi.zip

Here it is. I tried to run it without the training library, but the problem persists. I will install the newest version in the meantime.

msmedus commented 4 years ago

Dear Vadim,

Just to let you know, I have done some further investigations and the problem does not depend on the training library or the software version. In my case, 1.7.12 also gives the same problem.

best wishes, Martin

vdemichev commented 4 years ago

OK, managed to reproduce. Without methionine oxidation it works fine; I will look into what the problem with methionine oxidation is.

Update: it seems there's some old code which was written when I was just starting to implement variable modifications. It looks like this code prevents DIA-NN from considering more than two types of variable modifications. In this case it's 3 (methionine oxidation, UniMod:121 and UniMod:1), thus the error. Please disable methionine oxidation, and then it will work fine. I will fix the issue in the next DIA-NN version. Many thanks for finding this bug! If you wish, I can send you the fixed version before the next release; please contact me by email for this.
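Until the fix is released, a command line can be checked for this situation before launching a run. The sketch below is a hypothetical helper, not part of DIA-NN: it counts the types of variable modifications requested, treating `--unimod35` as methionine oxidation (as the log above reports it) and taking the name from each `--var-mod` argument.

```python
def variable_mod_types(command):
    """List the variable modification types requested on a DIA-NN command line."""
    tokens = command.split()
    mods = []
    for i, token in enumerate(tokens):
        if token == "--unimod35":
            mods.append("UniMod:35")  # methionine oxidation as a variable modification
        elif token == "--var-mod" and i + 1 < len(tokens):
            mods.append(tokens[i + 1].split(",")[0])  # name part of name,mass,sites
    return mods

# Modification-related part of the command from the log above.
command = ("diann.exe --unimod4 --var-mods 2 --unimod35 "
           "--var-mod UniMod:121,114.042927,K "
           "--var-mod UniMod:1,42.010565,mastcvg")
mods = variable_mod_types(command)
if len(set(mods)) > 2:
    print("More than two variable modification types requested:", mods)
```

Note that `--var-mods 2` is not counted here: it limits the number of variable modifications, while the bug concerns the number of distinct modification types.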

Vadim

msmedus commented 4 years ago

Thanks Vadim! I will try the search with 2 var mods and see how many N-terminally acetylated and K-GG modified peptides I gain. Would be cool if the software added acetyl groups to protein N-termini only, this would make things much easier. Maybe in the next version ;-)

thanks again! Martin

luizalmeida93 commented 1 year ago

I am considering adding Ox(M) and Ac(N-term) to my search, as we normally add them when running DDA with MaxQuant. In this post, @vdemichev stated that "deep learning spectra/RT prediction will need to be turned off" if acetylation is included. However, as of Jun 2023, the manual says that M(ox) and N-term acetyl are supported by the deep learning predictor. Thus, I can safely assume that these posts from 2020 are outdated and that I can include Ox and Ac in my search, correct?