vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

Questions regarding lactylation modifications in DIANN version 1.9.1 windows #1248

Open ZHBHSMILE opened 2 weeks ago

ZHBHSMILE commented 2 weeks ago

Hi Vadim,

Thank you, as always, for developing such an excellent tool.

I am currently analyzing lactylation and have encountered an issue in DIANN version 1.9.1:

My Questions

  1. Lactylation is frequently detected only once, at the end of Modified.Sequence, as shown in the attached image. We would like to exclude sequences where lactylation appears solely at the end of the sequence, because lactylation should not occur at the position where trypsin cleaves. In the advanced options window of the GUI, is there a command to exclude sequences where lactylation appears only at the end?

  2. Expected Count for Lactylation: What is the typical number of Modified.Sequence entries expected to contain lactylation? I obtained 172,893 unique Modified.Sequence entries in total, which seems unusually high, and I am concerned this may be an error. Could you confirm whether this quantity is within the expected range? I suspect there may be an error in my commands:

    library(stringr)   # provides str_detect()
    library(magrittr)  # provides the %>% pipe
    data <- read.delim("2024_11_4_SP_Homo_sapiens_report.pr_matrix.tsv", header = TRUE, check.names = FALSE)
    s <- data$Modified.Sequence %>% unique()         # Total unique sequences: 172,893
    s1 <- data$Modified.Sequence[str_detect(data$Modified.Sequence, "2114")]  # Sequences with lactylation: 153,072
    s2 <- s1[(!str_detect(s1, ".*2114.*2114.*")) & str_detect(s1, ".*2114\\)$")] %>% length()  # Lactylation only at sequence end: 34,482

    command in log.tsv:

    diann.exe --f E:\project\12F236DP\F236DP\F236DP-N1.raw  --f E:\project\12F236DP\F236DP\F236DP-N2.raw  --f E:\project\12F236DP\F236DP\F236DP-N3.raw  --f E:\project\12F236DP\F236DP\F236DP-N4.raw  --f E:\project\12F236DP\F236DP\F236DP-N5.raw  --f E:\project\12F236DP\F236DP\F236DP-N6.raw  --f E:\project\12F236DP\F236DP\F236DP-N7.raw  --f E:\project\12F236DP\F236DP\F236DP-N8.raw  --f E:\project\12F236DP\F236DP\F236DP-N9.raw  --f E:\project\12F236DP\F236DP\F236DP-N10.raw  --f E:\project\12F236DP\F236DP\F236DP-N11.raw  --f E:\project\12F236DP\F236DP\F236DP-N12.raw  --f E:\project\12F236DP\F236DP\F236DP-N13.raw  --f E:\project\12F236DP\F236DP\F236DP-N14.raw  --f E:\project\12F236DP\F236DP\F236DP-N15.raw  --f E:\project\12F236DP\F236DP\F236DP-T1.raw  --f E:\project\12F236DP\F236DP\F236DP-T2.raw  --f E:\project\12F236DP\F236DP\F236DP-T3.raw  --f E:\project\12F236DP\F236DP\F236DP-T4.raw  --f E:\project\12F236DP\F236DP\F236DP-T5.raw  --f E:\project\12F236DP\F236DP\F236DP-T6.raw  --f E:\project\12F236DP\F236DP\F236DP-T7.raw  --f E:\project\12F236DP\F236DP\F236DP-T8.raw  --f E:\project\12F236DP\F236DP\F236DP-T9.raw  --f E:\project\12F236DP\F236DP\F236DP-T10.raw  --f E:\project\12F236DP\F236DP\F236DP-T11.raw  --f E:\project\12F236DP\F236DP\F236DP-T12.raw  --f E:\project\12F236DP\F236DP\F236DP-T13.raw  --f E:\project\12F236DP\F236DP\F236DP-T14.raw  --f E:\project\12F236DP\F236DP\F236DP-T15.raw  --lib  --threads 30 --verbose 1 --out E:\project\12F236DP\F236DP_search2\2024_11_4_SP_Homo_sapiens_report.tsv --qvalue 0.01 --matrices --out-lib E:\project\12F236DP\F236DP_search2\SP_Homo_sapiens_report-lib.parquet --gen-spec-lib --predictor --fasta E:\project\12F236DP\F236DP\20220122SP_Homo_sapiens.fasta --fasta-search --min-fr-mz 200 --max-fr-mz 1800 --met-excision --min-pep-len 7 --max-pep-len 30 --min-pr-mz 300 --max-pr-mz 1800 --min-pr-charge 1 --max-pr-charge 4 --cut K*,R* --missed-cleavages 2 --unimod4 --var-mods 1 --var-mod UniMod:35,15.994915,M --var-mod UniMod:1,42.010565,*n 
    --use-quant --peptidoforms --reanalyse --relaxed-prot-inf --rt-profiling --var-mod UniMod:2114,72.021129,K --var-mods 4
    Could you provide guidance on how to configure these options to achieve the correct results?
  3. Manual Deletion of Modified.Sequence Entries: If the GUI does not support this exclusion, would manually deleting Modified.Sequence entries with lactylation at the end help ensure accuracy in the pr.matrix.tsv file?

  4. Impact on pg.matrix.tsv: If we manually remove these sequences from pr.matrix.tsv, will this affect the pg.matrix.tsv file? Is it necessary to preserve the raw pg.matrix.tsv without any modifications?
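One thing to check in the R commands in question 2: `s` is deduplicated with `unique()` while `s1` and `s2` are not, so the three counts may not be directly comparable if the same sequence appears in several rows (for example at different charge states). Below is a minimal Python sketch of the same three counts computed on deduplicated sequences. It assumes the lactyl tag is written as `(UniMod:2114)` in Modified.Sequence; the function name `summarize` and the toy sequences are made up for illustration.

```python
# Assumption: lactylation appears as "(UniMod:2114)" in Modified.Sequence.
LACTYL_ID = "UniMod:2114"


def summarize(mod_seqs):
    """Count unique sequences, lactylated ones, and those lactylated only at the end."""
    unique = set(mod_seqs)  # deduplicate first so all counts share one basis
    lactylated = {s for s in unique if LACTYL_ID in s}
    end_only = {
        s for s in lactylated
        if s.count(LACTYL_ID) == 1 and s.endswith(LACTYL_ID + ")")
    }
    return {
        "unique": len(unique),
        "lactylated": len(lactylated),
        "end_only": len(end_only),
    }


# Toy input (hypothetical): the first sequence occurs twice, as it would
# if one peptide were reported in two rows.
toy = [
    "AAAK(UniMod:2114)",
    "AAAK(UniMod:2114)",
    "AK(UniMod:2114)AAAK",
    "AK(UniMod:2114)AAK(UniMod:2114)",
    "PEPTIDEK",
]
counts = summarize(toy)
print(counts)
```

On the real data, `mod_seqs` would be the Modified.Sequence column of whichever table is being analysed.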

Thank you very much for your guidance.

Best wishes, zplv

vdemichev commented 1 week ago

Hi zplv,

In the advanced options window of the GUI, is there a command to exclude sequences where lactylation appears only at the end?

No, the way to do this would be to edit the spectral library in R or Python to exclude the unwanted precursors.
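A minimal sketch of this kind of library filtering, in Python. It is not DIA-NN functionality, just an illustration under stated assumptions: the library has already been loaded into a list of row dictionaries, the modified sequence sits in a column whose name is passed in (`Modified.Sequence` and `Charge` here are placeholders, so check the actual column names in your library), and lactylation is written as `(UniMod:2114)`. The helper names are hypothetical.

```python
# Assumption: lactylation is written as "(UniMod:2114)" in the sequence string.
LACTYL_TAG = "(UniMod:2114)"


def lactyl_only_at_end(mod_seq, tag=LACTYL_TAG):
    """True if the sequence carries exactly one lactyl tag and it is the last thing in it."""
    return mod_seq.count(tag) == 1 and mod_seq.endswith(tag)


def filter_library(rows, column="Modified.Sequence"):
    """Keep every row except those whose only lactylation is at the sequence end.

    `column` is a placeholder name; pass whatever your library actually uses.
    """
    return [row for row in rows if not lactyl_only_at_end(row[column])]


# Toy library rows (hypothetical data and column names).
library_rows = [
    {"Modified.Sequence": "AAAK(UniMod:2114)", "Charge": 2},
    {"Modified.Sequence": "AK(UniMod:2114)AAAK", "Charge": 2},
    {"Modified.Sequence": "AK(UniMod:2114)AAK(UniMod:2114)", "Charge": 3},
    {"Modified.Sequence": "PEPTIDEK", "Charge": 2},
]
filtered = filter_library(library_rows)
kept = [row["Modified.Sequence"] for row in filtered]
print(kept)
```

Sequences that carry an internal lactylation in addition to one at the end are kept here, matching the "only at the end" wording of the question; tighten the predicate if those should be dropped as well.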

What is the typical number of Modified.Sequence entries expected to contain lactylation?

Don't know, never worked with this PTM :)

The total unique Modified.Sequence count I obtained is 172,893, which seems unusually high.

Please share the DIA-NN logs, including for the library generation.

In general, please base your analysis exclusively on the main DIA-NN report in .parquet format; the matrices are there just for quick analyses in Excel.

Best, Vadim

zplv686 commented 1 week ago

Hi Vadim,

Please share the DIA-NN logs, including for the library generation.

Thank you for your response! Attached is the log file. I really appreciate your suggestion, and I will try implementing it. Thanks again!

2024_11_4_SP_Homo_sapiens_report.log.txt

Best wishes, zplv

vdemichev commented 1 week ago

Hi zplv,

Please address both warnings in the log, it is essential.

Best, Vadim

ZHBHSMILE commented 1 week ago

Hi Vadim,

Thank you for pointing that out. I’ll address both warnings in the log as requested.

Best wishes, zplv