vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

Proteotypic peptides with modifications are being assigned to different protein groups, with different quantitation #1191

Closed makbor00 closed 1 month ago

makbor00 commented 1 month ago

Hi,

I ran a DIA-NN search on samples that have been enriched for the K-GG modification. I included C carbamidomethylation, Ox(M), and K-GG as modifications, with a maximum of 2 variable modifications.

While analysing the report.pr_matrix.tsv (as I am interested in the precursor-level analysis) I found that after filtering for proteotypic precursors, there were some precursors that were shared between different protein groups, and assigned different quantitation. I'm struggling to understand how the same precursor (with the same modifications) can be assigned to two different protein groups with different quantitation. This affects around 5% of the modified peptides in the dataset (after filtering), which could make biological interpretation difficult. I attach a section of the report that exemplifies this.
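One way to check for this kind of duplication is to list every precursor ID that appears under more than one protein group. The following is a minimal sketch, assuming the matrix has `Precursor.Id` and `Protein.Group` columns; the function names and the toy rows are made up for illustration and are not taken from the attached report.

```python
import csv
import io
from collections import defaultdict


def groups_per_precursor(tsv_text, id_col="Precursor.Id", group_col="Protein.Group"):
    """Map each precursor ID to the set of protein groups it is listed under."""
    groups = defaultdict(set)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        groups[row[id_col]].add(row[group_col])
    return groups


def shared_precursors(tsv_text):
    """Keep only precursor IDs that occur under two or more protein groups."""
    return {
        pid: sorted(groups)
        for pid, groups in groups_per_precursor(tsv_text).items()
        if len(groups) > 1
    }


# Hypothetical toy matrix (invented IDs and intensities, column names assumed).
toy_matrix = (
    "Protein.Group\tPrecursor.Id\tsample_1\n"
    "GROUP_A\tPEPTIDEK2\t100\n"
    "GROUP_B\tPEPTIDEK2\t250\n"
    "GROUP_C\tPEPTLDEK2\t80\n"
)

if __name__ == "__main__":
    for pid, groups in shared_precursors(toy_matrix).items():
        print(pid, groups)
```

If this returns nothing for the real matrix, the apparent duplicates are more likely distinct precursors that only look alike, or were introduced by a later processing step.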

duplicated_modified_peptide.xlsx

Any help with this would be appreciated. Thanks.

vdemichev commented 1 month ago

Hi,

What does the log look like?

I'm struggling to understand how the same precursor (with the same modifications) can be assigned to two different protein groups with different quantitation

In general, a precursor being assigned to a group means the group is quantified using this precursor. There is no requirement per se that different precursors matching the same sequence cannot be used to quantify different groups. Further, some precursors, depending on their modifications, can originate only from specific positions in a protein. For example, if a peptide ends with K-GG in a tryptic digest, it must be at the C-terminus of the protein, as trypsin does not cut after K-GG. Therefore, if the sequence matches several proteins but is C-terminal in only one of them, then only that protein can be matched to its K-GG variant.
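The C-terminal argument can be illustrated with a small sketch. It is not DIA-NN's actual protein inference, only the positional reasoning described above; the protein names and sequences are invented, and both helper functions are hypothetical.

```python
def proteins_containing(peptide, proteins):
    """Names of proteins whose sequence contains the peptide anywhere."""
    return [name for name, seq in proteins.items() if peptide in seq]


def proteins_ending_with(peptide, proteins):
    """Names of proteins whose sequence ends with the peptide (C-terminal match)."""
    return [name for name, seq in proteins.items() if seq.endswith(peptide)]


# Invented toy sequences: the peptide is C-terminal in prot_A, internal in prot_B.
toy_proteins = {
    "prot_A": "MARLVEK",
    "prot_B": "MARLVEKTTR",
}
peptide = "LVEK"

# The unmodified peptide can come from either protein.
unmodified_candidates = proteins_containing(peptide, toy_proteins)

# A variant ending in K-GG is not produced by tryptic cleavage after that K,
# so only proteins where the peptide is C-terminal remain candidates.
kgg_candidates = proteins_ending_with(peptide, toy_proteins)

if __name__ == "__main__":
    print("unmodified:", unmodified_candidates)
    print("C-terminal K-GG variant:", kgg_candidates)
```

Under these assumptions the unmodified peptide and its K-GG variant end up with different sets of candidate proteins, even though they share the same amino acid sequence.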

Best, Vadim

makbor00 commented 1 month ago

report.log.txt

vdemichev commented 1 month ago

In the .xlsx file you attached, the Protein.Group is the same for peptides with the same sequence (there are two distinct sequences there), so I'm not sure what you mean?

makbor00 commented 1 month ago

I've attached a slightly different representation of what I mean.

The modified sequence is the same, including the 2 different charge states. However, the peptide has been assigned to two different protein groups, with differing quantitation. How can this be the case?

duplicated_modified_peptide2.xlsx

vdemichev commented 1 month ago

Here, too, these are two distinct (although similar) amino acid sequences.
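Near-identical sequences are easy to misread by eye, so a position-by-position comparison can help. This is a minimal sketch with an invented pair of sequences, not the ones from the attachment; `differing_positions` is a hypothetical helper.

```python
from itertools import zip_longest


def differing_positions(seq_a, seq_b):
    """Return (1-based position, residue in seq_a, residue in seq_b) for each mismatch.

    If the sequences differ in length, the missing residues are reported as None.
    """
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip_longest(seq_a, seq_b), start=1)
        if a != b
    ]


if __name__ == "__main__":
    # Invented example: two sequences that differ by a single residue.
    print(differing_positions("AAGTIDK", "AAGTLDK"))
```

An empty list means the two strings really are identical; any entry means they are separate precursors that can legitimately belong to different protein groups.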

makbor00 commented 1 month ago

Oh yes I see now. I think I figured out where in my analysis the duplication has come from. Thank you for your help.