vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

What's the difference between Protein.Group and Protein.Ids? #883

Open humility3239 opened 7 months ago

humility3239 commented 7 months ago

My raw files were searched with DIA-NN 1.8.1, and the output table "report.pg_matrix.tsv" was used for further analysis. I was confused by the two columns "Protein.Group" and "Protein.Ids". The "Protein.Group" column usually contained only one reviewed protein entry, but the "Protein.Ids" column contained several reviewed protein entries. If the "Protein.Ids" column were used, shared peptides would be assigned to several different proteins, so the final quantification results might be somewhat incorrect. Is that right?


vdemichev commented 7 months ago

Protein.Group contains the proteins inferred using the maximum parsimony principle; see, e.g., the IDPicker paper. The Protein.Ids column actually makes no sense in the protein group quantitative matrix (the quantities correspond to the entries in Protein.Group), so I will remove it in the next versions :) Thank you for pointing this out!

Best, Vadim
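To make the parsimony idea a bit more concrete, here is a minimal Python sketch in the spirit of the IDPicker-style inference mentioned above. It is not DIA-NN's implementation: the function name `parsimony_groups` and the input format (a dict from peptide to the set of protein accessions containing it) are hypothetical, and the greedy set cover used here only approximates a truly minimal protein set. The sketch first merges proteins that are supported by exactly the same peptides (they cannot be distinguished, which is one way a group ends up listing several accessions), then keeps the fewest groups needed to explain every peptide. Accessions inside a merged group are sorted alphabetically purely for deterministic output; the sketch does not rank them by likelihood.

```python
from collections import defaultdict


def parsimony_groups(peptide_to_proteins):
    """Greedy maximum-parsimony protein grouping (illustrative sketch only).

    peptide_to_proteins: hypothetical input mapping each peptide sequence to
    the set of protein accessions whose sequences contain it.
    Returns the selected protein groups as ';'-joined accession strings.
    """
    # Invert the mapping: protein -> set of peptides it could explain.
    protein_to_peptides = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides[protein].add(peptide)

    # Proteins with identical peptide sets are indistinguishable, so merge
    # them into one group. Accessions are sorted only for determinism.
    members_by_peptides = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        members_by_peptides[frozenset(peptides)].append(protein)
    groups = {
        ";".join(sorted(members)): peptides
        for peptides, members in members_by_peptides.items()
    }

    # Greedy set cover: keep the group explaining the most still-unexplained
    # peptides (ties go to the alphabetically first group) until all
    # peptides that map to at least one protein are explained.
    unexplained = {p for p, prots in peptide_to_proteins.items() if prots}
    selected = []
    while unexplained:
        best = max(sorted(groups), key=lambda g: len(groups[g] & unexplained))
        selected.append(best)
        unexplained -= groups[best]
    return selected
```

For example, with peptides PEPA and PEPB shared by P1 and P2, PEPC shared by P1, P2, P3 and P4, and PEPD unique to P3, the sketch returns `["P1;P2", "P3"]`: P1 and P2 collapse into one group, and P4 is dropped because its only peptide is already explained.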

Benjo23 commented 5 months ago

I just came across this thread and had noticed the same issue. Thanks for sharing. However, I also noticed that Protein.Group can contain multiple accessions, and they seem to be in alphabetical order. In this case, which of them would be the most likely one?
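To check how often groups with several accessions occur in one's own results, a short script can list them from the protein group matrix. This is a hedged sketch, assuming that report.pg_matrix.tsv is tab-separated, has a header row containing a `Protein.Group` column, and separates accessions within a group with `;`. The function name `multi_accession_groups` is hypothetical. It only lists such groups and does not attempt to decide which accession is the most likely one.

```python
import csv
import io


def multi_accession_groups(pg_matrix_text):
    """List rows whose Protein.Group holds more than one accession.

    Assumes tab-separated text with a 'Protein.Group' header column and
    ';' between accessions (assumption, not confirmed by this thread).
    Returns a list of (group_string, [accession, ...]) tuples.
    """
    reader = csv.DictReader(io.StringIO(pg_matrix_text), delimiter="\t")
    result = []
    for row in reader:
        group = row["Protein.Group"]
        # Drop empty pieces, e.g. from a trailing ';'.
        accessions = [a for a in group.split(";") if a]
        if len(accessions) > 1:
            result.append((group, accessions))
    return result


if __name__ == "__main__":
    # Usage: read the whole matrix file and print multi-accession groups.
    with open("report.pg_matrix.tsv", encoding="utf-8") as handle:
        for group, accessions in multi_accession_groups(handle.read()):
            print(group, len(accessions))
```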