vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

filter out the proteins identified with only one peptide #78

Closed Clovernana closed 3 years ago

Clovernana commented 3 years ago

Hi, Vadim

I'd like to ask for your help. I want to filter out the proteins identified with only one peptide. How can I do this in the "report.tsv" file? I couldn't figure it out. Could some R code help, for example?


vdemichev commented 3 years ago

Hi Clover,

Yes, this is possible and quite easy. If "df" is the name of the data frame that contains the DIA-NN report in R, then:

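# One row per distinct (protein, peptide sequence) pair
data <- unique(df[,c('Genes','Stripped.Sequence')])
# Number of distinct peptide sequences per protein
t <- table(data$Genes)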

"t" now contains the number of distinct peptide sequences matched to each protein (in this case I used the "Genes" column for protein IDs). Proteins with a count of 1 are the ones identified with only one peptide. You can do this for the whole report, or for each run separately.
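To actually drop the single-peptide proteins, the counts can be turned into a filter. The following is a minimal sketch, not a definitive recipe: the toy data frame stands in for a real report (which could be loaded with something like `read.delim("report.tsv")`), the gene and peptide values are made up, and the per-run part assumes the report has a "Run" column identifying each run. The names `peptide_counts`, `keep`, `filtered`, and `filtered_run` are illustrative only.

```r
# Toy stand-in for a DIA-NN report (made-up values, for illustration only).
# With a real report one would typically do something like:
#   df <- read.delim("report.tsv", stringsAsFactors = FALSE)
df <- data.frame(
  Run = c("run1", "run1", "run1", "run2", "run2", "run2"),
  Genes = c("GENE_A", "GENE_A", "GENE_B", "GENE_A", "GENE_B", "GENE_C"),
  Stripped.Sequence = c("PEPTIDEK", "SEQVENCER", "LONELYK",
                        "PEPTIDEK", "LONELYK", "ONLYK"),
  stringsAsFactors = FALSE
)

# --- Whole report ---
# One row per distinct (gene, peptide sequence) pair
pairs <- unique(df[, c("Genes", "Stripped.Sequence")])
# Number of distinct peptide sequences per gene
peptide_counts <- table(pairs$Genes)
# Genes supported by at least two distinct peptides
keep <- names(peptide_counts)[as.vector(peptide_counts) >= 2]
# Keep only the report rows belonging to those genes
filtered <- df[df$Genes %in% keep, ]

# --- Each run separately (assumes a "Run" column) ---
pairs_run <- unique(df[, c("Run", "Genes", "Stripped.Sequence")])
# Distinct peptides per (run, gene) combination, as a long data frame
counts_run <- as.data.frame(
  table(Run = pairs_run$Run, Genes = pairs_run$Genes),
  stringsAsFactors = FALSE
)
# (run, gene) combinations with at least two distinct peptides
keep_run <- counts_run[counts_run$Freq >= 2, c("Run", "Genes")]
# Inner join keeps only rows whose (run, gene) combination passed
filtered_run <- merge(df, keep_run, by = c("Run", "Genes"))
```

The two variants can give different results: in the toy data, GENE_A has two peptides across the whole report but only one of them in run2, so the per-run filter removes its run2 row while the whole-report filter keeps it.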

Best wishes,

Vadim

Clovernana commented 3 years ago

OK, thank you for your reply. I'll strengthen my R basics. By the way, is filtering out the proteins with only one peptide a requirement for DIA proteomics? I have noticed that many articles are not clear on this.

vdemichev commented 3 years ago

No, in many cases it's perfectly fine to use proteins identified (and quantified) with a single peptide. But of course if you have a protein quantified with, say, 5 peptides, and all these show the same differential regulation pattern between conditions, this does give extra confidence.

Clovernana commented 3 years ago

OK, thank you! So is it controversial to use proteins with only one peptide when I perform a differential analysis? I'm not sure about this.

vdemichev commented 3 years ago

I don't see any problem with using a single peptide. It all depends on how you interpret the results. Basically, if you then report a list of proteins differentially regulated at 5% FDR (i.e. <5% of these are not really differentially regulated), then it's fine. If you report at 0.1% FDR (i.e. you'd like to claim that only 1 out of 1000 proteins reported is not really differentially regulated), then not really.

Clovernana commented 3 years ago

OK. I get it.