vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

How to know if proteins are identified with one or more unique peptides #857

Closed singhsk2622 closed 6 months ago

singhsk2622 commented 1 year ago

Hi, we have just started using DIA-NN, and our users often ask a very common question: how can we find the number of peptides identified in each sample, and how can we tell whether a protein was identified with more than one peptide? Does anyone have an approach for this? Thanks for the help. Best, Sachin

vdemichev commented 11 months ago

Hi sachin,

This is achievable quite easily in R or Python using DIA-NN's main report. Also, the precursor numbers per sample are printed in the stats.tsv file.
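A minimal sketch of what this can look like in Python with pandas, using a toy table in place of the main report. The column names `Run`, `Protein.Group` and `Stripped.Sequence` are assumed to match the header of your DIA-NN version's report, and the toy values are made up for illustration only.

```python
import pandas as pd


def peptides_per_run(report: pd.DataFrame) -> pd.Series:
    """Number of distinct stripped peptide sequences identified in each run."""
    return report.groupby("Run")["Stripped.Sequence"].nunique()


def peptides_per_protein(report: pd.DataFrame) -> pd.DataFrame:
    """Protein groups (rows) x runs (columns): distinct peptides per protein group."""
    counts = report.groupby(["Protein.Group", "Run"])["Stripped.Sequence"].nunique()
    return counts.unstack("Run", fill_value=0)


# Toy stand-in for the main report (hypothetical values).
report = pd.DataFrame({
    "Run": ["ctrl", "ctrl", "ctrl", "ip", "ip", "ip", "ip"],
    "Protein.Group": ["P1", "P1", "P2", "P1", "P1", "P2", "P2"],
    "Stripped.Sequence": ["AAAK", "CCCK", "DDDR", "AAAK", "AAAK", "DDDR", "EEEK"],
})

per_run = peptides_per_run(report)
per_protein = peptides_per_protein(report)

# True where a protein group has at least two distinct peptides in a run.
multi_peptide = per_protein >= 2
print(per_run)
print(per_protein)
```

For a real report, the toy table could be replaced with something like `pd.read_csv("report.tsv", sep="\t", usecols=["Run", "Protein.Group", "Stripped.Sequence"])`, after applying whatever q-value filtering you normally use.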

Best, Vadim

singhsk2622 commented 6 months ago

Hi Vadim, a follow-up on the same question, asked by a user of our proteomics core facility. They are performing IPs with a negative control and with antibodies, so they want to know the peptide identifications in the negative control. The second question is how to see non-normalized intensities. We have very limited skills in R, so we are wondering whether you have a pipeline for extracting this information from the big report file, which is impossible to open in R.

kwebber3 commented 6 months ago

@vdemichev I have the same question. There is no Boolean column in report.tsv to know if a precursor is unique (i.e. not a tryptic peptide for any other proteins). Additionally, the question is about unique precursors per protein, not unique precursors per sample.

weiclav commented 6 months ago

Hi there,

there is a "Proteotypic" column that holds the Boolean info you want. Proteotypic = unique across the whole protein database, not only among the reported proteins (i.e., unique). There is no info about precursor uniqueness as such, if I am right. Btw., this column is precursor-level info, so if you sum it up while pivoting the table to the protein level, you get the number of proteotypic precursors for the given protein group.
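A rough pandas sketch of that summing step, on a made-up toy table. It assumes the main report has `Run`, `Protein.Group`, `Precursor.Id` and a 0/1 `Proteotypic` column; check these names against your own report header.

```python
import pandas as pd


def proteotypic_counts(report: pd.DataFrame) -> pd.DataFrame:
    """Protein groups (rows) x runs (columns): number of proteotypic precursors."""
    # Guard against counting the same precursor twice within a run.
    precursors = report.drop_duplicates(["Run", "Protein.Group", "Precursor.Id"])
    summed = precursors.groupby(["Protein.Group", "Run"])["Proteotypic"].sum()
    return summed.unstack("Run", fill_value=0)


# Toy stand-in for the main report (hypothetical values).
report = pd.DataFrame({
    "Run": ["a", "a", "a", "b", "b"],
    "Protein.Group": ["P1", "P1", "P2;P3", "P1", "P2;P3"],
    "Precursor.Id": ["AAAK2", "CCCK2", "DDDR2", "AAAK2", "DDDR2"],
    "Proteotypic": [1, 1, 0, 1, 0],
})

counts = proteotypic_counts(report)
print(counts)
```

A protein group with a count of 0 in every run was identified only from precursors shared with other database entries.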

Btw., pivoting, the data manipulation you probably want in order to make the table look like what you are used to, can also be done in Excel, though I have not tried this approach yet. One just needs to keep in mind which values are being aggregated from multiple rows into a single one, and select the correct aggregation function (sum, min, first, etc.). Some values that are protein-group specific are simply repeated in all rows of the main report, like PG.Quantity, so here you have to use e.g. the max value reported. This is something one needs to consider for each column that goes into the pivot table. You can go in two steps as well: pivot the main report into a precursor table first, and from there group the rows by the Protein.Group column to get to the protein group level. This way you get both precursor-level and protein-level tables. You need to consider the aggregation function in both steps.
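The two-step approach could look roughly like this in pandas. This is only a sketch on made-up values: it assumes the columns `Run`, `Protein.Group`, `Precursor.Id`, `Precursor.Quantity` and `PG.Quantity`, and it assumes `Precursor.Quantity` is the non-normalised precursor intensity, which is worth verifying against the DIA-NN documentation for your version.

```python
import pandas as pd

# Toy stand-in for the main report (hypothetical values).
report = pd.DataFrame({
    "Run": ["a", "a", "b", "a", "b"],
    "Protein.Group": ["P1", "P1", "P1", "P2", "P2"],
    "Precursor.Id": ["x2", "y2", "x2", "z2", "z2"],
    "Precursor.Quantity": [10.0, 20.0, 30.0, 40.0, 50.0],
    "PG.Quantity": [100.0, 100.0, 300.0, 400.0, 500.0],
})

# Step 1: one row per precursor, one column per run.
# Each precursor appears at most once per run, so "first" is a safe aggregation.
precursors = report.pivot_table(
    index=["Protein.Group", "Precursor.Id"],
    columns="Run",
    values="Precursor.Quantity",
    aggfunc="first",
)

# Step 2: collapse the precursor table to the protein group level,
# here by counting the precursors with a value in each run.
n_precursors = precursors.notna().groupby(level="Protein.Group").sum()

# PG.Quantity is repeated on every precursor row of a protein group,
# so "max" just picks that repeated value.
pg_quantity = report.pivot_table(
    index="Protein.Group", columns="Run", values="PG.Quantity", aggfunc="max"
)

print(precursors)
print(n_precursors)
print(pg_quantity)
```

Each table can then be written out with `to_csv(..., sep="\t")` and opened in Excel if that is more comfortable.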

You can check out KNIME for the data analysis; it is relatively straightforward for this kind of data manipulation if you would like to leave Excel eventually. You would run into problems in Excel sooner or later anyway (data size limitations etc.), so ideally start with some scripting. It is not as hard as it may seem at the very beginning...
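As an example of how scripting gets around the size problem raised earlier in the thread, here is a sketch that reads a tab-separated report in chunks and loads only the columns it needs. The in-memory text stands in for a real file, the column names are assumptions, and the tiny chunk size is only there to exercise the chunking on toy data.

```python
import io
from collections import defaultdict

import pandas as pd


def peptides_per_run_chunked(source, chunksize=500_000) -> pd.Series:
    """Count distinct peptides per run without loading the whole file."""
    seen = defaultdict(set)
    reader = pd.read_csv(
        source,
        sep="\t",
        usecols=["Run", "Stripped.Sequence"],  # skip all other columns
        chunksize=chunksize,                   # rows read per iteration
    )
    for chunk in reader:
        for run, peptides in chunk.groupby("Run")["Stripped.Sequence"]:
            seen[run].update(peptides)
    return pd.Series({run: len(p) for run, p in seen.items()}).sort_index()


# Toy stand-in for a large report file (hypothetical values).
toy_tsv = (
    "Run\tStripped.Sequence\tPrecursor.Quantity\n"
    "ctrl\tAAAK\t1.0\n"
    "ctrl\tAAAK\t2.0\n"
    "ip\tAAAK\t3.0\n"
    "ip\tCCCK\t4.0\n"
    "ctrl\tDDDR\t5.0\n"
)

chunked_counts = peptides_per_run_chunked(io.StringIO(toy_tsv), chunksize=2)
print(chunked_counts)
```

With a real report, a file path would be passed in place of the `io.StringIO` object and the default chunk size kept.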

David