vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

report.tsv file does not contain Protein.Names and Genes column data #909

Open AkilaWijerathnaYapa opened 8 months ago

AkilaWijerathnaYapa commented 8 months ago

I am running DIA-NN with a library generated by FASTA digest for Arabidopsis thaliana. I got the FASTA file from plants.ensembl.

However, after a smooth DIA-NN run, the final report.tsv file shows "pep" in both the Protein.Names and Genes columns for every Protein.Ids entry.

What might be the issue? Is this caused by the FASTA file? And can I still trust the data in report.tsv?

vdemichev commented 8 months ago

Protein names and genes are read correctly from UniProt FASTAs; for all other FASTAs this is not guaranteed. But the solution is simple (maybe 20 minutes of work): load the FASTA in R using an R package and use the Protein.Group entries to generate correct Protein.Names and Genes. When you have this issue, please also set DIA-NN's implicit protein grouping to Isoforms.
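
A minimal sketch of that suggestion, written here in Python rather than R and using only the standard library. It assumes the FASTA uses Ensembl-style headers of the form `>ID pep ... gene:GENE_ID ... gene_symbol:SYMBOL description:TEXT [Source:...]`; this layout is an assumption, not something confirmed in this thread, so check it against your own file. The function names and the choice to put the description into Protein.Names and the gene symbol into Genes are illustrative, not DIA-NN's own behaviour.

```python
import csv
import re

# Assumed Ensembl-style header fields (verify against your FASTA):
#   >AT1G01010.1 pep ... gene:AT1G01010 ... gene_symbol:NAC001 description:... [Source:...]
GENE_RE = re.compile(r"\bgene:(\S+)")
SYMBOL_RE = re.compile(r"\bgene_symbol:(\S+)")
DESC_RE = re.compile(r"\bdescription:(.*?)(?:\s*\[Source:|$)")


def parse_fasta_headers(lines):
    """Map each sequence ID to (protein name, gene) using header lines only."""
    mapping = {}
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        header = line[1:].strip()
        if not header:
            continue
        seq_id = header.split()[0]
        gene = GENE_RE.search(header)
        symbol = SYMBOL_RE.search(header)
        desc = DESC_RE.search(header)
        # Fall back step by step: symbol -> gene ID -> sequence ID.
        gene_label = symbol.group(1) if symbol else (gene.group(1) if gene else seq_id)
        name = desc.group(1).strip() if desc and desc.group(1).strip() else seq_id
        mapping[seq_id] = (name, gene_label)
    return mapping


def annotate(protein_group, mapping):
    """Return (Protein.Names, Genes) for a semicolon-separated Protein.Group."""
    names, genes = [], []
    for pid in protein_group.split(";"):
        name, gene = mapping.get(pid, (pid, pid))  # unknown IDs are kept as-is
        if name not in names:
            names.append(name)
        if gene not in genes:
            genes.append(gene)
    return ";".join(names), ";".join(genes)


def annotate_report(in_path, out_path, mapping):
    """Rewrite Protein.Names and Genes in a tab-separated report."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, delimiter="\t")
        writer.writeheader()
        for row in reader:
            row["Protein.Names"], row["Genes"] = annotate(row["Protein.Group"], mapping)
            writer.writerow(row)
```

To use it, open the FASTA, pass the file object to `parse_fasta_headers`, and then call `annotate_report` with the path to report.tsv and a new output path. Because the lookup is keyed on the IDs in Protein.Group, it works best when those IDs are individual FASTA entries, which is presumably why the Isoforms setting is recommended alongside it.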

AkilaWijerathnaYapa commented 8 months ago

Thank you, Vadim. Could you please explain what you mean by "set DIA-NN's implicit protein grouping to Isoforms"? Where can I find this setting in the DIA-NN GUI? Do I have to re-run the DIA-NN analysis? Please share a tutorial if one is available.

vdemichev commented 8 months ago

It is the 'Protein inference' setting.

vdemichev commented 8 months ago

You can rerun the analysis with 'Use existing .quant files' enabled.

AkilaWijerathnaYapa commented 8 months ago

Thank you Vadim.