tbaccata / amica

amica: an interactive and user-friendly web-based platform for the analysis of proteomics data
GNU General Public License v3.0

analyzing DIA data #3

Closed fazeliniah closed 2 years ago

fazeliniah commented 2 years ago

Dear amica developer team, Thank you for developing this really useful tool. The design and implementation of the shiny app is nicely done. Have you tested amica for DIA data? In particular, I am interested in analyzing DIA output from Spectronaut. Thanks

tbaccata commented 2 years ago

Hello! Thank you very much for the kind feedback!

With the Upload custom format option, amica can read in any custom tab-separated format. Unfortunately, there are not many publicly available Spectronaut Protein Group (PG) result tables. Going by the description in Spectronaut's user manual (http://files.biognosys.ch/058_Spectronaut/ReleaseMaterial/00_Manual/Spectronaut15_UserManual.pdf) and the one Spectronaut PG output file I have come across so far, the output should have the following columns:

PG.Cscore PG.ProteinGroups PG.Genes PG.Organisms PG.ProteinDescriptions PG.ProteinNames [1] sample_1.htrms.PG.NrOfStrippedSequencesIdentified [n] sample_n.htrms.PG.NrOfStrippedSequencesIdentified [1] sample_1.htrms.PG.Quantity [n] sample_n.htrms.PG.Quantity

My table had these [1], ..., [n] prefixes in front of the sample column names. They need to be removed for amica to read the file in properly, so that the header looks like this:

PG.Cscore PG.ProteinGroups PG.Genes PG.Organisms PG.ProteinDescriptions PG.ProteinNames sample_1.htrms.PG.NrOfStrippedSequencesIdentified sample_n.htrms.PG.NrOfStrippedSequencesIdentified sample_1.htrms.PG.Quantity sample_n.htrms.PG.Quantity
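
Removing the prefixes can be scripted. The following is only a minimal sketch in Python, assuming the prefix always has the form of a bracketed number followed by a space at the start of the column name; the function name strip_run_prefix and the example header are made up for illustration:

```python
import re

# Matches a leading bracketed index such as "[1] " or "[12] ".
# Assumption: the prefix is always "[<digits>]" followed by optional whitespace.
PREFIX = re.compile(r"^\[\d+\]\s*")


def strip_run_prefix(column_name: str) -> str:
    """Remove a leading "[n] " prefix from a column name, if present."""
    return PREFIX.sub("", column_name)


# Hypothetical header with two samples, shortened for the example.
header = [
    "PG.ProteinGroups",
    "[1] sample_1.htrms.PG.Quantity",
    "[2] sample_2.htrms.PG.Quantity",
]
cleaned_header = [strip_run_prefix(name) for name in header]
print(cleaned_header)
```

Columns without a prefix, such as PG.ProteinGroups, pass through unchanged.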

Some values in the .htrms.PG.NrOfStrippedSequencesIdentified and .htrms.PG.Quantity columns are not numeric, but amica expects numeric values in these columns. All Filtered values in these columns therefore need to be replaced with NA or 0.
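
As a rough sketch of this replacement step in Python, assuming each row of the table has been read into a dictionary keyed by column name (for example with csv.DictReader) and that the column prefixes have already been removed; clean_row and the example row are hypothetical:

```python
# Column suffixes taken from the header described above.
COUNT_SUFFIX = ".htrms.PG.NrOfStrippedSequencesIdentified"
QUANT_SUFFIX = ".htrms.PG.Quantity"


def clean_row(row: dict, placeholder: str = "NA") -> dict:
    """Replace "Filtered" with a placeholder in the per-sample columns only."""
    cleaned = {}
    for column, value in row.items():
        is_sample_column = column.endswith((COUNT_SUFFIX, QUANT_SUFFIX))
        if is_sample_column and value.strip() == "Filtered":
            cleaned[column] = placeholder
        else:
            cleaned[column] = value
    return cleaned


# Made-up example row; real tables have many more columns.
row = {
    "PG.ProteinGroups": "protein_A",
    "sample_1.htrms.PG.NrOfStrippedSequencesIdentified": "Filtered",
    "sample_1.htrms.PG.Quantity": "Filtered",
    "sample_2.htrms.PG.Quantity": "1234.5",
}
cleaned_row = clean_row(row)
print(cleaned_row)
```

Pass placeholder="0" to use 0 instead of NA. Annotation columns such as PG.ProteinGroups are left untouched.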

My Spectronaut file did not have a PG.IsSingleHit column. If yours does and you want to exclude single hits, remove them from the preprocessed file. Because the PG file contains neither a summarized number of identified peptides nor a summarized MS/MS count column, amica won't perform any filtering on these count values.
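
If a PG.IsSingleHit column is present, single hits could be dropped along these lines. This is an untested sketch: I have not seen such a column myself, so the assumption that it holds the text values True and False is a guess, and drop_single_hits is a made-up helper name:

```python
def drop_single_hits(rows, column="PG.IsSingleHit"):
    """Keep only rows that are not flagged as single hits.

    Assumption: the flag column contains "True"/"False" as text.
    Rows without the column are kept unchanged.
    """
    kept = []
    for row in rows:
        flag = row.get(column, "").strip().lower()
        if flag != "true":
            kept.append(row)
    return kept


# Made-up example rows.
rows = [
    {"PG.ProteinGroups": "protein_A", "PG.IsSingleHit": "False"},
    {"PG.ProteinGroups": "protein_B", "PG.IsSingleHit": "True"},
]
filtered_rows = drop_single_hits(rows)
print(filtered_rows)
```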

Once we have preprocessed Spectronaut's PG output, we need to map the columns that are specific to the DB search tool onto amica's variables in a tab-separated specification file:

Variable Pattern
proteinId PG.ProteinGroups
geneName PG.Genes
razorUniqueCountPrefix .htrms.PG.NrOfStrippedSequencesIdentified
intensityPrefix .htrms.PG.Quantity
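
The specification file is small enough to write by hand, but it can also be generated so that the tab separators are guaranteed to be correct. A minimal sketch using Python's csv module; the helper name write_specification is hypothetical, and the values are the ones from the table above:

```python
import csv
import io

# Variable/pattern pairs from the specification table above.
SPECIFICATION = [
    ("proteinId", "PG.ProteinGroups"),
    ("geneName", "PG.Genes"),
    ("razorUniqueCountPrefix", ".htrms.PG.NrOfStrippedSequencesIdentified"),
    ("intensityPrefix", ".htrms.PG.Quantity"),
]


def write_specification(handle) -> None:
    """Write the specification as tab-separated text to an open file handle."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Variable", "Pattern"])
    writer.writerows(SPECIFICATION)


# Written to an in-memory buffer here; pass an open file to save it to disk.
buffer = io.StringIO()
write_specification(buffer)
spec_text = buffer.getvalue()
print(spec_text)
```

To save to disk, open the target file with newline="" and pass the handle to write_specification.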

Finally, we can select the Upload custom format option in the Input tab and upload 1) the tab-separated preprocessed Spectronaut PG result table, 2) a tab-separated experimental design that maps the samples to biological groups, 3) a tab-separated specification file that maps Spectronaut's format to amica's internal format, and 4) a tab-separated contrast matrix that tells amica which group comparisons to perform.

I hope this holds true for all (or at least the majority of) Spectronaut PG output files, but unfortunately I cannot be sure. I hope this helps; please let me know if your output file is in a different format.

Best, Sebastian

fazeliniah commented 2 years ago

Thank you for your explanation!