analyzing DIA data

fazeliniah commented 2 years ago

Dear amica developer team, Thank you for developing this really useful tool. The design and implementation of the shiny app is nicely done. Have you tested amica for DIA data? In particular, I am interested in analyzing DIA output from Spectronaut. Thanks

tbaccata commented 2 years ago

Hello! Thank you very much for the kind feedback!

With the Upload custom format option, amica can read in any custom tab-separated format. Unfortunately, there are not many publicly available Spectronaut Protein Group (PG) result tables. Going by the description in Spectronaut's user manual (http://files.biognosys.ch/058_Spectronaut/ReleaseMaterial/00_Manual/Spectronaut15_UserManual.pdf) and the one Spectronaut PG output file I have come across so far, the output should look like this:

PG.Cscore	PG.ProteinGroups	PG.Genes	PG.Organisms	PG.ProteinDescriptions	PG.ProteinNames	[1] sample_1.htrms.PG.NrOfStrippedSequencesIdentified	[n] sample_n.htrms.PG.NrOfStrippedSequencesIdentified	[1] sample_1.htrms.PG.Quantity	[n] sample_n.htrms.PG.Quantity

My table had these [1] ... [n] prefixes on the sample columns, which need to be removed for amica to read the file in properly:

PG.Cscore	PG.ProteinGroups	PG.Genes	PG.Organisms	PG.ProteinDescriptions	PG.ProteinNames	sample_1.htrms.PG.NrOfStrippedSequencesIdentified	sample_n.htrms.PG.NrOfStrippedSequencesIdentified	sample_1.htrms.PG.Quantity	sample_n.htrms.PG.Quantity
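
The prefix removal can be scripted instead of done by hand. A minimal Python sketch, assuming every prefix has the form `[<number>] ` at the very start of a column name (that is what my one file looked like, other exports may differ). The function name `strip_run_prefix` is made up for illustration:

```python
import re

# Matches a leading run index such as "[1] " or "[12] " (assumed prefix format).
RUN_PREFIX = re.compile(r"^\[\d+\]\s*")

def strip_run_prefix(column_name):
    """Return the column name without a leading "[n] " prefix."""
    return RUN_PREFIX.sub("", column_name)

# Shortened example header; the sample names are placeholders.
header = [
    "PG.ProteinGroups",
    "[1] sample_1.htrms.PG.Quantity",
    "[2] sample_2.htrms.PG.Quantity",
]
cleaned = [strip_run_prefix(name) for name in header]
print("\t".join(cleaned))
```

Columns without a prefix, such as PG.ProteinGroups, pass through unchanged.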

Some values in the htrms.PG.NrOfStrippedSequencesIdentified and htrms.PG.Quantity columns are not numeric, but amica expects numeric values in these columns. So we have to change that: all Filtered values in these columns need to be replaced with NA or 0.
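
Replacing the Filtered values could look roughly like this in Python. This is only a sketch: it assumes the prefixes have already been removed, that the per-sample columns end in the two suffixes shown above, and that the non-numeric entries are exactly the string Filtered. The helper name `replace_filtered` and the example values are made up:

```python
import csv
import io

# Suffixes of the per-sample columns that amica expects to be numeric.
SAMPLE_SUFFIXES = (
    ".htrms.PG.NrOfStrippedSequencesIdentified",
    ".htrms.PG.Quantity",
)

def replace_filtered(header, rows, placeholder="NA"):
    """Replace "Filtered" with a placeholder, but only in per-sample columns."""
    sample_idx = [i for i, name in enumerate(header) if name.endswith(SAMPLE_SUFFIXES)]
    cleaned = []
    for row in rows:
        row = list(row)
        for i in sample_idx:
            if row[i].strip() == "Filtered":
                row[i] = placeholder
        cleaned.append(row)
    return cleaned

# Tiny in-memory table with invented values, standing in for the real PG file.
raw = (
    "PG.ProteinGroups\tsample_1.htrms.PG.NrOfStrippedSequencesIdentified\tsample_1.htrms.PG.Quantity\n"
    "P12345\t3\t1520.5\n"
    "Q67890\tFiltered\tFiltered\n"
)
reader = csv.reader(io.StringIO(raw), delimiter="\t")
header = next(reader)
rows = replace_filtered(header, list(reader))
print(rows)
```

Pass `placeholder="0"` if you prefer zeros over NA.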

My Spectronaut file did not have a PG.IsSingleHit column. If yours has one and you want to exclude single hits, remove them from the preprocessed file. As the PG file contains neither a summarized number of identified peptides nor a summarized MS/MS count column, amica won't perform any filtering on these count values.
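
If your file does have a PG.IsSingleHit column, dropping the single hits could be sketched as follows. Since my file lacked that column, the encoding of its values is an assumption: the sketch treats the text True (in any capitalization) as a single hit, so please check what your export actually contains. The helper name `drop_single_hits` and the example rows are made up:

```python
def drop_single_hits(header, rows, column="PG.IsSingleHit"):
    """Return the rows that are not flagged as single hits.

    If the column is missing, all rows are returned unchanged.
    Assumes single hits are marked with the text "True".
    """
    if column not in header:
        return rows
    idx = header.index(column)
    return [row for row in rows if row[idx].strip().lower() != "true"]

# Invented example rows.
header = ["PG.ProteinGroups", "PG.IsSingleHit"]
rows = [["P12345", "False"], ["Q67890", "True"]]
kept = drop_single_hits(header, rows)
print(kept)
```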

Once we have preprocessed Spectronaut's PG output, we need to specify the relevant columns, which are specific to the DB search tool, in a tab-separated specification file:

Variable	Pattern
proteinId	PG.ProteinGroups
geneName	PG.Genes
razorUniqueCountPrefix	.htrms.PG.NrOfStrippedSequencesIdentified
intensityPrefix	.htrms.PG.Quantity
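
Before uploading, it can be worth checking that each pattern in the specification actually occurs in the header of the preprocessed table. The Python sketch below writes the specification as tab-separated text and lists patterns that match no column. It uses plain substring matching, which is an assumption about how the patterns are applied and not a description of amica's internals. The helper names are made up:

```python
import csv
import io

# The specification from above as (Variable, Pattern) pairs.
SPEC = [
    ("proteinId", "PG.ProteinGroups"),
    ("geneName", "PG.Genes"),
    ("razorUniqueCountPrefix", ".htrms.PG.NrOfStrippedSequencesIdentified"),
    ("intensityPrefix", ".htrms.PG.Quantity"),
]

def spec_to_tsv(spec):
    """Render the specification as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["Variable", "Pattern"])
    writer.writerows(spec)
    return buf.getvalue()

def unmatched_patterns(spec, header):
    """Return the patterns that occur in no column name (substring match)."""
    return [p for _, p in spec if not any(p in name for name in header)]

# Header of the preprocessed table, shortened to one sample.
header = [
    "PG.Cscore", "PG.ProteinGroups", "PG.Genes",
    "sample_1.htrms.PG.NrOfStrippedSequencesIdentified",
    "sample_1.htrms.PG.Quantity",
]
spec_text = spec_to_tsv(SPEC)
missing = unmatched_patterns(SPEC, header)
print(spec_text)
print("patterns without a matching column:", missing)
```

An empty list means every pattern was found in at least one column name.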

Finally, we can select the Upload custom format option in the Input tab and upload:

1) the tab-separated preprocessed Spectronaut PG result table
2) a tab-separated experimental design that maps the samples to biological groups
3) a tab-separated specification file that maps Spectronaut's format to amica's internal format
4) a tab-separated contrast matrix that tells amica which group comparisons to perform

I hope this holds true for all (or at least the majority of) Spectronaut PG output files, but unfortunately I can't be sure. I hope this helps; please let me know if your output file is in a different format.

Best, Sebastian

fazeliniah commented 2 years ago

Thank you for your explanation!

tbaccata / amica

analyzing DIA data #3