smith-chem-wisc / MetaMorpheus

Proteomics search software with integrated calibration, PTM discovery, bottom-up, top-down and LFQ capabilities
MIT License

Protein PTM output #253

Closed zrolfs closed 7 years ago

zrolfs commented 7 years ago

I wrote some code yesterday to output the number of sites on each protein where a given PTM occurs. I think adding that output to MetaMorpheus would be a fast, easy way for users to visualize the PTM sites observed for each protein. I'll clean up the code and do a pull request sometime, if you think it would be worthwhile?

stefanks commented 7 years ago

Zach, please coordinate this with Rob, since this feature would be closely intertwined with his parsimony code. It sounds useful, but I need more detail to understand the visualization part better!

acesnik commented 7 years ago

It could also be useful to note where coverage wasn't observed, and thus where PTM sites could not be observed -- clearly annotating missing information.

rmillikin commented 7 years ago

How about something like this?

mgkgtpsfgkrhnkshtlcnrcgrrSFHVQKKTCSSCGYPAAKtrsynwgakakrrHTTGTGRmrylkhvsrrFKN[Deamidation of N]GFQTGSASKasa

lowercase = residue not observed
uppercase = residue observed
brackets = mod at that location, same notation as PSM output
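A minimal sketch of how that notation could be generated, in Python for illustration only. The function name `render_coverage` and its inputs (1-based inclusive observed ranges and a position-to-mod mapping) are hypothetical and are not taken from the MetaMorpheus code base.

```python
def render_coverage(sequence, observed_ranges, mods):
    """Render a protein sequence in the proposed notation.

    Hypothetical helper, not MetaMorpheus code.
    sequence: protein sequence, any case
    observed_ranges: list of (start, end) residue ranges, 1-based inclusive,
        covered by at least one identified peptide
    mods: dict mapping 1-based residue position -> mod name
    """
    observed = [False] * len(sequence)
    for start, end in observed_ranges:
        for i in range(start - 1, end):
            observed[i] = True

    parts = []
    for i, residue in enumerate(sequence):
        # Uppercase = observed residue, lowercase = not observed.
        parts.append(residue.upper() if observed[i] else residue.lower())
        # Append the mod in brackets directly after the modified residue.
        if i + 1 in mods:
            parts.append("[" + mods[i + 1] + "]")
    return "".join(parts)


# Tail of the example protein above: residues 3-14 observed, mod on residue 5.
print(render_coverage("RRFKNGFQTGSASKASA", [(3, 14)], {5: "Deamidation of N"}))
```

The call at the end reproduces the last part of the example string, `rrFKN[Deamidation of N]GFQTGSASKasa`.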

zrolfs commented 7 years ago

I like Rob's suggestion, but that can still be cumbersome for a user to wade through to find differences in observed PTMs between different samples/conditions. I think we would still benefit by implementing the PTMs with the observed residues to provide an overview of what PTMs were found, but perhaps also have an additional column with an output like:

aa76v:[Deamidation of N] | aa101:[Acetylation] | etc.

Where:
"aa#" is the position of the PTM
"v:" signifies that the given residue was observed both with and without that PTM
"[*]" is the mod at that location, same notation as PSM output
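As a rough illustration of that extra column, here is a short Python sketch that builds the string from a list of sites. The name `format_site_list` and the tuple layout are assumptions made for this example, not part of MetaMorpheus.

```python
def format_site_list(sites):
    """Build the proposed per-protein PTM column.

    Hypothetical helper, not MetaMorpheus code.
    sites: list of (position, mod_name, also_seen_unmodified) tuples, where
        position is the 1-based residue index and also_seen_unmodified is True
        when the residue was observed both with and without the mod.
    """
    entries = []
    for position, mod_name, also_seen_unmodified in sorted(sites):
        flag = "v" if also_seen_unmodified else ""
        entries.append(f"aa{position}{flag}:[{mod_name}]")
    return " | ".join(entries)


print(format_site_list([
    (101, "Acetylation", False),
    (76, "Deamidation of N", True),
]))
```

Sorting by position keeps the column comparable across samples, so the same site always appears in the same relative place.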

acesnik commented 7 years ago

You could put an estimated occupancy ratio (e.g. by PSM count) inside the brackets.

xxxxxxxXXXX[mod1|info:occupancy=1]XXXX[mod2|info:occupancy=0.25][mod3|info:occupancy=0.5]xxXXXXXXXXX

In this example:
mod1 is observed in all PSMs
mod2 and mod3 are observed at the same site, which is unmodified 25% (1 - 0.25 - 0.5) of the time
mod2 is observed 25% of the time
mod3 is observed 50% of the time

zrolfs commented 7 years ago

I like the occupancy idea. It would still be nice to have a list of modifications with their indexes, though, so that you can easily compare what PTMs were observed where without having to scan through each protein.

I was thinking about occupancy last night, and if we're estimating occupancy by PSMs, then might it be useful to include the number of modified PSMs observed over the total? Rather than put "occupancy=0.5", have something like "occupancy=0.5 (2/4)", where two modified and two unmodified PSMs were detected. I'm worried that it might start to look cluttered, but I think that's valuable information for determining the accuracy of that ratio. For example, I would have greater confidence that all of the proteins possessed a given modification if it read "occupancy=1 (6/6)" rather than "occupancy=1 (1/1)". Likewise, the uncertainty in the occupancy for "occupancy=0.5 (1/2)" is much greater than for "occupancy=0.5 (5/10)". This would also provide a rough quantification when users are scanning the output.
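To make the point about uncertainty concrete, here is a small Python sketch that formats the proposed string and compares how wide a confidence interval is at different PSM counts. The Wilson score interval is one way to quantify this and is an assumption of this example; nobody in the thread proposed it. The function names are hypothetical.

```python
import math


def format_occupancy(modified, total):
    """Format e.g. 'occupancy=0.5 (2/4)'. Hypothetical helper."""
    return f"occupancy={modified / total:g} ({modified}/{total})"


def wilson_interval(modified, total, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion.

    Used here only to illustrate how uncertainty shrinks with more PSMs;
    this is an assumption of the example, not a method from the thread.
    """
    p = modified / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


for modified, total in [(1, 1), (6, 6), (1, 2), (5, 10)]:
    low, high = wilson_interval(modified, total)
    print(f"{format_occupancy(modified, total)}: interval width {high - low:.2f}")
```

Under this measure the interval for 1/2 is wider than the one for 5/10, and 1/1 is wider than 6/6, which matches the intuition that the raw counts carry information the ratio alone does not.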

acesnik commented 7 years ago

Here's the CTDP document on proteoform nomenclature. You could use this format for annotating the PTM information. https://docs.google.com/document/d/1SpAQR8aPc2cCXXSjUobg_VXC85br1WukBKNZh5TPaQ8/edit#

lonelu commented 7 years ago

What does "occupancy=1 (6/6)" mean? If there is an example that shows how to calculate it, that would be great. Right now I can print something like this:... #aa324[Calcium on D|info:occupancy=]#aa...

rmillikin commented 7 years ago

It would mean that, out of 6 PSMs detected for the peptide base sequence, all 6 have that modification. So, for example, say you observe these PSMs:

PEPTIDE[mod1]
PEPTIDE[mod1]
PEP[mod2]TIDE[mod1]
PEPTIDE[mod1]

You could say [mod1:occupancy=1 (4/4)] [mod2:occupancy=0.25 (1/4)]
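The counting in that example can be sketched in Python as follows. This is an illustration under stated assumptions, not the code that went into MetaMorpheus: the names `parse_psm` and `occupancy_by_psm_count` are hypothetical, all PSMs are assumed to share one base sequence, and mod names are assumed not to contain nested brackets.

```python
from collections import Counter


def parse_psm(annotated):
    """Split 'PEP[mod2]TIDE[mod1]' into ('PEPTIDE', {(3, 'mod2'), (7, 'mod1')}).

    Hypothetical helper. Positions are 1-based indexes of the residue
    that immediately precedes the bracket.
    """
    base = []
    mods = set()
    i = 0
    while i < len(annotated):
        if annotated[i] == "[":
            close = annotated.index("]", i)
            mods.add((len(base), annotated[i + 1:close]))
            i = close + 1
        else:
            base.append(annotated[i])
            i += 1
    return "".join(base), mods


def occupancy_by_psm_count(psms):
    """Map (position, mod) -> (modified PSM count, total PSM count).

    Assumes every PSM in the list has the same base sequence, so each
    PSM covers every site.
    """
    total = len(psms)
    counts = Counter()
    for psm in psms:
        _, mods = parse_psm(psm)
        counts.update(mods)
    return {site: (n, total) for site, n in counts.items()}


psms = ["PEPTIDE[mod1]", "PEPTIDE[mod1]", "PEP[mod2]TIDE[mod1]", "PEPTIDE[mod1]"]
for (position, mod), (n, total) in sorted(occupancy_by_psm_count(psms).items()):
    print(f"[{mod}:occupancy={n / total:g} ({n}/{total})]")
```

For the four PSMs above this gives 4/4 for mod1 and 1/4 for mod2. Extending it to a whole protein would need the total to be counted per site from overlapping peptides rather than per base sequence.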

stefanks commented 7 years ago

Done in https://github.com/smith-chem-wisc/MetaMorpheus/pull/359