smith-chem-wisc / MetaMorpheus

Proteomics search software with integrated calibration, PTM discovery, bottom-up, top-down and LFQ capabilities
MIT License

Calibration fails for phospho-enriched TMT run #1875

Open acesnik opened 4 years ago

acesnik commented 4 years ago

I'm trying to run calibration on a published phospho-enriched TMT dataset, and all but two of the files fail to calibrate. Has anyone else tried to run calibration on such a dataset or know why it's failing?

acesnik commented 4 years ago

Posting them to \bison\share\Projects\MetaMorpheusPhosphoTmtU2OS

acesnik commented 4 years ago

The search works great without the calibration, so no rush on this one. It's something to check out down the road.

zrolfs commented 4 years ago

Can you include a calibration toml and database? I need more info.

acesnik commented 4 years ago

I'm adding the original database and the partial results to the folder. (I didn't let the run finish, so I could start running other things.) Thanks for looking into this!

It should be done uploading in ~9 mins.

acesnik commented 4 years ago

I ran G-PTM-D before the calibration in this run because I knew calibration failed in the past... I was wondering if it was because there weren't enough phosphos annotated in the database for the phospho-enriched run. It turns out a similar number of files failed with or without GPTMD.

The GPTMD results are included in the folder.

zrolfs commented 4 years ago

It looks like the TMT mods don't have a chemical formula added. Calibration uses this information to calculate the accurate mass and discards the identification if it's not present. For this dataset, that means a peptide is only kept if it has an N-terminal acetyl and no lysines (so it carries no TMT label), which is pretty rare. I'll try to add the chemical formulas for the TMT mods.
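A minimal sketch of the filtering behavior described above, assuming an identification is dropped whenever any of its modifications lacks a chemical formula. The `Mod` and `Psm` classes, the `usable_for_calibration` helper, and the example peptides are hypothetical and are not MetaMorpheus code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Mod:
    """Hypothetical modification record; chemical_formula is None when missing."""
    name: str
    monoisotopic_mass: float
    chemical_formula: Optional[str] = None


@dataclass
class Psm:
    """Hypothetical peptide identification with its modifications."""
    sequence: str
    mods: List[Mod] = field(default_factory=list)


def usable_for_calibration(psm: Psm) -> bool:
    # Assumption: one mod without a formula is enough to discard the whole ID.
    return all(mod.chemical_formula is not None for mod in psm.mods)


# TMT mods as they were in this issue: a mass, but no chemical formula.
tmt_nterm = Mod("TMT on N-terminus", 229.162932)
tmt_lysine = Mod("TMT on K", 229.162932)
acetyl = Mod("Acetyl on N-terminus", 42.010565, "C2H2O")

psms = [
    Psm("PEPTIDEK", [tmt_nterm, tmt_lysine]),  # labeled, so it is discarded
    Psm("SAMPLER", [acetyl]),                  # acetylated, no lysine, so it is kept
]

kept = [psm.sequence for psm in psms if usable_for_calibration(psm)]
print(kept)
```

Under that assumption, only the acetylated, lysine-free peptide is left to calibrate on, which matches the small number of files that calibrated.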

trishorts commented 4 years ago

Maybe you can get something close. But, unfortunately, neither TMT nor any of the other so-called isobaric tags are truly isobaric at high resolution. They are isobaric enough to be selected together for fragmentation, but they're still not equal. We don't want them to be miscalibrated relative to each other because the spacing between the various diagnostic ions is the key for quant.

acesnik commented 4 years ago

Thanks for finding that out, Zach, and good point, Michael.

I suppose we could calculate the chemical formula for each of the multiplex-labeled peptides, and then they would get aggregated by IsotopicDistribution based on the resolution. But we'd want to report the MS1 identification as one peptide, not an ambiguous ID between the multiplex-labeled peptides.

zrolfs commented 4 years ago

I talked with Rob about this and he suggested we supplant the chemical formula with an averagine model for TMT (and other cases). I'm trying this right now to see if the calibration improves your data or if it makes it worse.

I'm pretty sure the TMT labels are the exact same mass. There is no NeuCode difference. [image]

trishorts commented 4 years ago

You could be right. I might have mixed things up. Best to double-check for higher multiplexing though, e.g. 11-plex.

zrolfs commented 4 years ago

Still working their magic with just 4 carbons and 1 nitrogen for TMT11. [image]

zrolfs commented 4 years ago

Hi Anthony, because the intact mass for all of these labels is identical, we can just add the chemical formula to tmt.txt (located in MetaMorpheus\EngineLayer\Mods\tmt.txt).

I've attached an updated version here that you can use as a short term solution. I'll also open a PR to update the file.
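As a rough check on the "identical intact mass" point, here is a sketch that sums isotope masses for the tag composition. The composition used (20 H, 8 C, 4 13C, 1 N, 1 15N, 2 O) is an assumption taken from the commonly cited TMT 6/10/11-plex formula, not from the attached tmt.txt, and `monoisotopic_mass` is a hypothetical helper.

```python
# Monoisotopic masses (Da) of the isotopes involved.
ISOTOPE_MASS = {
    "H": 1.00782503207,
    "C": 12.0,
    "13C": 13.0033548378,
    "N": 14.0030740048,
    "15N": 15.0001088982,
    "O": 15.99491461956,
}


def monoisotopic_mass(counts):
    """Sum isotope masses for a composition given as {isotope symbol: count}."""
    return sum(ISOTOPE_MASS[symbol] * n for symbol, n in counts.items())


# Assumed composition of the intact tag as added to a peptide.
# Every channel carries four 13C and one 15N; only their positions differ.
tmt_tag = {"H": 20, "C": 8, "13C": 4, "N": 1, "15N": 1, "O": 2}

tmt_mass = monoisotopic_mass(tmt_tag)
print(f"{tmt_mass:.6f} Da")
```

Because the channels differ only in where the heavy atoms sit, the counts (and so the summed mass, roughly 229.1629 Da) are the same for each channel under this assumption.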

This is still going to be an issue for iTRAQ, since those masses do have the neutron-binding energy mass differences. We could use averagine to model the theoretical isotopic distribution, but we'll still be stuck with the issue of ensuring that all of the observed MS1 peaks are being assigned to the correct theoretical MS1 peaks. This is a big undertaking, but it will be important for the calibration of anything that uses these small mass differences (iTRAQ, DiLeu, NeuCode SILAC).
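To put a number on those small mass differences, here is a sketch of the shift from swapping a single 13C label for a 15N label, expressed in ppm at a few precursor masses. The precursor masses are illustrative, and `ppm` is a hypothetical helper.

```python
# Mass gained by each heavy-isotope substitution (Da).
DELTA_13C = 13.0033548378 - 12.0
DELTA_15N = 15.0001088982 - 14.0030740048

# Swapping one 13C for one 15N keeps the nominal mass but shifts the
# exact mass by about 6.32 mDa.
swap_da = DELTA_13C - DELTA_15N


def ppm(delta_da, mass_da):
    """Express a mass difference in parts per million of a reference mass."""
    return delta_da / mass_da * 1e6


for precursor_mass in (1000.0, 2000.0, 3000.0):
    shift = ppm(swap_da, precursor_mass)
    print(f"{precursor_mass:.0f} Da: {shift:.2f} ppm")
```

A shift of a few ppm is on the same order as a typical calibrated precursor tolerance, which is why assigning observed peaks to the right theoretical peaks matters for these tags.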

tmt.txt

zrolfs commented 4 years ago

PrecursorMassTolerance = "±34.6907 PPM"
ProductMassTolerance = "±5.1149 PPM"

Oof. Okay, that's not great. I'll look again.
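For scale, a small sketch that converts the tolerances quoted above into absolute windows. The 2000 Da mass is illustrative, and `parse_ppm_tolerance` and `window_da` are hypothetical helpers, not part of MetaMorpheus.

```python
import re


def parse_ppm_tolerance(text):
    """Extract the numeric value from a string such as '±34.6907 PPM'."""
    match = re.fullmatch(r"±\s*([0-9.]+)\s*PPM", text.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized tolerance: {text!r}")
    return float(match.group(1))


def window_da(mass_da, tolerance_ppm):
    """Return the (low, high) mass window in Da for a symmetric ppm tolerance."""
    half_width = mass_da * tolerance_ppm * 1e-6
    return mass_da - half_width, mass_da + half_width


precursor_ppm = parse_ppm_tolerance("±34.6907 PPM")
product_ppm = parse_ppm_tolerance("±5.1149 PPM")

low, high = window_da(2000.0, precursor_ppm)
print(f"precursor window at 2000 Da: {low:.4f} to {high:.4f} Da")
```

At 2000 Da the precursor tolerance works out to a window of roughly 0.14 Da in total, so it is far wider than the product tolerance.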

zrolfs commented 4 years ago

This is bizarre. I was expecting it to shift the distribution by 30 ppm, but it smeared it instead. [image]

acesnik commented 4 years ago

Thanks for sharing the plots. That is bizarre!