pwilmart / TMT_analysis_examples

Examples of TMT data analyses using R. Links to notebooks and repositories. Also a few spectral counting analyses.
MIT License

IRS on non-normalized summed signal / noise data #3

Open adomingues opened 3 years ago

adomingues commented 3 years ago

Hi @pwilmart,

This is not an issue but rather a question about normalization strategies. I am reading your IRS notebook (very thorough work and easy to read!), and noticed that:

Peak height intensities (not the PD default signal/noise ratios) were used for the reporter ions. Note that a signal-to-noise ratio is a compressed unitless number and is not a valid quantitative measurement.

Well, I have some data, processed with IsobarQuant, which consists of non-normalized summed signal / noise values. If I am reading the quote correctly, does this mean that these values cannot be used for IRS? Or can I still use them?

The table also contains the above intensities scaled for each protein, which I guess I could use directly in limma.

Thanks in advance for your thoughts.

pwilmart commented 3 years ago

Hi, The promotion of S/N ratios for TMT data from the Gygi lab and its subsequent incorporation as the default in Proteome Discoverer is something I have unsuccessfully argued against for years. If you have a signal plus noise value and a separate noise value, you SUBTRACT the noise from the combined value to get a more accurate estimate of the true signal value. That is basic measurement science. A signal-to-noise ratio is a quality (not quantity) measure used for signal filtering.
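To make the subtract-versus-divide distinction concrete, here is a minimal sketch with made-up numbers (not real instrument data). The function names are hypothetical, and defining S/N as the noise-subtracted signal divided by the noise is an assumption made for illustration. It shows that the same signal gets a different S/N score when only the noise level changes.

```python
# Illustrative only: all readings below are invented, not instrument output.

def subtract_noise(signal_plus_noise, noise):
    """Estimate the signal, in intensity units, by removing the noise."""
    return signal_plus_noise - noise


def signal_to_noise(signal, noise):
    """Unitless quality score (assumed definition: signal divided by noise)."""
    return signal / noise


# A raw reading of 1050 with an estimated noise level of 50.
signal = subtract_noise(1050.0, 50.0)          # 1000.0 intensity units

# The same signal scored against two different noise levels.
sn_quiet_scan = signal_to_noise(signal, 50.0)   # 20.0
sn_noisy_scan = signal_to_noise(signal, 100.0)  # 10.0

print(signal, sn_quiet_scan, sn_noisy_scan)
```

The subtracted value stays a measurement of the signal, while the ratio moves with the noise, which is why it suits filtering better than quantification.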

The data that is saved from Thermo Orbitrap instruments already has the noise level in the FT transients subtracted. Peak heights are a proxy for peak areas and are okay relative signal measures provided all peaks have the same shape. Given the reporter ion m/z values are very similar, the assumption of similar peak shapes seems okay.

The place where S/N has the most issues is relative abundance within a sample. My PI studies human eye lens proteins. There are a handful of special crystallin proteins that make up nearly all of the lens proteins (greater than 90% of the weight of the lens). If you process human lens with MS, you can see what fraction the crystallins are of the total proteins identified by different abundance estimates (spectral counts, S/N ratios, and TMT peak heights). Spectral counts give about 30%, S/N is about 60%, and peak heights are more than 90%. Only the peak heights give the right answer.

We usually compare the values for each protein across samples/conditions rather than to other proteins within the same sample. The S/N values are kind of the same scale transformation across channels, so they do not affect statistical testing too much. The fact that S/N doesn't screw up comparing the same protein between conditions is probably why I can't convince anyone to stop using it. My background was in physics and when you make measurements, you do not measure the wrong thing.
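A toy sketch of why this happens, under two loudly stated assumptions: all channels of one protein share a single noise value, and noise differs between proteins. All numbers and names here are hypothetical. Dividing every channel by the same noise leaves between-channel ratios untouched, but it can change how two proteins compare within a sample.

```python
# Illustrative only: invented intensities and noise values.

def to_signal_to_noise(intensities, noise):
    """Divide every channel by one shared noise value (simplifying assumption)."""
    return [x / noise for x in intensities]


# One protein measured in three channels, with one shared noise value.
intensities = [1200.0, 800.0, 400.0]
sn_values = to_signal_to_noise(intensities, 40.0)   # [30.0, 20.0, 10.0]

# The ratio between channels is the same on either scale.
ratio_intensity = intensities[0] / intensities[2]   # 3.0
ratio_sn = sn_values[0] / sn_values[2]              # 3.0

# Two proteins in ONE channel, each with its own (hypothetical) noise level.
protein_a, noise_a = 9000.0, 90.0
protein_b, noise_b = 1000.0, 10.0
fraction_a_intensity = protein_a / (protein_a + protein_b)   # 0.9
sn_a, sn_b = protein_a / noise_a, protein_b / noise_b         # 100.0, 100.0
fraction_a_sn = sn_a / (sn_a + sn_b)                          # 0.5

print(ratio_intensity, ratio_sn, fraction_a_intensity, fraction_a_sn)
```

In this contrived case the same-protein comparison survives the transformation while the within-sample fraction does not; real data will not be this clean.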

Long story short, IRS is a per protein adjustment, so I think it will be okay with S/N values. It just puts all of the values within each plex on the same scale experiment-wide. Cheers, Phil
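For readers new to the idea, a per-protein, per-plex scaling of this kind can be sketched as follows. This is a minimal illustration, not the notebook's implementation. It assumes each plex supplies one reference value per protein (for example from pooled standard channels), and the function names and numbers are hypothetical. Because the factor is a plain multiplier per protein and plex, the same arithmetic applies whether the inputs are peak heights or S/N values.

```python
import math

# Illustrative only: a sketch for ONE protein measured in several plexes.


def irs_factors(references):
    """Per-plex factors that move each plex reference to their geometric mean."""
    log_mean = sum(math.log(r) for r in references) / len(references)
    target = math.exp(log_mean)
    return [target / r for r in references]


def irs_adjust_protein(plex_values, references):
    """Multiply every channel of each plex by that plex's factor."""
    factors = irs_factors(references)
    return [[v * f for v in values]
            for values, f in zip(plex_values, factors)]


# Two plexes, two sample channels each, plus one reference value per plex.
plex_values = [[100.0, 200.0], [400.0, 800.0]]
references = [100.0, 400.0]

adjusted = irs_adjust_protein(plex_values, references)
for plex in adjusted:
    print([round(v, 3) for v in plex])
```

After the adjustment both plexes sit on a common scale for this protein, and the ratios between channels inside each plex are unchanged.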

adomingues commented 3 years ago

Thanks a lot for the detailed explanation, Phil. This is super helpful for me because I come from the sequencing side of things (NGS) and have only dabbled briefly with mass-spec data, mostly LFQ, so having some background info on TMT is welcome - I am still reading and learning.

Best, António