ococrook / hdxstats

hdxstats: An R-package for statistical analysis of hydrogen deuterium exchange mass-spectrometry data.
Apache License 2.0

%D uptake vs #D uptake #4

Closed nlgittens closed 2 years ago

nlgittens commented 2 years ago

I had an open question regarding the respective merits of using %D uptake versus #D uptake for the statistical modelling. This is something I have had to consider in my own work: analyses like PCA are going to be unfairly weighted towards longer peptides if using #D uptake. On the other hand, normalising to %D uptake means each peptide is treated equally, where the delta(#D) magnitude for a peptide may be small but still significant.

So when it comes to the #D heat map plots, any delta(#D) for small peptides is going to be obscured even if the difference is very significant. This is simply a cosmetic thing and of course there are other ways of showing the data, but it did make me wonder if this has any impact on the underlying model.

ococrook commented 2 years ago

Good question! The modelling should be done on the #D scale, because %D puts the data on a different scale and distorts the correlations. We can always rescale to %D for visualisation on heatmaps etc. for interpretation; very happy with that.

The other issue is slightly subtle: when we rescale by peptide length, we are assuming that length doesn't change any other properties of the measurement process. For example, is back-exchange stronger in a length-20 peptide than in a length-10 peptide? Is the MS less sensitive for longer peptides? Scaling by peptide length could therefore introduce some uncertainty that isn't accounted for. Doing the modelling on the #D scale avoids these issues because we compare like for like. If we want to compare one peptide with another in the same experiment, we should be somewhat careful. For example, I'd be careful with PCA because there is an underlying Gaussian assumption in the data, and the transforms that get us there could be misleading. Maybe if you let me know what you want to do with PCA I can help there.

We could add a helper function to scale to %D for heatmaps - would that help? Just checking that %D = #D/length(peptide).

nlgittens commented 2 years ago

%D = #D/(number of exchangeable residues), yes; which is (N - (number of proline residues) - 2). Or, with back-exchange correction: 100*#D(t = x)/#D(t = ∞).
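The conversion above can be sketched as a small R helper. This is a hypothetical function (not part of hdxstats), using the exchangeable-amide count N - (number of prolines) - 2 and ignoring back-exchange correction:

```r
# Hypothetical helper (not a hdxstats function): convert #D uptake to %D
# using exchangeable amides = N - (number of prolines) - 2.
percent_uptake <- function(nD, sequence) {
  n <- nchar(sequence)
  # Count proline residues in the peptide sequence
  n_pro <- lengths(regmatches(sequence, gregexpr("P", sequence, fixed = TRUE)))
  exchangeable <- n - n_pro - 2
  100 * nD / exchangeable
}

# A 10-residue peptide with one proline has 10 - 1 - 2 = 7 exchangeable amides,
# so an uptake of 3.5 Da corresponds to 50 %D.
percent_uptake(3.5, "AKPLGVSTDE")  # 50
```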

I understand what you are saying with regard to length-dependent effects. Most experiments still don't include a control for back-exchange correction.

On PCA, we should probably pick this up offline. Speaking in general terms though, if we consider, at a time t, state A with #D = 1 and state B with #D = 1.5: does the significance of this result not change depending on whether the peptide has only 2 exchangeable residues, versus 10 or 20? Perhaps not, since only the parameter a is affected in the Weibull model.
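The point about only the parameter a changing can be checked numerically. Below is a sketch assuming a generic Weibull-type uptake curve D(t) = a(1 - exp(-(bt)^p)); this is an illustrative parameterisation, not necessarily the exact one used in hdxstats. Dividing uptake by the number of exchangeable amides rescales the plateau a and leaves b and p untouched:

```r
# Assumed Weibull-type uptake curve (illustrative parameterisation):
# D(t) = a * (1 - exp(-(b * t)^p))
uptake <- function(t, a, b, p) a * (1 - exp(-(b * t)^p))

t   <- c(0.5, 1, 5, 30, 120)  # exposure times (e.g. minutes)
nex <- 10                     # exchangeable amides for this peptide

# Uptake on the #D scale, and the same curve with only `a` rescaled:
d_abs <- uptake(t, a = 6, b = 0.2, p = 0.8)        # absolute uptake (#D)
d_frc <- uptake(t, a = 6 / nex, b = 0.2, p = 0.8)  # fractional uptake

# Normalising the absolute curve reproduces the rescaled-`a` curve exactly,
# i.e. the length normalisation only moves the plateau parameter.
all.equal(d_abs / nex, d_frc)  # TRUE
```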

ococrook commented 2 years ago

@nlgittens There are now some normalisation functions and some code in the package + examples in the vignette