openforcefield / protein-ligand-benchmark-livecoms


Deamidation can cause ASN -> ASP changes in benchmark proteins #6

Open jchodera opened 3 years ago

jchodera commented 3 years ago

I recently learned about a phenomenon in which asparagine residues can undergo spontaneous intramolecular deamidation reactions, converting into a mixture of asparagine, aspartic acid, and isoaspartic acid: [image] This image is taken from "Protein asparagine deamidation prediction based on structures with machine learning methods", which also provides an ML model for predicting sites of deamidation.

The reaction is intramolecular, so its rate depends mostly on how neighboring residues shape the (phi,psi) geometry preferences: [image]

Asn-Gly (NG) motifs are at particularly high risk for deamidation, and often produce ~1:1 mixtures of Asn and Asp/iso-Asp.

Any deamidation will obviously cause computed affinities for the assumed pure protein to deviate from experimental measurement, likely significantly if these residues are near the binding site. X-ray electron densities might show a mixture, but crystallization could also select for a purer form than exists in a typical assay mixture.

As deamidation can be confirmed (or refuted) by mass spectrometry, it seems reasonable to recommend that proteins predicted to be at particular risk for deamidation (those containing NG motifs in particular) be verified by mass spec to be free of deamidation if they are to be included in the benchmark set.

Also of concern is Asp <-> iso-Asp interconversion. From the same paper: [image]

ppxasjsm commented 3 years ago

Is there anything said about the prevalence of how likely this is to occur?

jchodera commented 3 years ago

Great question!

This survey of the PDB from 2001 attempts a global analysis, and suggests that 0.19% of proteins have Asn residues that are at least 1/2 deamidated in <1 day in Tris buffer, 5.6% in <10 days, and 41% in <100 days: [image] There are also statistics for phosphate buffer (though they list >1/10 deamidated).

While Asn-Gly is the worst offender, there are others that can be problematic too: image
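To relate a half-deamidation time like those quoted above to the fraction of protein affected at assay time, here is a minimal sketch. It assumes simple first-order kinetics with a single half-time per Asn site, which is my assumption and not something the survey or this thread states; the function name and parameters are hypothetical.

```python
def fraction_deamidated(t_days: float, half_time_days: float) -> float:
    """Fraction of an Asn site converted after t_days.

    Assumes first-order kinetics with a single half-time (an assumption;
    real proteins may deviate, e.g. if the fold changes over time).
    """
    if half_time_days <= 0:
        raise ValueError("half_time_days must be positive")
    if t_days < 0:
        raise ValueError("t_days must be non-negative")
    # After each half-time, half of the remaining Asn has reacted.
    return 1.0 - 0.5 ** (t_days / half_time_days)


if __name__ == "__main__":
    # A site with a 10-day half-time, after 2 days in buffer.
    print(f"{fraction_deamidated(2.0, 10.0):.3f}")
```

Under this model, a site with a long half-time can still be measurably deamidated after a few days in buffer, so the time the protein has spent in buffer matters as much as the half-time itself.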

jchodera commented 3 years ago

To a first approximation, it might be useful to assess the test systems for the presence of Asn-Gly motifs. A better approximation can look at all Asn-X motifs and assess whether the folded geometry predisposes that Asn to deamidation. The experimental buffer and the time the protein has sat in buffer at pH 7.4 at RT or above are also relevant.
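A minimal sketch of the first-approximation sequence check might look like the following. The function names and the example sequence are hypothetical, positions are 1-based indices into the given sequence string (not PDB residue numbers), and the sketch does not attempt the geometry-based assessment.

```python
def find_asn_x_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based Asn position, following residue) for every Asn-X pair."""
    seq = sequence.upper()
    return [
        (i + 1, seq[i + 1])
        for i in range(len(seq) - 1)  # a C-terminal Asn has no following residue
        if seq[i] == "N"
    ]


def find_ng_motifs(sequence: str) -> list[int]:
    """Return 1-based positions of Asn residues that are followed by Gly."""
    return [pos for pos, nxt in find_asn_x_motifs(sequence) if nxt == "G"]


if __name__ == "__main__":
    example = "MKNGTANSNG"  # made-up sequence, for illustration only
    print(find_asn_x_motifs(example))  # [(3, 'G'), (7, 'S'), (9, 'G')]
    print(find_ng_motifs(example))     # [3, 9]
```

A hit from this scan only flags a candidate site; whether that Asn actually deamidates still depends on the local fold, the buffer, and the time in solution.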

jchodera commented 3 years ago

Asp isomerization is also a relevant phenomenon.

The difficulty for protein-ligand benchmarks is that the experimental conditions for structure determination may diverge from those used for the affinity assay, meaning that we may not see these artifacts in the experimental structure (or, if we do, they may not be relevant for the experimental assay).

The safest thing to do might be to use mass spec techniques to confirm no Asn -> Asp/iso-Asp deamidation has occurred if there are at-risk Asn residues.
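For reference when reading mass spec data, here is a small sketch of the mass shift expected from deamidation. Converting the side-chain amide to a carboxylic acid swaps NH2 for OH, about +0.984 Da per site from standard monoisotopic atomic masses. The function name is hypothetical, and this is only the arithmetic, not an analysis workflow.

```python
# Monoisotopic atomic masses in Da (standard values, rounded).
MASS_H = 1.00782503
MASS_N = 14.0030740
MASS_O = 15.9949146

# Deamidation replaces a side-chain -NH2 with -OH.
DEAMIDATION_SHIFT_DA = (MASS_O + MASS_H) - (MASS_N + 2 * MASS_H)


def expected_mass_shift(n_deamidated_sites: int) -> float:
    """Total monoisotopic mass increase (Da) for a given number of deamidated sites."""
    if n_deamidated_sites < 0:
        raise ValueError("n_deamidated_sites must be non-negative")
    return n_deamidated_sites * DEAMIDATION_SHIFT_DA


if __name__ == "__main__":
    print(f"{DEAMIDATION_SHIFT_DA:.4f} Da per site")
```

Asp and iso-Asp have the same mass, so the shift shows that deamidation occurred but does not, by mass alone, say which product formed.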

ppxasjsm commented 3 years ago

I am happy to add something to the manuscript. Just wondering what the best practices/recommendation should be?

I assume we should check for this in the current dataset? Has the manuscript been submitted to LiveCoMS yet, @dfhahn?

dfhahn commented 3 years ago

@ppxasjsm No, it has not been submitted yet. Any additions are welcome!

Another point to consider:

jchodera commented 3 years ago

@ppxasjsm @dfhahn : This sounds reasonable! Perhaps we can add a paragraph like this, and a single line ("Assess potential for protein deamidation") to the checklist?

Deamidation: Asparagine (Asn) residues can undergo spontaneous deamidation to produce a mixture of asparagine (Asn), aspartate (Asp), and iso-aspartate (iso-Asp) in a manner that depends on pH and buffer conditions [1]. Asn-Gly (NG) motifs are particularly susceptible: the trailing glycine residue provides enough flexibility to permit reaction rates that may pose issues on experimentally relevant timescales [2]. Glutamine residues can also deamidate, though at a slower rate [2]. Fortunately, machine learning methods are available to predict deamidation rates [3], and mass spectrometry methods can be used to assess whether deamidated protein is present under the assay conditions [4]. If these methods cannot be used to check for deamidation, we recommend at least inspecting protein sequences for the most problematic NG motifs [2], and considering the exclusion of NG-containing proteins from gold-standard benchmarks if purity under assay conditions cannot be confirmed by mass spectrometry.