matsengrp / sumrep

Summary statistics for repertoires

Physicochemical features of CDRs #5

Closed: BrandenOlson closed this 5 years ago

BrandenOlson commented 7 years ago

There are many ways to categorize the biochemical properties of amino acids. We have mostly discussed GRAVY scores, Atchley factors, and Kidera factors (for TCRs), but have also mentioned charge and hydrophobicity in general. The first three are readily available in R via alakazam::gravy, HDMD::AAMetric.Atchley, and Peptides::kideraFactors, so writing a separate comparison function for each wouldn't be difficult. It is still undecided, though, whether we should generalize beyond these three, and how to do so intelligently. @javh mentioned that the seqinr package includes the AAindex list of amino acid indices; would it be worthwhile to incorporate it, if possible?
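For reference, a GRAVY score is simply the mean Kyte & Doolittle hydropathy value over the residues of a sequence. The following is a minimal Python sketch of that definition, not a port of alakazam::gravy. It assumes the standard Kyte & Doolittle values, and the function name `gravy` and the choice to skip non-standard characters (gaps, `X`, `*`) are illustrative assumptions.

```python
# Kyte & Doolittle (1982) hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte & Doolittle value per residue.

    Assumption for this sketch: characters outside the 20 standard amino
    acids (gaps, X, stop codons) are skipped rather than treated as errors.
    """
    residues = [aa for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


if __name__ == "__main__":
    # A short CDR3-like example; prints roughly -0.983.
    print(round(gravy("CARDYW"), 3))
```

Because GRAVY averages over length, CDR3s of different lengths land on a comparable scale, which is what makes a per-sequence distribution comparison between repertoires meaningful.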

BrandenOlson commented 7 years ago

Also, looking at Peptides::kideraFactors, I'm realizing that it outputs 10 factors. Which one(s) do we want to look at for CDR3s? I'm guessing KF4 (hydrophobicity) at the very least.

The full list can be found in the Peptides documentation.

javh commented 7 years ago

I think the properties used in these papers are a decent place to start:

  1. Wu, Y.-C. B. et al. High-throughput immunoglobulin repertoire analysis distinguishes between human IgM memory and switched memory B-cell populations. Blood 116, 1070–8 (2010).
  2. Wu, Y.-C. B., Kipling, D. & Dunn-Walters, D. K. The Relationship between CD27 Negative and Positive B Cell Populations in Human Peripheral Blood. Front. Immunol. 2, 1–12 (2011).

Their results are highly reproducible, at least for the features I've looked at.

So, at a minimum: hydrophobicity (Kyte & Doolittle scale), aliphatic index, and either isoelectric point or net charge at pH 7.4. Obviously, we can add many more...
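Of these, the aliphatic index has a simple closed form (Ikai, 1980): AI = X(Ala) + a·X(Val) + b·(X(Ile) + X(Leu)), where X is the mole percent of each residue and a = 2.9, b = 3.9 weight Val and Ile/Leu by side-chain volume relative to Ala. The following is a minimal Python sketch of that formula. The function name `aliphatic_index` is hypothetical, and it assumes the input is an already-cleaned amino acid string, since every character counts toward the length.

```python
from collections import Counter


def aliphatic_index(seq: str) -> float:
    """Ikai (1980) aliphatic index: A + 2.9*V + 3.9*(I + L), in mole percent.

    Assumption for this sketch: `seq` contains only amino acid letters;
    gaps or stop symbols would inflate the denominator.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * counts[aa] / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


if __name__ == "__main__":
    # Each of A, V, I, L is 25 mol%: 25 + 2.9*25 + 3.9*50 = 292.5
    print(round(aliphatic_index("AVIL"), 1))
```

Net charge and isoelectric point additionally depend on a choice of pKa scale, which differs between tools, so they are left out of this sketch.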

BrandenOlson commented 5 years ago

As of now, sumrep incorporates each property computed by alakazam::aminoAcidProperties (https://rdrr.io/cran/alakazam/man/aminoAcidProperties.html).

sumrep also contains getter functions for Peptides::kideraFactors and HDMD::AAMetric.Atchley, but I still need to implement their comparison functions. @matsen and I think the most sensible approach is to compute a divergence for each factor separately and return the results as a vector.
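The component-wise idea can be illustrated with a short sketch: treat each factor as its own one-dimensional distribution across a repertoire, compare the two repertoires factor by factor, and collect the results into a vector. The following Python code is only an illustration. The per-factor divergence used here (the 1-Wasserstein distance between empirical distributions) is an assumption and not necessarily what sumrep uses. The function names are hypothetical, and the assumption that each CDR3 is summarized by one factor vector is also illustrative.

```python
from bisect import bisect_right
from typing import List, Sequence


def wasserstein_1d(u: Sequence[float], v: Sequence[float]) -> float:
    """1-Wasserstein distance between two empirical distributions on the line,
    computed as the integral of |F_u(x) - F_v(x)| over x."""
    if not u or not v:
        raise ValueError("both samples must be non-empty")
    u_sorted, v_sorted = sorted(u), sorted(v)
    support = sorted(set(u_sorted) | set(v_sorted))
    total = 0.0
    # Both empirical CDFs are constant on each interval [left, right).
    for left, right in zip(support, support[1:]):
        cdf_u = bisect_right(u_sorted, left) / len(u_sorted)
        cdf_v = bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


def componentwise_divergence(
    rep_a: Sequence[Sequence[float]], rep_b: Sequence[Sequence[float]]
) -> List[float]:
    """One divergence per factor. Each repertoire is a list of equal-length
    factor vectors (hypothetically, one Kidera or Atchley vector per CDR3)."""
    if not rep_a or not rep_b:
        raise ValueError("both repertoires must be non-empty")
    n_factors = len(rep_a[0])
    return [
        wasserstein_1d([x[k] for x in rep_a], [y[k] for y in rep_b])
        for k in range(n_factors)
    ]


if __name__ == "__main__":
    rep_a = [(0.0, 1.0), (1.0, 1.0)]
    rep_b = [(1.0, 1.0), (2.0, 1.0)]
    # Factor 1 is shifted by 1; factor 2 is identical: prints [1.0, 0.0]
    print(componentwise_divergence(rep_a, rep_b))
```

Returning a vector rather than a single scalar keeps the factors interpretable: a user can see, for example, that two repertoires differ mainly in a hydrophobicity factor and not in a size-related one.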

BrandenOlson commented 5 years ago

Okay, these comparisons are officially implemented, and included in compareRepertoires.