What Repertoire level Stats are required for Clones

bcorrie commented 3 years ago

We have a list of Stats that we have implemented for v1 of the Stats API for rearrangements.

Do we need all of these for Clones?

What else do we need for Clones?

bcorrie commented 3 years ago

Currently we have the following Repertoire level stats for Rearrangements:

/irplus/v1/stats/rearrangement/count
/irplus/v1/stats/rearrangement/junction_length
/irplus/v1/stats/rearrangement/gene_usage

Do we have these for Clones? Assuming yes...

bcorrie commented 3 years ago

What else do we want/need?

Assuming we want a Diversity statistic - see ireceptor-plus/specifications#77

What else?

systemimmunologylab commented 3 years ago

Yes, also V gene usage.


bcorrie commented 3 years ago

> yes, also V gene usage

Assuming for clones we have:

/irplus/v1/stats/clone/count
/irplus/v1/stats/clone/junction_length
/irplus/v1/stats/clone/gene_usage

gene_usage gives gene usage for V/D/J/C genes at the subgroup, gene, and allele levels.
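To illustrate the three levels, here is a minimal Python sketch that derives subgroup, gene, and allele labels from an IMGT-style allele call and tallies usage at each level. It assumes calls look like `IGHV1-2*02` (subgroup = text before the first `-`, gene = text before `*`); `split_levels` and `gene_usage` are hypothetical helpers, not part of the Stats API.

```python
from collections import Counter


def split_levels(call: str) -> dict:
    """Split an IMGT-style allele call such as 'IGHV1-2*02' into three levels.

    Assumption: subgroup is everything before the first '-', gene is everything
    before '*'. Calls without a '-' (e.g. 'IGHJ4*02') reuse the gene as subgroup.
    """
    gene = call.split("*", 1)[0]
    subgroup = gene.split("-", 1)[0]
    return {"subgroup": subgroup, "gene": gene, "allele": call}


def gene_usage(calls: list[str]) -> dict[str, Counter]:
    """Count how often each subgroup, gene, and allele appears among the calls."""
    usage = {"subgroup": Counter(), "gene": Counter(), "allele": Counter()}
    for call in calls:
        for level, label in split_levels(call).items():
            usage[level][label] += 1
    return usage


usage = gene_usage(["IGHV1-2*02", "IGHV1-2*04", "IGHV3-23*01"])
print(usage["subgroup"])  # Counter({'IGHV1': 2, 'IGHV3': 1})
print(usage["gene"])      # Counter({'IGHV1-2': 2, 'IGHV3-23': 1})
```

Each clone is counted once here; weighting each call by its clone_count instead would be a one-line change, and which one the API reports is a design choice for the spec.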

If we agree we need diversity (see ireceptor-plus/specifications#77) then we also have:

/irplus/v1/stats/clone/diversity

systemimmunologylab commented 3 years ago

Sorry, no idea what I was thinking there. How about a functional/total clone ratio? (I am assuming count is total functional.)

bcorrie commented 3 years ago

Note, for clarity and completeness: each API takes a JSON payload as its parameters, specifying both the set of Repertoires you want the stats for AND the specific statistics you want...

Using the API:

/irplus/v1/stats/clone/gene_usage

With the JSON payload:

{
  "repertoires": [{"repertoire_id": "REP1"}],
  "statistics": ["v_subgroup", "v_gene"]
}

would get you the V subgroup and V gene usage stats for Repertoire REP1.
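For concreteness, a minimal Python sketch of building that request with the standard library. The base URL is a hypothetical placeholder for an iReceptor Plus service, `build_stats_request` is a made-up helper, and the sketch assumes the endpoint accepts the JSON body exactly as shown above; it only constructs the POST request and does not send it.

```python
import json
import urllib.request

# Hypothetical service root; replace with a real iReceptor Plus stats endpoint.
BASE_URL = "https://example.org/irplus/v1/stats"


def build_stats_request(entity: str, stat: str, repertoire_ids: list[str],
                        statistics: list[str]) -> urllib.request.Request:
    """Build a POST request for e.g. /stats/clone/gene_usage with a JSON payload."""
    payload = {
        "repertoires": [{"repertoire_id": rid} for rid in repertoire_ids],
        "statistics": statistics,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/{entity}/{stat}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_stats_request("clone", "gene_usage", ["REP1"], ["v_subgroup", "v_gene"])
print(req.full_url)  # https://example.org/irplus/v1/stats/clone/gene_usage
# Sending it would be urllib.request.urlopen(req); that is not done here.
```

Keeping the entity (`rearrangement` or `clone`) as a parameter reflects the idea that the clone entry points mirror the rearrangement ones.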

bcorrie commented 3 years ago

> Sorry, no idea what I was thinking there. How about a functional/total clone ratio? (I am assuming count is total functional.)

For /count we can define different count statistics. For rearrangements we defined four counts:

rearrangement_count - Number of rearrangements, independent of duplicate_count (each rearrangement can have a duplicate_count).
duplicate_count - Sum of the duplicate counts for each rearrangement
rearrangement_count_productive - same as above but with productive rearrangements only
duplicate_count_productive - same as above but with productive rearrangements only

So we can do the same for clones (since each clone has a clone_count):

clone_count - Number of clones, independent of clone_count (each clone can have a clone_count).
clone_count_sum - Sum of the clone_count for each clone
clone_count_productive - same as above but with productive clones only
clone_count_sum_productive - same as above but with productive clones only

So one could ask for clone_count and clone_count_productive and then compute the ratio.
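A minimal Python sketch of the four clone counts and the productive ratio computed client-side. It assumes each clone record is a dict with an integer `clone_count` and a boolean `productive` field (the latter is an assumption about the clone data, not something the thread defines); the sample records and helper names are made up for illustration.

```python
def clone_count_stats(clones: list[dict]) -> dict:
    """Compute the four count statistics proposed for /stats/clone/count.

    Assumes each clone dict has an integer 'clone_count' and a boolean 'productive'.
    """
    productive = [c for c in clones if c.get("productive")]
    return {
        "clone_count": len(clones),  # number of clones, ignoring clone_count
        "clone_count_sum": sum(c["clone_count"] for c in clones),
        "clone_count_productive": len(productive),
        "clone_count_sum_productive": sum(c["clone_count"] for c in productive),
    }


def productive_ratio(stats: dict) -> float:
    """Fraction of clones that are productive; 0.0 when there are no clones."""
    total = stats["clone_count"]
    return stats["clone_count_productive"] / total if total else 0.0


clones = [
    {"clone_id": "c1", "clone_count": 5, "productive": True},
    {"clone_id": "c2", "clone_count": 2, "productive": False},
    {"clone_id": "c3", "clone_count": 1, "productive": True},
    {"clone_id": "c4", "clone_count": 4, "productive": True},
]
stats = clone_count_stats(clones)
print(stats)
print(productive_ratio(stats))  # 0.75
```

Because the ratio is a pure function of two counts the endpoint would already return, computing it on the client is cheap, which supports the point that a separate clone_ratio may be redundant.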

Given that, we probably don't need a separate clone_ratio???

bcorrie commented 2 years ago

I have added a /stats/clone/mutations entry point on the clone-stats branch, in discussion with @ajrocha and @systemimmunologylab.

This is a bit different than our other stats entry points as it takes a subject_id rather than a <repertoire_id, sample_processing_id, data_processing_id> triple as input. Thoughts?

bcorrie commented 2 years ago

Using the API:

/irplus/v1/stats/clone/mutations

With the JSON payload:

{
  "subjects": [
    { "subject": { "subject_id": "SUBJECT_1" }},
    { "subject": { "subject_id": "SUBJECT_2" }}
  ],
  "statistics": [ "total", "unique" ]
}

will get you the mutation stats for SUBJECT_1 and SUBJECT_2. The exact format of the returned stats is yet to be defined (to be provided by @systemimmunologylab).

And yes, the subjects object in the request is needlessly complex, but it currently mirrors the repertoires object in other stats and I didn't want to remove that structure until we agreed that we want these stats at the subject level (and not the repertoire level).
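To make the nesting concrete, a minimal Python sketch that builds the mutations request body from a list of subject IDs, alongside the flatter shape it could take if the wrapper were dropped. `build_mutations_payload` and its `flat` option are hypothetical; only the nested form matches the proposal above.

```python
import json


def build_mutations_payload(subject_ids: list[str], statistics: list[str],
                            flat: bool = False) -> dict:
    """Build a /stats/clone/mutations request body.

    flat=False reproduces the current proposal, where each entry wraps a
    'subject' object. flat=True is a hypothetical simplified alternative,
    not an agreed format.
    """
    if flat:
        subjects = [{"subject_id": sid} for sid in subject_ids]
    else:
        subjects = [{"subject": {"subject_id": sid}} for sid in subject_ids]
    return {"subjects": subjects, "statistics": statistics}


payload = build_mutations_payload(["SUBJECT_1", "SUBJECT_2"], ["total", "unique"])
print(json.dumps(payload, indent=2))
```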

schristley commented 2 years ago

> This is a bit different than our other stats entry points as it takes a subject_id rather than a <repertoire_id, sample_processing_id, data_processing_id> triple as input. Thoughts?

- subject_id is not unique.
- Being under the /stats/clone entry point seems to imply these are mutations for clones.
- It isn't clear what these are mutations of. The subject level would suggest somatic mutations of the subject's genome, but that's completely different from AIRR-seq.

For contrast, here's the rough workflow when I use the immcantation suite for B cell somatic hypermutation analysis:

- Starting with rearrangements, extract the productive rearrangements and run clonal assignment.
- For each clonally assigned productive rearrangement, calculate replacement and synonymous mutations at each amino acid codon along the V gene (using IMGT numbering, this is 104 AA positions).
- At this point you can provide a count of how many mutations occur along the V gene for each rearrangement.
- As a clone might be composed of multiple rearrangements, you can sum the mutations at each position across its rearrangements to get counts for the whole clone.
- If you want, you could sum the mutations for all clones within a repertoire, giving you repertoire-level mutation counts.
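The per-position bookkeeping in those steps can be sketched in Python. This is a simplified illustration, not Immcantation/Shazam code: it assumes germline and observed V sequences are already IMGT-gapped and codon-aligned, classifies each single-nucleotide difference as replacement (R) or synonymous (S) by substituting it alone into the germline codon under the standard genetic code, and then sums per-codon counts from rearrangements up to clones. The record fields `clone_id`, `germline`, and `sequence` are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code with codons enumerated in TCAG order ('*' marks stops).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_mutations(germline: str, observed: str) -> dict[int, dict[str, int]]:
    """Return {codon_position: {"R": n, "S": n}} for one rearrangement.

    Assumes uppercase, IMGT-gapped, codon-aligned sequences; positions are 1-based.
    Germline codons containing gaps ('.') and non-ACGT observed bases are skipped.
    Each nucleotide difference is judged on its own against the germline codon,
    a simplification of how dedicated tools treat codons with several mutations.
    """
    counts: dict[int, dict[str, int]] = {}
    for i in range(0, min(len(germline), len(observed)) - 2, 3):
        g, o = germline[i:i + 3], observed[i:i + 3]
        if g not in CODON_TABLE:
            continue
        for j in range(3):
            if o[j] == g[j] or o[j] not in BASES:
                continue
            mutant = g[:j] + o[j] + g[j + 1:]
            kind = "S" if CODON_TABLE[mutant] == CODON_TABLE[g] else "R"
            counts.setdefault(i // 3 + 1, {"R": 0, "S": 0})[kind] += 1
    return counts


def clone_mutations(rearrangements: list[dict]) -> dict[str, dict[int, dict[str, int]]]:
    """Sum per-codon R/S counts over all rearrangements assigned to each clone."""
    totals = defaultdict(lambda: defaultdict(lambda: {"R": 0, "S": 0}))
    for rec in rearrangements:
        clone_totals = totals[rec["clone_id"]]  # creates an entry even with no mutations
        for pos, rs in codon_mutations(rec["germline"], rec["sequence"]).items():
            clone_totals[pos]["R"] += rs["R"]
            clone_totals[pos]["S"] += rs["S"]
    return {cid: dict(per_pos) for cid, per_pos in totals.items()}


# Toy input with two-codon "V genes" for brevity.
rearrangements = [
    {"clone_id": "c1", "germline": "GCTGCT", "sequence": "GCCGCT"},  # Ala->Ala: S at codon 1
    {"clone_id": "c1", "germline": "GCTGCT", "sequence": "GCTACT"},  # Ala->Thr: R at codon 2
    {"clone_id": "c2", "germline": "GCTGCT", "sequence": "GCTGCT"},  # unmutated
]
print(clone_mutations(rearrangements))
# {'c1': {1: {'R': 0, 'S': 1}, 2: {'R': 1, 'S': 0}}, 'c2': {}}
```

Summing each clone's dictionary over all clones in a repertoire gives the repertoire-level counts, and running the same functions on unproductive rearrangements alone would give null-model counts for comparison.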

It's also possible that you might want mutation counts for unproductive rearrangements, as these might be considered mutations under the null model, unaffected by the selection process of affinity maturation.

bcorrie commented 2 years ago

@schristley thanks, that is helpful. The subject_id is being driven by the @systemimmunologylab use case, so I am not sure what the expected output would be. I "think" it might be similar to your last bullet point: total mutations for all clones in a subject. I'm just not sure how that is represented...

I suppose we would need to determine how to resolve that in the ADC context. Maybe it boils down to getting all of the repertoires for a subject in a study, having the Stats API return repertoire-level stats (as it normally does), and then, if you want subject-level mutation counts, summing across all the repertoires for that subject... Not sure if that makes sense or not.
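A minimal Python sketch of that idea: take repertoire-level mutation totals, one result per repertoire, and sum them per subject. Because subject_id is not unique, it keys subjects by a (study_id, subject_id) pair. The field names and the shape of each per-repertoire result are assumptions, since the output format has not been defined yet.

```python
from collections import Counter, defaultdict


def subject_totals(repertoire_stats: list[dict]) -> dict[tuple[str, str], Counter]:
    """Sum repertoire-level mutation counts per subject.

    Each input dict is assumed to carry 'study_id', 'subject_id', and a
    'mutations' mapping such as {'R': 12, 'S': 5}. subject_id alone is not
    unique, so the (study_id, subject_id) pair is used as the key.
    """
    totals: dict[tuple[str, str], Counter] = defaultdict(Counter)
    for rep in repertoire_stats:
        key = (rep["study_id"], rep["subject_id"])
        totals[key].update(rep["mutations"])  # Counter.update adds counts
    return dict(totals)


per_repertoire = [
    {"study_id": "S1", "subject_id": "SUBJECT_1", "repertoire_id": "REP1", "mutations": {"R": 12, "S": 5}},
    {"study_id": "S1", "subject_id": "SUBJECT_1", "repertoire_id": "REP2", "mutations": {"R": 3, "S": 1}},
    {"study_id": "S2", "subject_id": "SUBJECT_1", "repertoire_id": "REP3", "mutations": {"R": 7, "S": 2}},
]
print(subject_totals(per_repertoire))
```

Here the two SUBJECT_1 entries from different studies stay separate, which is exactly what keying on subject_id alone would get wrong.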

@systemimmunologylab we need some feedback here...

schristley commented 2 years ago

In the recent discussion, there was a mention of conservative vs. non-conservative amino acid changes, though it wasn't clear how those were defined. Shazam allows mutations to be defined based upon amino acid properties; do one or more of these properties match the conservative/non-conservative definition?

And if so, is there a particular reason why we would define/provide just one, why not all? Also, why not mutations that change the amino acid, regardless of the amino acid properties?

schristley commented 2 years ago

Yellow: hydrophobic/buried; red: hydrophilic/surface; blank: neutral/intermediate.

A substitution within a category is conservative; a substitution between categories is non-conservative.

This is the first paper using these definitions: Hershberg, U. and Shlomchik, M. J. (2006) Differences in potential for amino acid change following mutation reveals distinct strategies for kappa and lambda light chain variation. PNAS Vol.103 No.43 pp. 15963-8.

The Chothia reference is number 20 in this paper.
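The within/between rule can be sketched in Python as a lookup against a category table. The mapping below is a toy placeholder covering three residues for illustration only; it is NOT the full classification from the cited paper, which would need to be filled in for all 20 amino acids. `classify_substitution` is a hypothetical helper.

```python
from typing import Optional

# Toy placeholder mapping for illustration only; NOT the paper's full table.
EXAMPLE_CATEGORIES = {
    "L": "hydrophobic/buried",
    "I": "hydrophobic/buried",
    "K": "hydrophilic/surface",
}


def classify_substitution(from_aa: str, to_aa: str,
                          categories: dict[str, str]) -> Optional[str]:
    """Label an amino acid change as 'silent', 'conservative', or 'non-conservative'.

    Conservative = both residues in the same category; non-conservative =
    different categories. Returns None if either residue is not in the mapping.
    """
    if from_aa == to_aa:
        return "silent"
    a, b = categories.get(from_aa), categories.get(to_aa)
    if a is None or b is None:
        return None
    return "conservative" if a == b else "non-conservative"


print(classify_substitution("L", "I", EXAMPLE_CATEGORIES))  # conservative
print(classify_substitution("L", "K", EXAMPLE_CATEGORIES))  # non-conservative
```

Passing the category table as a parameter keeps the rule separate from any one property definition, so the same function could be run against several property-based tables rather than just one.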

bcorrie commented 1 year ago

We did not implement Stats for Clones as part of the project; closing this issue.


What Repertoire level Stats are required for Clones #16