scverse / scirpy

A scanpy extension to analyse single-cell TCR and BCR data.
https://scirpy.scverse.org/en/latest/
BSD 3-Clause "New" or "Revised" License

BCR-tutorial #199

Closed (grst closed this issue 1 day ago)

grst commented 4 years ago

Add a tutorial for BCR (or joint BCR/TCR analysis).

Either as an additional tutorial, or as a replacement for the TCR tutorial as the main "getting started" tutorial.

The latter may make particular sense, as we could show a joint BCR/TCR analysis.

ktpolanski commented 3 years ago

Another signal boost post. I just tried to analyse some BCR data, so I mimicked the TCR tutorial.

grst commented 3 years ago

Hi @ktpolanski,

the main reason I haven't prioritized the tutorial so far is that the analysis steps don't differ much between TCR and BCR, so following the TCR tutorial should be fine. It would still be nice to have it covered, of course.

Do you have something particular in mind with respect to BCRs?

Best, Gregor

ktpolanski commented 3 years ago

Having had a chat with a biologist, I believe Scirpy's existing functionality covers BCR-specific phenomena (non-identity CDR3 comparisons for somatic hypermutation, and .obs['..._c_gene'] for class-switch recombination). As such, it should actually be fine, yeah!
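To illustrate the class-switch idea, here is a minimal sketch in plain Python (not scirpy code) that collects the heavy-chain constant genes seen within each clonotype and flags clonotypes carrying more than one. The record layout and the field names `clone_id` and `c_gene` are hypothetical stand-ins for whatever per-cell columns your annotation provides, and the sequences are made-up examples.

```python
from collections import defaultdict

# Hypothetical per-cell records: clonotype ID plus heavy-chain constant gene.
cells = [
    {"clone_id": "c1", "c_gene": "IGHM"},
    {"clone_id": "c1", "c_gene": "IGHG1"},
    {"clone_id": "c2", "c_gene": "IGHM"},
    {"clone_id": "c2", "c_gene": "IGHM"},
    {"clone_id": "c3", "c_gene": "IGHA1"},
    {"clone_id": "c3", "c_gene": None},  # constant gene not called for this cell
]


def isotypes_per_clone(records):
    """Collect the set of constant genes observed in each clonotype, skipping missing calls."""
    by_clone = defaultdict(set)
    for rec in records:
        if rec["c_gene"] is not None:
            by_clone[rec["clone_id"]].add(rec["c_gene"])
    return dict(by_clone)


def multi_isotype_clones(records):
    """Clonotypes whose cells carry more than one constant gene (candidate class switching)."""
    return sorted(c for c, genes in isotypes_per_clone(records).items() if len(genes) > 1)


print(multi_isotype_clones(cells))  # ['c1']
```

One caveat: naive B cells co-express IgM and IgD through alternative splicing rather than class-switch recombination, so a real analysis would probably treat IGHM/IGHD mixtures as unswitched.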

Given how vital a role the neighbour/clonotype calling seems to play in the process, do you have any suggestions for parameterising the Levenshtein distance for BCRs?

grst commented 3 years ago

We don't have any evidence for what the optimal threshold is. For TCRs, I would say that anything with a Levenshtein distance of more than 1-2 is unlikely to recognize the same antigen. @szabogtamas, our immunology expert, suggested that for BCRs it may make sense to relax this threshold a bit.
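To make the role of the threshold concrete, here is a minimal pure-Python sketch: a dynamic-programming Levenshtein distance, plus grouping of CDR3 sequences into connected components wherever two sequences lie within a given cutoff. This only illustrates the logic; scirpy computes distances and defines clonotype clusters with its own optimized implementation, and the single-linkage grouping and the example CDR3 strings here are assumptions chosen for simplicity.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if characters match)
            ))
        prev = curr
    return prev[-1]


def cluster_by_cutoff(seqs, cutoff):
    """Single-linkage grouping: join two sequences if their distance is <= cutoff."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if levenshtein(seqs[i], seqs[j]) <= cutoff:
                parent[find(i)] = find(j)

    groups = {}
    for i, s in enumerate(seqs):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


# Made-up CDR3 amino-acid sequences for illustration only.
cdr3s = ["CARDRGYW", "CARDRGFW", "CARDKGFW", "CASSLGQW"]
print(cluster_by_cutoff(cdr3s, cutoff=1))
# [['CARDRGYW', 'CARDRGFW', 'CARDKGFW'], ['CASSLGQW']]
```

Note how single linkage chains sequences together: `CARDRGYW` and `CARDKGFW` are 2 edits apart yet end up in the same group at `cutoff=1` via `CARDRGFW`. Relaxing the cutoff for BCRs makes such chaining more likely, which is one reason the choice of threshold matters.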

In the introduction of this preprint, they write:

The authors in [14], have noticed that the distribution of distances between sequences and their nearest neighbors (distance-to-nearest) tends to be bi-modal, with a first mode corresponding to clonally related sequences and second mode corresponding to sequences without clonal relationship (singletons). Using this bi-modality, [14] proposes to set a threshold that separates the two modes. Following this observation, [1, 17] use the bi-modality of this distribution to suggest an automatic way to set the threshold. A recent method by [2] uses spectral clustering with an adaptive threshold to identify the groups of clonally related sequences.
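The distance-to-nearest idea from the quoted passage can be sketched as follows: compute, for each sequence, the Levenshtein distance to its closest other sequence, build a histogram of those values, and, if there are two modes, cut at the emptiest bin between the two highest peaks. This crude histogram-valley heuristic is a stand-in chosen for brevity; it is not the method of [14], [1, 17] or [2], which use more careful density or clustering approaches. All function names are hypothetical.

```python
from collections import Counter


def levenshtein(a, b):
    """Edit distance via dynamic programming (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def distance_to_nearest(seqs):
    """For each sequence, the distance to its closest *other* sequence (needs >= 2 seqs)."""
    return [
        min(levenshtein(s, t) for j, t in enumerate(seqs) if j != i)
        for i, s in enumerate(seqs)
    ]


def valley_threshold(dists):
    """Emptiest histogram bin between the two highest local maxima, or None if unimodal."""
    counts = Counter(dists)
    hist = [counts.get(d, 0) for d in range(max(dists) + 1)]
    peaks = [
        d for d in range(len(hist))
        if hist[d] > 0
        and (d == 0 or hist[d] >= hist[d - 1])
        and (d == len(hist) - 1 or hist[d] > hist[d + 1])
    ]
    if len(peaks) < 2:
        return None
    lo, hi = sorted(sorted(peaks, key=lambda d: hist[d], reverse=True)[:2])
    return min(range(lo, hi + 1), key=lambda d: hist[d])


# Synthetic distances with a "clonally related" mode near 1 and a "singleton" mode near 6.
bimodal = [1] * 5 + [2] * 3 + [3] + [5] * 2 + [6] * 4 + [7] * 2
print(valley_threshold(bimodal))             # 4
print(valley_threshold([1, 1, 1, 2, 2, 3]))  # None
```

With small samples (for example ~1,700 cells) the histogram is noisy, so a heuristic like this can easily report no valley, or a spurious one.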

I gave that a quick try with the Maynard 2020 dataset and couldn't find a bimodal distribution. However, the number of B cells in that dataset is rather small (~1,700), so this pattern may well only start to emerge with tens or even hundreds of thousands of cells.

In any case, this is a topic that interests me as well. If you find something useful, let me know and I'm happy to add it to the documentation.