scverse / scirpy

A scanpy extension to analyse single-cell TCR and BCR data.
https://scirpy.scverse.org/en/latest/
BSD 3-Clause "New" or "Revised" License

clone definition purely using CDR3 sequence #507

Closed: zktuong closed this issue 2 months ago

zktuong commented 7 months ago

Hi @grst,

I've recently been asked to create clonotype definitions based purely on identical/similar CDR3 sequences (without considering any V/J gene information), and wanted to check with you whether this would be useful to implement here as well? As far as I can tell, there's currently only same_v_gene, but I guess we could add a same_j_gene as well?
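To make the request concrete, here is a minimal sketch in plain Python of what "CDR3 only" versus "CDR3 plus V gene" grouping means. It does not use scirpy's API: the function name `group_by_cdr3`, the record layout, and the `junction_aa` field name are assumptions for illustration, and only identical (not similar) sequences are handled.

```python
from collections import defaultdict

def group_by_cdr3(chains, same_v_gene=False):
    """Group cell IDs whose chain shares an identical CDR3 sequence.

    `chains` maps a cell ID to a dict with the (assumed) keys
    "junction_aa" and "v_call". If `same_v_gene` is True, the V gene
    must match as well.
    """
    groups = defaultdict(list)
    for cell_id, rec in chains.items():
        key = (rec["junction_aa"],)
        if same_v_gene:
            key += (rec["v_call"],)
        groups[key].append(cell_id)
    # dicts preserve insertion order, so groups appear in first-seen order
    return list(groups.values())

# Hypothetical toy data, not taken from any real dataset
cells = {
    "c1": {"junction_aa": "CASSL", "v_call": "TRBV1"},
    "c2": {"junction_aa": "CASSL", "v_call": "TRBV2"},
    "c3": {"junction_aa": "CASSF", "v_call": "TRBV1"},
}
cdr3_only = group_by_cdr3(cells)
cdr3_and_v = group_by_cdr3(cells, same_v_gene=True)
```

With CDR3 only, c1 and c2 fall into the same group; requiring the same V gene splits them apart.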

grst commented 7 months ago

Hi @zktuong,

actually, the J gene is currently not considered at all. But afaik @felixpetschko is working on a same_j_gene feature as part of the optimized clonotype calling in #470.

grst commented 3 months ago

@zktuong, could d_call/c_call also be relevant?

Mostly asking because I'm debating the interface with @felixpetschko in https://github.com/scverse/scirpy/pull/470#issuecomment-2289156164

We could have either

same_v_gene: bool = True,
same_j_gene: bool = True

or something like

same_chain_attr: Sequence[str] = ["v_call", "j_call"]

The latter would be generic and work for any field that is chain-specific (rather than cell-specific), be it genes or something else. Not entirely sure though if there's even a use-case for that.
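As a rough illustration of the two interface options, the sketch below builds a per-chain comparison key both ways in plain Python. This is not the implementation in #470: the helper names `key_from_flags` and `key_from_attrs`, the dict-based chain record, and the `junction_aa` field are assumptions. The generic variant uses a tuple default rather than a list to avoid a mutable default argument.

```python
from typing import Mapping, Sequence, Tuple

def key_from_flags(
    chain: Mapping[str, str],
    same_v_gene: bool = True,
    same_j_gene: bool = True,
) -> Tuple[str, ...]:
    """Option 1: one boolean flag per gene segment."""
    key = [chain["junction_aa"]]
    if same_v_gene:
        key.append(chain["v_call"])
    if same_j_gene:
        key.append(chain["j_call"])
    return tuple(key)

def key_from_attrs(
    chain: Mapping[str, str],
    same_chain_attr: Sequence[str] = ("v_call", "j_call"),
) -> Tuple[str, ...]:
    """Option 2: a generic list of chain-level fields that must match."""
    return (chain["junction_aa"], *(chain[attr] for attr in same_chain_attr))

# Hypothetical chain record for illustration
chain = {
    "junction_aa": "CASSL",
    "v_call": "TRBV1",
    "j_call": "TRBJ2-1",
    "c_call": "TRBC1",
}
default_flags = key_from_flags(chain)
default_attrs = key_from_attrs(chain)
cdr3_only = key_from_attrs(chain, ())
with_c_call = key_from_attrs(chain, ["c_call"])
```

Both variants give the same key with their defaults. The generic one additionally covers the CDR3-only case (empty sequence) and arbitrary chain-level fields such as c_call, at the cost of a less self-explanatory signature.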

zktuong commented 3 months ago

D/C calls aren't relevant imo.

I think the first option is more intuitive.