@felixpetschko, since Francesca mentioned you have been playing around with metrics as well, maybe this is something you would find interesting. The TCRdist metric is more relevant than Levenshtein distance in practice. It's somewhat comparable to the alignment distance that Tobias has been working on, but might be even faster because it simplifies the alignment problem.
First step would be to get the code from nb_metrics.py to work inside scirpy. As a next step it would be interesting if this works on GPU using numba's GPU features (or any other framework - as you like).
I'll provide feedback on the clonotype clustering later -- which is also still the priority right now. If we can't speed that up, speeding up the distance metrics is in vain.
Hi @grst, is anyone working on implementing this at the moment? If not, I would be interested in having a crack at it.
I think @felixpetschko has this on his list -- Felix, is that still the plan?
@grst @ShihanL Yes, I am working on this :)
Closed in #502.
This is in the main branch and will be part of the next release.
@ShihanL, if you want to give it a try already, you can install the development version using
pip install git+https://github.com/scverse/scirpy.git@main
Another option towards #304.
TCRdist3, or rather its dependency pwseqdist (MIT licensed), comes with a numba implementation of the TCRdist3 distance metrics: https://github.com/agartland/pwseqdist/blob/master/pwseqdist/nb_metrics.py
It would be easy to convert this into a DistanceCalculator. Allegedly this is faster than using parasail, and it seems less complex than a full sequence alignment: for instance, there are no affine gap penalties. Maybe this could even be ported to make use of numba's CUDA kernels to run on GPU.
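To illustrate why skipping a full alignment is cheaper, here is a minimal pure-Python sketch of the general idea. It is not the code from pwseqdist or scirpy. The function name `gap_position_distance`, the 0/1 mismatch cost and the default gap penalty are all assumptions made for illustration. Instead of filling a dynamic-programming matrix, it tries a single contiguous gap at each possible position in the shorter sequence and keeps the cheapest result.

```python
def gap_position_distance(s1, s2, gap_penalty=4, mismatch_cost=1):
    """Toy alignment-free distance between two CDR3-like strings.

    Assumptions (not taken from pwseqdist): substitution cost is 0 for
    identical residues and `mismatch_cost` otherwise; each gapped position
    costs a fixed `gap_penalty` (no affine gap open/extend distinction).
    """
    # Make s1 the shorter sequence so the gap always goes into s1.
    if len(s1) > len(s2):
        s1, s2 = s2, s1
    len_diff = len(s2) - len(s1)

    def sub_cost(a, b):
        return 0 if a == b else mismatch_cost

    best = None
    # Try one contiguous gap of length len_diff at every position of s1.
    for gap_pos in range(len(s1) + 1):
        cost = len_diff * gap_penalty
        # Residues before the gap are compared position by position.
        for i in range(gap_pos):
            cost += sub_cost(s1[i], s2[i])
        # Residues after the gap are compared with an offset of len_diff.
        for i in range(gap_pos, len(s1)):
            cost += sub_cost(s1[i], s2[i + len_diff])
        if best is None or cost < best:
            best = cost
    return best


if __name__ == "__main__":
    print(gap_position_distance("CASSL", "CASSF"))
    print(gap_position_distance("CASSL", "CASSGL"))
```

Each comparison costs at most O(n^2) simple operations with no traceback, and the loops are plain integer arithmetic, which is the kind of code that numba can typically compile well. A real implementation would use a proper amino-acid substitution matrix instead of the 0/1 cost used here.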