scverse / scirpy

A scanpy extension to analyse single-cell TCR and BCR data.
https://scirpy.scverse.org/en/latest/
BSD 3-Clause "New" or "Revised" License

ir.pp.ir_dist() performance on interactive versus batch jobs #555

Closed olivermccallion closed 4 days ago

olivermccallion commented 1 month ago

Describe the bug

I'm trying to calculate the IR distance for 100k TCR sequences:

    import scirpy as ir

    # `sample` is the data object holding the 100k TCR sequences (loaded beforehand)
    ir.pp.ir_dist(
        sample,
        metric="alignment",
        sequence="aa",
        cutoff=15,
    )

If I run this interactively in a Jupyter Notebook on 1 core, I get 80% of the way through in 4 hours (the longest I'm able to run an interactive session) at 18.05 s/iteration. If I submit the same data and script as an HPC job using 6 cores, the job runs much more slowly, at 464.09 s/iteration, and gets only about 5% of the way through in 1 hour. Both runs use the same conda environment, the same object, and the same hardware.

This is my environment (full package list below), running on CentOS Linux 8.1.

Could you offer any advice? Thanks!

_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
adjusttext                1.2.0                    pypi_0    pypi
airr                      1.5.1                    pypi_0    pypi
anndata                   0.10.8                   pypi_0    pypi
array-api-compat          1.8                      pypi_0    pypi
asttokens                 2.4.1                    pypi_0    pypi
awkward                   2.6.7                    pypi_0    pypi
awkward-cpp               37                       pypi_0    pypi
bzip2                     1.0.8                h4bc722e_7    conda-forge
ca-certificates           2024.7.4             hbcca054_0    conda-forge
certifi                   2024.7.4                 pypi_0    pypi
charset-normalizer        3.3.2                    pypi_0    pypi
comm                      0.2.2                    pypi_0    pypi
contourpy                 1.2.1                    pypi_0    pypi
cycler                    0.12.1                   pypi_0    pypi
debugpy                   1.8.5                    pypi_0    pypi
decorator                 5.1.1                    pypi_0    pypi
dill                      0.3.8                    pypi_0    pypi
executing                 2.0.1                    pypi_0    pypi
fonttools                 4.53.1                   pypi_0    pypi
fsspec                    2024.6.1                 pypi_0    pypi
h5py                      3.11.0                   pypi_0    pypi
idna                      3.7                      pypi_0    pypi
igraph                    0.11.6                   pypi_0    pypi
ipykernel                 6.29.5                   pypi_0    pypi
ipython                   8.26.0                   pypi_0    pypi
jedi                      0.19.1                   pypi_0    pypi
joblib                    1.4.2                    pypi_0    pypi
jupyter-client            8.6.2                    pypi_0    pypi
jupyter-core              5.7.2                    pypi_0    pypi
kiwisolver                1.4.5                    pypi_0    pypi
ld_impl_linux-64          2.40                 hf3520f5_7    conda-forge
legacy-api-wrap           1.4                      pypi_0    pypi
levenshtein               0.25.1                   pypi_0    pypi
libexpat                  2.6.2                h59595ed_0    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 14.1.0               h77fa898_0    conda-forge
libgomp                   14.1.0               h77fa898_0    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libsqlite                 3.46.0               hde9e2c9_0    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libzlib                   1.3.1                h4ab18f5_1    conda-forge
llvmlite                  0.43.0                   pypi_0    pypi
matplotlib                3.9.0                    pypi_0    pypi
matplotlib-inline         0.1.7                    pypi_0    pypi
mudata                    0.3.0                    pypi_0    pypi
muon                      0.1.6                    pypi_0    pypi
natsort                   8.4.0                    pypi_0    pypi
ncurses                   6.5                  h59595ed_0    conda-forge
nest-asyncio              1.6.0                    pypi_0    pypi
networkx                  3.3                      pypi_0    pypi
numba                     0.60.0                   pypi_0    pypi
numpy                     1.26.4                   pypi_0    pypi
openssl                   3.3.1                h4bc722e_2    conda-forge
packaging                 24.1                     pypi_0    pypi
palmotif                  0.4                      pypi_0    pypi
pandas                    2.2.2                    pypi_0    pypi
parasail                  1.3.4                    pypi_0    pypi
parso                     0.8.4                    pypi_0    pypi
patsy                     0.5.6                    pypi_0    pypi
pexpect                   4.9.0                    pypi_0    pypi
pillow                    10.4.0                   pypi_0    pypi
pip                       24.2               pyhd8ed1ab_0    conda-forge
platformdirs              4.2.2                    pypi_0    pypi
pooch                     1.8.2                    pypi_0    pypi
prompt-toolkit            3.0.47                   pypi_0    pypi
protobuf                  5.27.3                   pypi_0    pypi
psutil                    6.0.0                    pypi_0    pypi
ptyprocess                0.7.0                    pypi_0    pypi
pure-eval                 0.2.3                    pypi_0    pypi
pygments                  2.18.0                   pypi_0    pypi
pynndescent               0.5.13                   pypi_0    pypi
pyparsing                 3.1.2                    pypi_0    pypi
python                    3.12.4          h194c7f8_0_cpython    conda-forge
python-dateutil           2.9.0.post0              pypi_0    pypi
python-levenshtein        0.25.1                   pypi_0    pypi
pytz                      2024.1                   pypi_0    pypi
pyyaml                    6.0.1                    pypi_0    pypi
pyzmq                     26.1.0                   pypi_0    pypi
rapidfuzz                 3.9.5                    pypi_0    pypi
readline                  8.2                  h8228510_1    conda-forge
requests                  2.32.3                   pypi_0    pypi
scanpy                    1.10.2                   pypi_0    pypi
scikit-learn              1.5.1                    pypi_0    pypi
scipy                     1.14.0                   pypi_0    pypi
scirpy                    0.17.2                   pypi_0    pypi
seaborn                   0.13.2                   pypi_0    pypi
session-info              1.0.0                    pypi_0    pypi
setuptools                72.1.0             pyhd8ed1ab_0    conda-forge
six                       1.16.0                   pypi_0    pypi
squarify                  0.4.4                    pypi_0    pypi
stack-data                0.6.3                    pypi_0    pypi
statsmodels               0.14.2                   pypi_0    pypi
stdlib-list               0.10.0                   pypi_0    pypi
svgwrite                  1.4.3                    pypi_0    pypi
texttable                 1.7.0                    pypi_0    pypi
threadpoolctl             3.5.0                    pypi_0    pypi
tk                        8.6.13          noxft_h4845f30_101    conda-forge
tornado                   6.4.1                    pypi_0    pypi
tqdm                      4.66.5                   pypi_0    pypi
traitlets                 5.14.3                   pypi_0    pypi
tzdata                    2024.1                   pypi_0    pypi
umap-learn                0.5.6                    pypi_0    pypi
urllib3                   2.2.2                    pypi_0    pypi
wcwidth                   0.2.13                   pypi_0    pypi
wheel                     0.44.0             pyhd8ed1ab_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yamlordereddictloader     0.4.2                    pypi_0    pypi

grst commented 1 month ago

1) Are you sure the Jupyter notebook is only using one core (i.e. did you verify with e.g. htop)? Not all schedulers enforce that.
2) Have you considered the tcrdist metric instead of alignment? It should give very similar results while being much faster.
3) Lastly, I'd suggest upgrading to v0.18. It doesn't affect the speed of the alignment metric, but the subsequent clonotype clustering step is much faster now.
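
As a rough illustration of points 1 and 2, the sketch below prints the CPU limits visible to the Python process, so the same script can be run in the notebook and in the batch job and the two outputs compared. It then shows how the tcrdist metric might be requested. The helper names (`available_cpus`, `ir_dist_kwargs`, `run_ir_dist`) are hypothetical, and the list of environment variables is only a guess at what a given scheduler sets. Passing `n_jobs` assumes the installed scirpy version accepts that argument in `ir.pp.ir_dist`. The cutoff is left at the metric's default because tcrdist distances are on a different scale from alignment distances, so `cutoff=15` is not carried over.

```python
import os


def available_cpus() -> dict:
    """Collect the CPU limits that are visible to this Python process."""
    info = {"os_cpu_count": os.cpu_count()}
    # sched_getaffinity is only available on some platforms (e.g. Linux); it
    # reports the set of CPUs this process is actually allowed to run on.
    if hasattr(os, "sched_getaffinity"):
        info["affinity"] = len(os.sched_getaffinity(0))
    else:
        info["affinity"] = None
    # Variables that schedulers and threading libraries commonly set
    # (assumption: your cluster may use different ones).
    for var in ("SLURM_CPUS_PER_TASK", "OMP_NUM_THREADS", "NUMBA_NUM_THREADS"):
        info[var] = os.environ.get(var)
    return info


def ir_dist_kwargs(n_jobs: int) -> dict:
    """Keyword arguments for ir.pp.ir_dist with the tcrdist metric."""
    # No cutoff is given, so the metric's own default applies.
    return {"metric": "tcrdist", "sequence": "aa", "n_jobs": n_jobs}


def run_ir_dist(adata) -> None:
    """Run ir_dist with as many workers as the process may actually use."""
    import scirpy as ir  # imported here so the CPU report works without scirpy

    cpus = available_cpus()
    n_jobs = cpus["affinity"] or cpus["os_cpu_count"] or 1
    ir.pp.ir_dist(adata, **ir_dist_kwargs(n_jobs))


if __name__ == "__main__":
    for key, value in available_cpus().items():
        print(f"{key}: {value}")
```

If the interactive session reports more usable CPUs than the batch job, or the batch job reports fewer than the 6 requested, that would be one possible explanation for the difference in s/iteration.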

grst commented 4 days ago

Closing this for now, feel free to reopen if you are still having issues.