pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License

Containerized kb can choose the wrong binary #239

Closed: gennadyFauna closed this issue 4 months ago

gennadyFauna commented 4 months ago

Describe the issue

In some contexts, kb calls the kallisto and bustools binaries installed in /usr/local/bin instead of the binaries bundled with kb.

I encountered this problem while investigating the 0.28.2 (_02) Docker container hosted at quay.io, which is built using this bioconda recipe. The recipe installs kallisto and bustools as dependencies, even though kb does not require them. Once the Docker container is built, kb uses those versions by default.

I have already opened a PR at bioconda-recipes to fix this issue there, but it may also be worth checking why kb requests the wrong binaries in the first place.
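For context, the lookup order this report seems to expect can be sketched as below. This is a hypothetical illustration, not kb_python's actual code: the function `resolve_binary` and its `bundled_dir` and `override` parameters are invented here. An explicit path (such as one passed with `--kallisto`) wins, then a binary shipped with the package, and only as a last resort the first match on PATH, which is where /usr/local/bin enters the picture.

```python
import os
import shutil
from typing import Optional


def resolve_binary(
    name: str,
    bundled_dir: Optional[str] = None,
    override: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical lookup order: explicit override, bundled copy, then PATH.

    Illustrates the expected behaviour only; this is NOT kb_python's implementation.
    """
    # 1. An explicit path supplied by the user always takes precedence.
    if override:
        return override
    # 2. Prefer the binary shipped with the package, if it exists and is executable.
    if bundled_dir:
        candidate = os.path.join(bundled_dir, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # 3. Fall back to the first match on PATH (e.g. /usr/local/bin/kallisto).
    return shutil.which(name)
```

In this sketch, a packaging patch that skips step 2 would make every lookup land on whatever the container put on PATH, which matches the `/usr/local/bin` paths in the log.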

What is the exact command that was run?

docker pull quay.io/biocontainers/kb-python:0.28.2--pyhdfd78af_0
docker run -it quay.io/biocontainers/kb-python:0.28.2--pyhdfd78af_0 sh
cd /home
wget https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/GRCm38.p6.genome.chr19.fa -O genome.fa
wget https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/gencode.vM19.annotation.chr19.gtf -O genes.gtf
kb ref -i index.idx -g t2g.txt -f1 cdna.fa genome.fa genes.gtf

If the final line is replaced with

kb ref -i index.idx -g t2g.txt -f1 cdna.fa genome.fa genes.gtf --kallisto=PREBUILT

where PREBUILT is the path to the kallisto binary bundled with kb, the kb ref command succeeds.
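To apply this workaround, the bundled binary's path has to be found inside the container first. kb_python ships its binaries inside its installed package directory, but the exact subdirectory layout is treated as an assumption here, so the sketch below simply searches the package tree. The helper names `package_dir` and `find_executables` are hypothetical.

```python
import importlib.util
import os
from typing import List, Optional


def package_dir(package: str) -> Optional[str]:
    """Return the install directory of a package, or None if it is not installed."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


def find_executables(root: str, name: str) -> List[str]:
    """Walk `root` and return every executable file named `name`, sorted."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            path = os.path.join(dirpath, name)
            if os.access(path, os.X_OK):
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    root = package_dir("kb_python")
    if root is None:
        print("kb_python is not installed in this environment")
    else:
        # Print candidate bundled binaries for both tools.
        for tool in ("kallisto", "bustools"):
            for path in find_executables(root, tool):
                print(tool, path)
```

A printed kallisto path can then be substituted for PREBUILT in the `--kallisto` option shown above; commands that also invoke bustools would need the matching `--bustools` path.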

Command output (with --verbose flag)

[2024-02-16 23:47:35,838]   DEBUG [main] Printing verbose output
[2024-02-16 23:47:38,043]   DEBUG [main] kallisto binary located at /usr/local/bin/kallisto
[2024-02-16 23:47:38,043]   DEBUG [main] bustools binary located at /usr/local/bin/bustools
[2024-02-16 23:47:38,043]   DEBUG [main] Creating `tmp` directory
[2024-02-16 23:47:38,043]   DEBUG [main] Namespace(list=False, command='ref', tmp=None, keep_tmp=False, verbose=True, i='index.idx', g='t2g.txt', f1='cdna.fa', include_attribute=None, exclude_attribute=None, f2=None, c1=None, c2=None, d=None, k=None, t=8, d_list=None, d_list_overhang=1, aa=False, workflow='standard', distinguish=False, make_unique=False, overwrite=False, kallisto='kallisto', bustools='bustools', fasta='genome.fa', gtf='genes.gtf', feature=None, no_mismatches=False, ec_max_size=None, flank=None)
[2024-02-16 23:47:38,043]    INFO [ref] Preparing genome.fa, genes.gtf
[2024-02-16 23:47:38,966]    INFO [ref] Splitting genome genome.fa into cDNA at /home/tmp/tmp11qsrunn
[2024-02-16 23:47:39,893]    INFO [ref] Concatenating 1 cDNAs to cdna.fa
[2024-02-16 23:47:39,901]    INFO [ref] Creating transcript-to-gene mapping at t2g.txt
[2024-02-16 23:47:39,930]    INFO [ref] Indexing cdna.fa to index.idx
[2024-02-16 23:47:39,930]   DEBUG [ref] kallisto index -i index.idx -k 31 -t 8 -d genome.fa cdna.fa
[2024-02-16 23:47:41,032]   DEBUG [ref]
[2024-02-16 23:47:41,032]   ERROR [ref]
[2024-02-16 23:47:41,032]   ERROR [main] An exception occurred
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/kb_python/main.py", line 1618, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/usr/local/lib/python3.10/site-packages/kb_python/main.py", line 356, in parse_ref
    ref(
  File "/usr/local/lib/python3.10/site-packages/ngs_tools/logging.py", line 62, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/kb_python/ref.py", line 678, in ref
    ) if n > 1 else kallisto_index(
  File "/usr/local/lib/python3.10/site-packages/kb_python/ref.py", line 291, in kallisto_index
    run_executable(command)
  File "/usr/local/lib/python3.10/site-packages/kb_python/dry/__init__.py", line 25, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/kb_python/utils.py", line 203, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/usr/local/bin/kallisto index -i index.idx -k 31 -t 8 -d genome.fa cdna.fa' died with <Signals.SIGILL: 4>.
[2024-02-16 23:47:41,038]   DEBUG [main] Removing `tmp` directory
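Note that the failing command did not exit with an error code; it was killed by a signal. Python's `subprocess` reports this as a negative `returncode`, and `CalledProcessError` renders it as "died with <Signals.SIGILL: 4>". SIGILL (illegal instruction) typically means the binary uses CPU instructions the host processor does not support, which is one plausible reason the /usr/local/bin build crashes while the bundled one works. The helper `describe_failure` below is hypothetical and only decodes the return code.

```python
import signal
import subprocess


def describe_failure(err: subprocess.CalledProcessError) -> str:
    """Turn a CalledProcessError into a short description of how the child ended."""
    if err.returncode < 0:
        # Negative return codes mean the child was terminated by that signal number.
        return signal.Signals(-err.returncode).name
    return f"exit status {err.returncode}"


if __name__ == "__main__":
    err = subprocess.CalledProcessError(-int(signal.SIGILL), "/usr/local/bin/kallisto index ...")
    print(err)
    print(describe_failure(err))
```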
gennadyFauna commented 4 months ago

Probably just an issue with this patch: https://github.com/bioconda/bioconda-recipes/blob/8fb4d6c58a7364d62fc730ca37238bea9c40b0c2/recipes/kb-python/config.py.patch, which forces kb to use external kallisto and bustools binaries.