pachterlab / kallisto

Near-optimal RNA-Seq quantification
https://pachterlab.github.io/kallisto
BSD 2-Clause "Simplified" License

kallisto ref died with <Signals.SIGILL: 4>. #447

Closed yeroslaviz closed 3 months ago

yeroslaviz commented 3 months ago

I tried to index the human genome (I know I can download it) using the command given in your README.

wget https://ftp.ensembl.org/pub/release-108/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
wget https://ftp.ensembl.org/pub/release-108/gtf/homo_sapiens/Homo_sapiens.GRCh38.108.gtf.gz

fa="Homo_sapiens.GRCh38.dna.primary_assembly.fa"
gtf="Homo_sapiens.GRCh38.108.gtf"

kb ref --workflow=standard -i Homo_sapiens.GRCh38_index.idx \
  -g Homo_sapiens.GRCh38_t2g.txt -f1 Homo_sapiens.GRCh38_cdna.fa \
  --include-attribute gene_biotype:protein_coding \
  --include-attribute gene_biotype:lncRNA \
  --include-attribute gene_biotype:lincRNA \
  --include-attribute gene_biotype:antisense \
  --include-attribute gene_biotype:IG_LV_gene \
  --include-attribute gene_biotype:IG_V_gene \
  --include-attribute gene_biotype:IG_V_pseudogene \
  --include-attribute gene_biotype:IG_D_gene \
  --include-attribute gene_biotype:IG_J_gene \
  --include-attribute gene_biotype:IG_J_pseudogene \
  --include-attribute gene_biotype:IG_C_gene \
  --include-attribute gene_biotype:IG_C_pseudogene \
  --include-attribute gene_biotype:TR_V_gene \
  --include-attribute gene_biotype:TR_V_pseudogene \
  --include-attribute gene_biotype:TR_D_gene \
  --include-attribute gene_biotype:TR_J_gene \
  --include-attribute gene_biotype:TR_J_pseudogene \
  --include-attribute gene_biotype:TR_C_gene \
  $fa $gtf

The index dies with the following error:

[2024-07-15 11:39:27,596]    INFO [ref] Preparing Homo_sapiens.GRCh38.dna.primary_assembly.fa, Homo_sapiens.GRCh38.108.gtf
[2024-07-15 11:41:58,043]    INFO [ref] Splitting genome Homo_sapiens.GRCh38.dna.primary_assembly.fa into cDNA at /home/yeroslaviz/poolFolders/pool-bcfngs/genomes/kallisto_transcriptomes/Homo_sapiens/tmp/tmpwwpyqbqa
[2024-07-15 11:43:33,247]    INFO [ref] Concatenating 1 cDNAs to Homo_sapiens.GRCh38_cdna.fa
[2024-07-15 11:43:41,008]    INFO [ref] Creating transcript-to-gene mapping at Homo_sapiens.GRCh38_t2g.txt
[2024-07-15 11:43:45,965]    INFO [ref] Indexing Homo_sapiens.GRCh38_cdna.fa to Homo_sapiens.GRCh38_index.idx
[2024-07-15 11:44:12,314]   ERROR [ref] 
[build] loading fasta file Homo_sapiens.GRCh38_cdna.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
from 1964 target sequences
[build] warning: replaced 1212 non-ACGUT characters in the input sequence
with pseudorandom nucleotides
KmerStream::KmerStream(): Start computing k-mer cardinality estimations (1/2)
KmerStream::KmerStream(): Start computing k-mer cardinality estimations (1/2)
KmerStream::KmerStream(): Finished
CompactedDBG::build(): Estimated number of k-mers occurring at least once: 140070304
CompactedDBG::build(): Estimated number of minimizer occurring at least once: 34165126
[2024-07-15 11:44:12,315]   ERROR [main] An exception occurred
Traceback (most recent call last):
  File "/fs/home/yeroslaviz/miniconda3/envs/kallisto/lib/python3.8/site-packages/kb_python/main.py", line 1618, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/fs/home/yeroslaviz/miniconda3/envs/kallisto/lib/python3.8/site-packages/kb_python/main.py", line 356, in parse_ref
    ref(
  File "/fs/home/yeroslaviz/miniconda3/envs/kallisto/lib/python3.8/site-packages/ngs_tools/logging.py", line 62, in inner
    return func(*args, **kwargs)
  File "/fs/home/yeroslaviz/miniconda3/envs/kallisto/lib/python3.8/site-packages/kb_python/ref.py", line 678, in ref
    ) if n > 1 else kallisto_index(
  File "/fs/home/yeroslaviz/miniconda3/envs/kallisto/lib/python3.8/site-packages/kb_python/ref.py", line 291, in kallisto_index
    run_executable(command)
  File "/fs/home/yeroslaviz/miniconda3/envs/kallisto/lib/python3.8/site-packages/kb_python/dry/__init__.py", line 25, in inner
    return func(*args, **kwargs)
  File "/fs/home/yeroslaviz/miniconda3/envs/kallisto/lib/python3.8/site-packages/kb_python/utils.py", line 203, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/fs/home/yeroslaviz/miniconda3/envs/kallisto/lib/python3.8/site-packages/kb_python/bins/linux/kallisto/kallisto index -i Homo_sapiens.GRCh38_index.idx -k 31 -t 8 -d Homo_sapiens.GRCh38.dna.primary_assembly.fa Homo_sapiens.GRCh38_cdna.fa' died with <Signals.SIGILL: 4>.
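
For readers unfamiliar with the wording: `died with <Signals.SIGILL: 4>` is how Python's `subprocess` module reports a child process that was terminated by a signal. On POSIX systems the child's return code is then the negated signal number (here `-4`). A minimal sketch of decoding such a return code follows; the helper name `describe_returncode` is hypothetical and not part of kb-python.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Describe a subprocess return code (hypothetical helper, not from kb-python)."""
    if returncode < 0:
        # On POSIX, subprocess reports -N when the child was killed by signal N.
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by unknown signal {-returncode}"
    return f"exited with status {returncode}"


# -4 corresponds to the return code behind the error in the log above.
print(describe_returncode(-4))
```

SIGILL means the CPU was asked to execute an instruction it does not recognise, so the failure points at the binary rather than at the input files or available memory.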

I have enough memory on the machine:

$ free -h
               total        used        free      shared  buff/cache   available
Mem:           1.0Ti       2.6Gi       8.0Gi       276Gi       997Gi       720Gi
Swap:             0B          0B          0B

I'm using the latest version of kallisto/kb:

$ kb --version
usage: kb [-h] [--list] <CMD> ...

kb_python 0.28.2

Addendum: kallisto was installed using conda.

Thanks for the help.

yeroslaviz commented 3 months ago

Really strange behaviour. I have tested the exact same command with the exact same input files on a different server.

There it runs without a problem. Both times I used the same conda environment.

The same thing happened when I tried to run kallisto quant. On the first server, every run gave me an error:

quantifying File1
[quant] fragment length distribution will be estimated from the data
Illegal instruction (core dumped)

quantifying file2
[quant] fragment length distribution will be estimated from the data
Illegal instruction (core dumped)
...

But the exact same workflow runs smoothly on the second server (for now 😏).

I know one server has an Intel Xeon E5-4617 0 (24) @ 3.400GHz, while the second one has an AMD EPYC 7343 (64) @ 3.200GHz. Do you need any other information to help fix this behaviour?

thanks

Assa

Yenaled commented 3 months ago

Kallisto uses architecture-specific instructions, so some kallisto binaries may be problematic on certain machines. One solution is to install kallisto from source and supply that binary's path to kb count via --kallisto=/path/to/kallisto.

We're still working on an easier way to resolve this issue (we'll likely include an option to disable the architecture-specific instructions in future kb-python releases).
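
To compare what two machines support, one option is to look at the CPU feature flags that Linux exposes in `/proc/cpuinfo`. The sketch below is only an illustration: the helper names are hypothetical, and the flags it checks (`avx2`, `bmi2`) are guesses, since this thread does not say which instructions the prebuilt kallisto binary actually uses.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from /proc/cpuinfo-style text (first 'flags' line)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text: str, wanted) -> list:
    """Return the wanted flags that the CPU does not advertise, sorted."""
    return sorted(set(wanted) - parse_cpu_flags(cpuinfo_text))


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux only
    if cpuinfo.exists():
        # Assumption: avx2 and bmi2 are example flags, not a confirmed requirement.
        print(missing_flags(cpuinfo.read_text(), ["avx2", "bmi2"]))
```

Running this on both servers and comparing the output shows which of the checked flags differ between them.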

yeroslaviz commented 3 months ago

So, do I understand correctly that if I want to run it on different machines, I need to install it on each machine itself?

Conda won't help here anymore?

Yenaled commented 3 months ago

You only need to install kallisto from source on a machine where you get that error; you can still use kb-python from conda. Kallisto is the thing causing issues. Hopefully this compatibility issue will be fixed soon.
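
As a rough sketch of that workaround, a script that drives kb-python could add the `--kallisto` option mentioned above to an existing command. The helper `with_custom_kallisto` is hypothetical, and the file names, technology string, and binary path in the example are placeholders rather than values from this thread.

```python
def with_custom_kallisto(kb_command, kallisto_path):
    """Return a copy of a kb command list that points kb at a specific kallisto binary.

    Hypothetical helper: it only inserts --kallisto=<path> after the subcommand.
    """
    if len(kb_command) < 2 or kb_command[0] != "kb":
        raise ValueError("expected a command of the form ['kb', '<subcommand>', ...]")
    return kb_command[:2] + [f"--kallisto={kallisto_path}"] + kb_command[2:]


# Placeholder arguments for illustration only.
command = with_custom_kallisto(
    ["kb", "count", "-i", "index.idx", "-g", "t2g.txt", "-x", "10xv3",
     "-o", "out", "R1.fastq.gz", "R2.fastq.gz"],
    "/path/to/kallisto",
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on the machine where the conda-provided binary fails, while other machines keep using the default binary.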

yeroslaviz commented 3 months ago

OK, thanks. I'll do that. It would be great, though, if you could fix it when you get to it.

Thanks for this great tool.