AttributeError when using `kage genotype --gpu True`

dirkjanvw commented 5 months ago

Hi, thanks for the interesting tool! I was trying it out on a small yeast set (VCF from the MC pipeline, but treated with the suggested bcftools norm -m -any). However, I found a potential bug in the GPU-accelerated version of KAGE.

When I run it with --gpu True I get this error:

$ kage genotype -i results/4.wgs_addition/kage/17/small.npz -r results/4.wgs_addition/kage/17/small/BY.fastq -t 10 --average-coverage 20 -k 17 --gpu True -o results/4.wgs_addition/kage/17/small/BY.vcf
/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/71c2da87c59e0ff181c50278d53ed8af_/lib/python3.11/site-packages/bionumpy/encodings/vcf_encoding.py:98: RuntimeWarning: invalid value encountered in cast
  _lookup[[ord(c) for c in ('0', '1', '.')]] = np.array([0, 1, np.nan])
INFO:root:Read coverage is set to 20.000
INFO:root:Reading all indexes from an index bundle
INFO:root:Will count kmers.
INFO:root:N bytes of reads: 13509977734
INFO:root:Approx number of chunks of 10000000 bytes: 1350
Setting backend to <module 'cupy' from '/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/71c2da87c59e0ff181c50278d53ed8af_/lib/python3.11/site-packages/cupy/__init__.py'>
INFO:root:Using buffer type <class 'bionumpy.io.fastq_buffer.FastQBuffer'>
INFO:root:Using igzip
Traceback (most recent call last):
  File "/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/71c2da87c59e0ff181c50278d53ed8af_/bin/kage", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/71c2da87c59e0ff181c50278d53ed8af_/lib/python3.11/site-packages/kage/command_line_interface.py", line 52, in main
    run_argument_parser(sys.argv[1:])
  File "/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/71c2da87c59e0ff181c50278d53ed8af_/lib/python3.11/site-packages/kage/command_line_interface.py", line 552, in run_argument_parser
    args.func(args)
  File "/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/71c2da87c59e0ff181c50278d53ed8af_/lib/python3.11/site-packages/kage/command_line_interface.py", line 98, in genotype
    node_counts = get_kmer_counts(kmer_index, args.kmer_size, args.reads, config.n_threads, args.gpu)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/71c2da87c59e0ff181c50278d53ed8af_/lib/python3.11/site-packages/kage/command_line_interface.py", line 58, in get_kmer_counts
    return NodeCounts(map_bnp(Namespace(
                      ^^^^^^^^^^^^^^^^^^
  File "/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/71c2da87c59e0ff181c50278d53ed8af_/lib/python3.11/site-packages/kmer_mapper/command_line_interface.py", line 102, in map_bnp
    node_counts = map_gpu(kmer_index, chunks, k, args.gpu_hash_map_size, args.map_reverse_complements)
                                                 ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Namespace' object has no attribute 'gpu_hash_map_size'
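
For anyone hitting the same traceback: it shows kage building an argparse.Namespace by hand and passing it to map_bnp, which then reads args.gpu_hash_map_size. Here is a minimal sketch of that failure mode. The field names, values and the helper read_hash_map_size are made up for illustration and are not copied from KAGE or kmer_mapper:

```python
from argparse import Namespace

# Hypothetical stand-in for the hand-built arguments object; the field
# names and values here are illustrative, not copied from KAGE.
args = Namespace(kmer_size=17, gpu=True)


def read_hash_map_size(args, default=None):
    """Return args.gpu_hash_map_size, or `default` if the caller never set it."""
    return getattr(args, "gpu_hash_map_size", default)


try:
    args.gpu_hash_map_size  # attribute was never set on this Namespace
except AttributeError as exc:
    message = str(exc)
    print(message)

# A tolerant reader avoids the crash; the fallback of 1000 is arbitrary.
print(read_hash_map_size(args, default=1000))
```

A Namespace only carries the attributes it was constructed with, so any field the callee expects has to be set by the caller or read with a default.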

However, if I leave this out, I get a successful run:

$ kage genotype -i results/4.wgs_addition/kage/17/small.npz -r results/4.wgs_addition/kage/17/small/BY.fastq -t 10 --average-coverage 20 -k 17 -o results/4.wgs_addition/kage/17/small/BY.vcf
/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/71c2da87c59e0ff181c50278d53ed8af_/lib/python3.11/site-packages/bionumpy/encodings/vcf_encoding.py:98: RuntimeWarning: invalid value encountered in cast
  _lookup[[ord(c) for c in ('0', '1', '.')]] = np.array([0, 1, np.nan])
INFO:root:Read coverage is set to 20.000
INFO:root:Reading all indexes from an index bundle
INFO:root:Will count kmers.
INFO:root:N bytes of reads: 13509977734
INFO:root:Approx number of chunks of 10000000 bytes: 1350
INFO:root:Using buffer type <class 'bionumpy.io.fastq_buffer.FastQBuffer'>
INFO:root:Using igzip
1351it [01:08, 19.65it/s]
INFO:root:Time spent only on hashing and counting hashes: 69.0792
INFO:root:Max variant id is assumed to be 55804
INFO:root:Will use tricky alleles
INFO:root:Memory usage (Genotyping, before getting ref/var nodes): 2.5571 GB
INFO:root:Memory usage (Genotyping, before getting observed counts): 2.5571 GB
INFO:root:Creating combomodels
INFO:root:Memory usage (Before model): 2.5571 GB
INFO:root:Memory usage (After model): 2.5571 GB
INFO:root:Memory usage (Before making _genotype_probs): 2.5571 GB
INFO:root:Computing genotype probs in HelperModel init took 0.0081 sec
INFO:root:Getting count probs for all combinations of genotypes took 0.27276 sec
INFO:root:Using tricky variants in HelperModel.score. There are 0 tricky variants
INFO:root:Computing log_probs took 0.01053 sec
INFO:root:Time spent on log_probs in HelperModel.score: 0.0106
INFO:root:Computinng logsumex across helper axis took 0.01909 sec
INFO:root:Computinng result-logsumexp(result) took 0.00899 sec
INFO:root:Time spent to compute probs using helper probs in HelperModel.score: 0.0282
INFO:root:Translating genotypes to numeric
INFO:root:99 probs are nan. Setting them to 1/3 before postprocesing genotypes
INFO:root:Number of genotypes that were changed in postprocessing: 7
INFO:root:Writing vcf using Vcf entry to results/4.wgs_addition/kage/17/small/BY.vcf
INFO:root:Writing genotype likelyhoods to file
INFO:root:0 gls positive
INFO:root:Processing genotype likelihoods took 0.016 sec
INFO:root:Computing GLs took 0.009 sec
INFO:root:Creating genotype strings took 0.803 sec
INFO:root:Writing to vcf took 1.234 sec
INFO:root:Genotyping took 74 sec

This is the YAML file for KAGE2 that I used:

channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.11
  - bcftools=1.19
  - cupy=13.0.0
  - pip:
    - kage-genotyper==2.0.3
    - kmer-mapper==0.0.32

I also tried updating kage-genotyper to 2.0.4 and kmer-mapper to 0.0.36, but I get the same error for GPU.

ivargr commented 5 months ago

Hi!

Thanks for reporting! Unfortunately, GPU support has been neglected a bit in recent KAGE updates and seems to have been broken by some of the recent changes.

I think I have now fixed these problems. Could you try updating kage to version 2.0.5 and see if things work now? This can be done with pip install kage-genotyper==2.0.5. Some other pip packages also require updates (bionumpy, npstructures, kmer_mapper), but I think these should now be updated automatically when you update KAGE, so updating only KAGE should hopefully work.

Also, some of the fixes I made while digging into this require version 12 or higher of the cupy package, so if you run into any problems, check that you have version 12 or higher.
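
One way to check the cupy requirement from inside the environment is to read the installed distribution metadata. This is only a sketch: the list of distribution names is an assumption (cupy is published under several names depending on how it was installed), and the helper names are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Return the leading integer of a version string such as '13.0.0'."""
    return int(version_string.split(".")[0])


def installed_version(candidates):
    """Return the version of the first installed distribution, else None."""
    for name in candidates:
        try:
            return version(name)
        except PackageNotFoundError:
            continue
    return None


# CuPy is published under several distribution names depending on how it
# was installed; this list is an assumption and may need extending.
CUPY_NAMES = ["cupy", "cupy-cuda12x", "cupy-cuda11x"]

found = installed_version(CUPY_NAMES)
if found is None:
    print("cupy does not appear to be installed")
elif major_version(found) < 12:
    print(f"cupy {found} is older than version 12")
else:
    print(f"cupy {found} satisfies the version 12 requirement")
```

The same installed_version helper can be pointed at kage-genotyper, kmer-mapper or bionumpy to see which versions pip actually resolved.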

Let me know if you run into any problems :)

dirkjanvw commented 5 months ago

Wow, thank you for the very quick response! It seems to work, but I now realise that I should have installed cucounter as well (which pip doesn't install along with it), so give me a bit of time to get that one installed too and I'll report how it goes :)

dirkjanvw commented 5 months ago

Hmm, I have tried getting cucounter to work, but I can't seem to get it installed (I would like kage, including all dependencies, to be installed in one conda environment or one singularity image). So far I have tried pip in conda (which didn't work, as your README also says) and this singularity definition file:

Bootstrap: docker
From: python:3.9-slim-bookworm

%post
   apt-get update
   apt-get install -y git make gcc g++ zlib1g-dev cmake

%post
   pip install pip --upgrade
   pip install cmake
   pip install kage-genotyper==2.0.5

%post
   cd /opt
   git clone https://github.com/jorgenwh/cucounter.git
   cd cucounter
   git checkout b76210f9486d3b7638602fee1fa3b01313a0515c
   sed -i '1,2d' requirements.txt
   sed -i 's/^\(.*\) \(.*\)$/\1==\2/' requirements.txt
   pip install -r requirements.txt
   pip install .

Would it be possible for you to either fix the pypi recipe for cucounter or help me with this singularity definition file?

For now, I'll keep exploring the non-GPU version, but I think it would be very nice to try out the GPU one too!

ivargr commented 5 months ago

I've now made cucounter available through pip under the package name kage-cucounter. I think it should then be possible to install it using pip through conda, as long as cupy is also installed separately.

Hopefully that makes what you want to achieve possible, but let me know if there are still any issues.

dirkjanvw commented 5 months ago

It took me a while to figure out how to install it on my system via conda, but I think I have it now!

channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.11
  - bcftools=1.17
  - cmake=3.28.3
  - cupy=13.0.0
  - cuda=12.2
  - pip:
    - kage-genotyper==2.0.5
    - kage-cucounter==1.0.1

However, now I found a potential bug(?) that has to do with numpy. Maybe you know whether there is a specific version of (bio)numpy that I should pin as well to fix this?

/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/c221464ee6699ac4be5f15dafe5aa114_/lib/python3.11/site-packages/bionumpy/encodings/vcf_encoding.py:98: RuntimeWarning: invalid value encountered in cast
  _lookup[[ord(c) for c in ('0', '1', '.')]] = np.array([0, 1, np.nan])
INFO:root:Read coverage is set to 20.000
INFO:root:Reading all indexes from an index bundle
INFO:root:Will count kmers.
INFO:root:N bytes of reads: 2237650624
INFO:root:Approx number of chunks of 10000000 bytes: 223
INFO:root:Making counter
INFO:root:CUDA counter initialized
Setting backend to <module 'cupy' from '/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/c221464ee6699ac4be5f15dafe5aa114_/lib/python3.11/site-packages/cupy/__init__.py'>
N unique kmers: 552980
Traceback (most recent call last):
  File "/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/c221464ee6699ac4be5f15dafe5aa114_/bin/kage", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/c221464ee6699ac4be5f15dafe5aa114_/lib/python3.11/site-packages/kage/command_line_interface.py", line 52, in main
    run_argument_parser(sys.argv[1:])
  File "/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/c221464ee6699ac4be5f15dafe5aa114_/lib/python3.11/site-packages/kage/command_line_interface.py", line 552, in run_argument_parser
    args.func(args)
  File "/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/c221464ee6699ac4be5f15dafe5aa114_/lib/python3.11/site-packages/kage/command_line_interface.py", line 98, in genotype
    node_counts = get_kmer_counts(kmer_index, args.kmer_size, args.reads, config.n_threads, args.gpu)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/c221464ee6699ac4be5f15dafe5aa114_/lib/python3.11/site-packages/kage/command_line_interface.py", line 58, in get_kmer_counts
    return NodeCounts(map_bnp(Namespace(
                      ^^^^^^^^^^^^^^^^^^
  File "/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/c221464ee6699ac4be5f15dafe5aa114_/lib/python3.11/site-packages/kmer_mapper/command_line_interface.py", line 104, in map_bnp
    node_counts = map_gpu(kmer_index, chunks, k, args.gpu_hash_map_size, args.map_reverse_complements)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/c221464ee6699ac4be5f15dafe5aa114_/lib/python3.11/site-packages/kmer_mapper/command_line_interface.py", line 71, in map_gpu
    hashes = get_kmer_hashes_from_chunk_sequence(chunk.sequence, k)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/c221464ee6699ac4be5f15dafe5aa114_/lib/python3.11/site-packages/kmer_mapper/util.py", line 73, in get_kmer_hashes_from_chunk_sequence
    bnp.as_encoded_array(chunk_sequence, bnp.DNAEncoding), kmer_size).ravel().raw().astype(np.uint64)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/c221464ee6699ac4be5f15dafe5aa114_/lib/python3.11/site-packages/bionumpy/encoded_array.py", line 597, in as_encoded_array
    return target_encoding.encode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/c221464ee6699ac4be5f15dafe5aa114_/lib/python3.11/site-packages/bionumpy/encoded_array.py", line 53, in encode
    r = self._ragged_array_as_encoded_array(data)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/c221464ee6699ac4be5f15dafe5aa114_/lib/python3.11/site-packages/bionumpy/encoded_array.py", line 74, in _ragged_array_as_encoded_array
    data = self.encode(s.ravel())
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/c221464ee6699ac4be5f15dafe5aa114_/lib/python3.11/site-packages/bionumpy/encoded_array.py", line 60, in encode
    out = EncodedArray(self._encode(data), self)
                       ^^^^^^^^^^^^^^^^^^
  File "/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/c221464ee6699ac4be5f15dafe5aa114_/lib/python3.11/site-packages/bionumpy/encodings/alphabet_encoding.py", line 32, in _encode
    tmp = [chr(c) for c in byte_array.ravel()[ret.ravel()==255]][:10]
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/BIF/nobackup/worku005/snakemake_conda_prefix/c221464ee6699ac4be5f15dafe5aa114_/lib/python3.11/site-packages/bionumpy/encodings/alphabet_encoding.py", line 32, in <listcomp>
    tmp = [chr(c) for c in byte_array.ravel()[ret.ravel()==255]][:10]
           ^^^^^^
TypeError: 'ndarray' object cannot be interpreted as an integer

ivargr commented 5 months ago

Glad you got it working with Conda!

I'm not sure what is causing this error, and I haven't managed to reproduce it with the test data I have. I think it might be a bug in kage or in how we integrate with BioNumPy, perhaps triggered by something specific to your reads.

Would you by chance be able to share the data you used to test this? If it's easier, even a small subset of the reads would help, so I can see whether something specific about the reads causes this problem.

ivargr commented 5 months ago

One possible cause might be that you have something other than A, C, G, T or N in your reads. Would you be able to check whether that is the case?

If so, CPU mode should give a somewhat cleaner error message than the one you got, but in GPU mode I think a second error occurs while that error message is being formatted.

dirkjanvw commented 4 months ago

Yes, sure! I took the data that is used for the Minigraph-Cactus pipeline. I constructed the npz index using the reference genome and the vcf file as per the Minigraph-Cactus example (tarball: all.npz.tgz). Then I used the SRR9330839 short-read data for kage.

Also, I checked the nucleotides in the reads and I don't see any special characters:

$ zcat resources/yeast/wgs/SRR9330839_* | awk 'FNR%4==2' | fold -w 1 | awk '{n[$1]+=1;} END{for (i in n){print i, n[i];}}'
N 205994
A 303308634
C 175953558
G 174474077
T 305301705
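
The same check can be sketched in Python, which also makes it easy to flag anything outside A, C, G, T and N. The function names and the tiny example records below are made up for illustration, and the sketch assumes a FASTQ with exactly four lines per record:

```python
from collections import Counter


def count_bases(fastq_lines):
    """Count each character on the sequence lines of a 4-line-per-record FASTQ."""
    counts = Counter()
    for line_number, line in enumerate(fastq_lines):
        if line_number % 4 == 1:  # second line of each record holds the bases
            counts.update(line.strip())
    return counts


def unexpected_symbols(counts, allowed="ACGTN"):
    """Return the counted symbols that fall outside the allowed alphabet."""
    return {base: n for base, n in counts.items() if base not in allowed}


# Tiny made-up record set, for illustration only.
example = ["@read1", "ACGTN", "+", "IIIII", "@read2", "ACGRA", "+", "IIIII"]
counts = count_bases(example)
print(dict(counts))
print(unexpected_symbols(counts))
```

For a gzipped file, the open handle from gzip.open(path, "rt") can be passed as fastq_lines, since it yields one line at a time.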

ivargr commented 4 months ago

Thanks for sharing!

I think I've now found the problem. Unfortunately, my previous attempt to fix the GPU support introduced a bug that made it fail on N's in the sequence. Sorry about this. If you update kage again to version 2.0.6 (use kage-genotyper==2.0.6 in conda), things should hopefully work now.

(I've tested with the SRR9330839 reads after this fix to make sure that works)

dirkjanvw commented 4 months ago

Great, everything works now! Thanks a lot for fixing this!
