samtools / samtools

Tools (written in C using htslib) for manipulating next-generation sequencing data
http://htslib.org/

samtools index s3://bucket/path/file.cram does not work, despite Amazon S3 scheme handlers #1936

Closed cariaso closed 10 months ago

cariaso commented 10 months ago

Are you using the latest version of samtools and HTSlib? If not, please specify.

(run samtools --version)

samtools 1.18-15-g9a59467
Using htslib 1.18-50-g99415e2a
Copyright (C) 2023 Genome Research Ltd.

Samtools compilation details:
    Features:       build=configure curses=yes 
    CC:             gcc
    CPPFLAGS:       
    CFLAGS:         -Wall -g -O2
    LDFLAGS:        
    HTSDIR:         ../htslib
    LIBS:           
    CURSES_LIB:     -lncursesw

HTSlib compilation details:
    Features:       build=configure libcurl=yes S3=yes GCS=yes libdeflate=yes lzma=yes bzip2=yes plugins=yes plugin-path=/usr/libexec/htslib: htscodecs=1.5.2
    CC:             gcc
    CPPFLAGS:       
    CFLAGS:         -Wall -g -O2 -fvisibility=hidden
    LDFLAGS:        -fvisibility=hidden -rdynamic

HTSlib URL scheme handlers present:
    built-in:    preload, data, file
    S3 Multipart Upload:     s3w, s3w+https, s3w+http
    Amazon S3:   s3, s3+https, s3+http
    Google Cloud Storage:    gs+http, gs, gs+https
    libcurl:     ftp, http, https, ftps
    crypt4gh-needed:     crypt4gh
    mem:     mem

However, I've also confirmed that this works fine when run under other environments, such as:

Using htslib 1.18
Copyright (C) 2023 Genome Research Ltd.

Samtools compilation details:
    Features:       build=configure curses=yes 
    CC:             gcc
    CPPFLAGS:       
    CFLAGS:         -Wall -g -O2
    LDFLAGS:        
    HTSDIR:         htslib-1.18
    LIBS:           
    CURSES_LIB:     -lncursesw

HTSlib compilation details:
    Features:       build=configure libcurl=yes S3=yes GCS=yes libdeflate=yes lzma=yes bzip2=yes plugins=no htscodecs=1.5.1
    CC:             gcc
    CPPFLAGS:       
    CFLAGS:         -Wall -g -O2 -fvisibility=hidden
    LDFLAGS:        -fvisibility=hidden 

HTSlib URL scheme handlers present:
    built-in:    preload, data, file
    S3 Multipart Upload:     s3w, s3w+https, s3w+http
    Amazon S3:   s3+https, s3+http, s3
    Google Cloud Storage:    gs+http, gs+https, gs
    libcurl:     imaps, pop3, gophers, http, smb, gopher, sftp, ftps, imap, smtp, smtps, rtsp, scp, ftp, telnet, mqtt, rtmp, ldap, https, ldaps, smbs, tftp, pop3s, dict
    crypt4gh-needed:     crypt4gh
    mem:     mem
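Both listings above include `s3` among the scheme handlers. For comparing builds without reading the output by eye, a minimal sketch follows. It assumes the `samtools --version` layout shown above; `has_scheme` is a hypothetical helper name, not part of samtools or HTSlib.

```shell
# has_scheme SCHEME
# Reads `samtools --version` output on stdin and succeeds (exit 0) only if
# SCHEME is listed under "HTSlib URL scheme handlers present:".
# Hypothetical helper; it relies on the text layout shown in this thread.
has_scheme() {
    awk -v want="$1" '
        /URL scheme handlers present:/ { in_list = 1; next }
        in_list && /:/ {
            sub(/^[^:]*:[ \t]*/, "")          # drop the "handler name:" label
            n = split($0, s, /,[ \t]*/)       # schemes are comma-separated
            for (i = 1; i <= n; i++) {
                gsub(/[ \t]+$/, "", s[i])     # trim trailing whitespace
                if (s[i] == want) found = 1
            }
        }
        END { exit found ? 0 : 1 }
    '
}
```

It could be used as `samtools --version | has_scheme s3 && echo "s3 handler present"`.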

Please describe your environment.

Failure env: Linux 6.1.59-84.139.amzn2023.aarch64 aarch64 gcc (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)

build steps

 yum -y install autoconf automake xz-devel bzip2-devel curl-devel ncurses-devel
 git clone https://github.com/ebiggers/libdeflate
 git clone https://github.com/samtools/htslib.git
 git clone https://github.com/samtools/samtools.git
 yum -y install cmake3
 bash -c 'cd libdeflate && cmake3  -D CMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_LIBDIR=lib  -B build . && cmake3 --build build'
 bash -c 'cd libdeflate/build && make install'
 bash -c 'cd htslib && git submodule update --init --recursive && autoreconf -i &&  ./configure --prefix=/usr --enable-plugins --enable-libcurl --with-libdeflate &&\
 sudo make install'
 bash -c 'cd samtools && autoreconf -i &&  ./configure --prefix=/usr --enable-plugins --enable-libcurl --with-libdeflate && sudo make install'

Please specify the steps taken to generate the issue, the command you are running and the relevant output.

 samtools index 's3://mybucket/mypath/NA12878.cram' -o foo
 samtools index: "s3://mybucket/mypath/NA12878.cram" is in a format that cannot be usefully indexed

However, the same file works fine from an https URL or as a local copy:

 samtools index 'https://gatk-test-data.s3.amazonaws.com/wgs_cram/NA12878_20k_hg38/NA12878.cram' -o foo

 aws s3 cp s3://mybucket/mypath/NA12878.cram .
 samtools index NA12878.cram -o foo

whitwham commented 10 months ago

I am not really sure what is happening here. The error message is the one produced when the file format is not recognised as one samtools can index (CRAM, BAM or compressed SAM).

My own tests with the latest code worked as expected.

What happens when you do:

 samtools view s3://mybucket/mypath/NA12878.cram
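As a rough cross-check of the format point above, the leading bytes of a local copy can be inspected directly: a CRAM file starts with the ASCII bytes `CRAM`, while BAM and bgzip-compressed SAM start with the gzip magic bytes `1f 8b`. This is only a sketch, not HTSlib's actual detection logic, and `sniff_format` is a hypothetical helper name.

```shell
# sniff_format FILE
# Prints a rough guess at the format of a local file from its first 4 bytes.
# Hypothetical helper; HTSlib performs a much more thorough check.
sniff_format() {
    # Hex-encode the first four bytes, e.g. "4352414d" for "CRAM".
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        4352414d) echo "CRAM" ;;
        1f8b*)    echo "gzip/BGZF (BAM or compressed SAM)" ;;
        *)        echo "unrecognised" ;;
    esac
}
```

For example, after `aws s3 cp s3://mybucket/mypath/NA12878.cram .`, running `sniff_format NA12878.cram` shows whether the downloaded object starts with CRAM magic bytes at all.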

cariaso commented 10 months ago

I can no longer reproduce the test case. User error. My apologies.