samtools / htslib

C library for high-throughput sequencing data formats

Inappropriate error message #273

Open jkbonfield opened 9 years ago

jkbonfield commented 9 years ago

If I attempt to use --input-fmt-option nthreads=4 when opening a BAM file, samtools tells me it cannot open the file.

@ seq3c[samtools.../samtools]; ./samtools calmd --input-fmt-option nthreads=4 /tmp/NA12878.high_coverage.chr1_2.bam ~/scratch/data/indices/hgc19.fa
[E::hts_open_format] fail to open file '/tmp/NA12878.high_coverage.chr1_2.bam'

This is due to hopen ultimately calling hts_opt_apply for the NTHREADS option, which in turn calls bgzf_mt to set the number of threads. bgzf_mt returns -1 because it doesn't support multithreaded decoding. That failure bubbles all the way back up to hopen or hopen_fd, which then reports a generic (and incorrect) "couldn't open file" error.

In this situation we should just emit an error message but not fail, as the option is purely an efficiency setting. Rejecting the option outright means the caller has to know in advance whether it is opening a CRAM or a BAM.
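The behaviour proposed above can be sketched as follows. This is only a minimal illustration, not htslib code: `struct handle`, `set_threads` and `apply_thread_option` are hypothetical stand-ins, and the assumption that only CRAM accepts a thread count simply mirrors the situation described in this report.

```c
#include <stdio.h>

/* Hypothetical stand-ins: none of these names are htslib API. */
enum file_kind { KIND_BAM, KIND_CRAM };

struct handle {
    enum file_kind kind;
    int nthreads;               /* threads actually in use */
};

/* Assumption for this sketch: threaded decoding exists only for CRAM. */
static int set_threads(struct handle *h, int n)
{
    if (h->kind != KIND_CRAM)
        return -1;              /* unsupported for this format */
    h->nthreads = n;
    return 0;
}

/* Apply a tuning option; warn on failure instead of propagating it,
 * so that an unsupported efficiency hint never fails the open. */
static int apply_thread_option(struct handle *h, int n)
{
    if (set_threads(h, n) < 0) {
        fprintf(stderr,
                "[W::apply_thread_option] nthreads=%d ignored: "
                "not supported for this format\n", n);
    }
    return 0;
}
```

With this shape, a caller can pass nthreads for any input: a CRAM handle picks up the thread count, while a BAM handle keeps its existing setting and the open still succeeds.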

trifud commented 8 years ago

I have a similar problem. I've set up a download station: a Raspberry Pi running Arch Linux. A script downloads 1000GP BAM files to an external HDD and then extracts only the Y-chromosome data, discarding the rest, but samtools fails:

    samtools view -b HG02233.mapped.ILLUMINA.bwa.IBS.low_coverage.20120522.bam Y > foo.bam
    [E::hts_open_format] fail to open file 'HG02233.mapped.ILLUMINA.bwa.IBS.low_coverage.20120522.bam'
    samtools view: failed to open "HG02233.mapped.ILLUMINA.bwa.IBS.low_coverage.20120522.bam" for reading: Value too large for defined data type

Any idea how to fix this? I am running the git version of htslib and samtools since samtools does not exist in the Arch repositories.

jmarshall commented 8 years ago

@jkbonfield's original report will be improved with additional threading infrastructure.

@trifud's problem is unrelated, and is probably because samtools as compiled can't open a BAM larger than 2 GB on a 32-bit Raspberry Pi. Please try adding -D_FILE_OFFSET_BITS=64 to CPPFLAGS in htslib/Makefile and recompiling everything. We don't need _FILE_OFFSET_BITS on most modern (64-bit) hosts, but we probably do need it on 32-bit hosts.
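For anyone wanting to confirm whether a build is affected, a small check along these lines may help. "Value too large for defined data type" is the usual text for EOVERFLOW, which open() can return when a file's size does not fit in off_t. The helper names below are hypothetical. The macro is defined in the source here only to keep the sketch self-contained: it has to appear before any system header to take effect, and in a real build it would normally come from CPPFLAGS as suggested above.

```c
/* Must precede every system header to take effect; normally supplied
 * on the command line as -D_FILE_OFFSET_BITS=64. */
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif

#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: 1 if off_t can hold offsets of 2 GiB and beyond. */
static int has_large_file_offsets(void)
{
    return sizeof(off_t) >= 8;
}

/* Hypothetical helper: print the width of off_t, e.g. for a build log. */
static void report_off_t_width(void)
{
    printf("off_t is %d bits\n", (int)(sizeof(off_t) * 8));
}
```

Compiling the same file on a 32-bit host with and without the definition should show whether the flag is actually reaching the compiler.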

trifud commented 8 years ago

@jmarshall, I had tried this before but it didn't work. After updating to the latest revision, it worked. Interestingly, samtools comes with -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE in its Makefile by default, but htslib does not.