samtools / htslib

C library for high-throughput sequencing data formats

S3 access not working (as expected) when paths contain "#" #492

Closed: dkj closed this issue 5 years ago

dkj commented 7 years ago

Whilst I might expect to have to escape a # in an http:// or https:// URI, I wouldn't expect to for an s3:// URI, and the s3cmd tool does not require such escaping.
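In http:// and https:// URIs, "#" has to be escaped because it marks the start of the fragment. A generic RFC 3986 parser applies that rule to an s3:// URL too, which would truncate the object key at the first "#". The minimal sketch below uses Python's standard `urllib.parse.urlsplit`, not htslib, so it is only a plausible illustration of why the literal path fails; htslib's own URL handling may differ.

```python
from urllib.parse import urlsplit

# The literal path from the failing samtools command in this issue.
url = "s3://npg_cloud_realign_wip/SC_WES_INT5823952/17626_5#1/17626_5#1.cram"

parts = urlsplit(url)
print(parts.netloc)    # npg_cloud_realign_wip
print(parts.path)      # /SC_WES_INT5823952/17626_5   (key cut short at "#")
print(parts.fragment)  # 1/17626_5#1.cram            (rest treated as a fragment)
```

If a client sent only `parts.path` to S3, it would ask for a key that does not exist, which is consistent with the open failure shown below.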

$ samtools/1.4/bin/samtools view -H s3://npg_cloud_realign_wip/SC_WES_INT5823952/17626_5#1/17626_5#1.cram
[E::hts_open_format] fail to open file 's3://npg_cloud_realign_wip/SC_WES_INT5823952/17626_5#1/17626_5#1.cram'
samtools view: failed to open "s3://npg_cloud_realign_wip/SC_WES_INT5823952/17626_5#1/17626_5#1.cram" for reading: Permission denied
$ samtools/1.4/bin/samtools view -H s3://npg_cloud_realign_wip/SC_WES_INT5823952/17626_5%231/17626_5%231.cram | wc -l
3406

$ s3cmd info s3://npg_cloud_realign_wip/SC_WES_INT5823952/17626_5#1/17626_5#1.cram | grep size:
   File size: 3418351545
$ s3cmd info s3://npg_cloud_realign_wip/SC_WES_INT5823952/17626_5%231/17626_5%231.cram | grep size:
ERROR: S3 error: 404 (Not Found)
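The session shows samtools 1.4 accepting `%23` where s3cmd wants the literal key. A workaround sketch, assuming you only need to build the URL for the older samtools/htslib: percent-encode the key with Python's `urllib.parse.quote`, keeping "/" unescaped. `htslib_s3_url` is a hypothetical helper, and it also escapes other reserved characters (e.g. a space becomes `%20`), which the session above did not test. Whether this is still needed after the fix in #839 isn't shown here.

```python
from urllib.parse import quote


def htslib_s3_url(bucket: str, key: str) -> str:
    """Hypothetical helper: build an s3:// URL with the object key
    percent-encoded, as samtools 1.4 in this issue appeared to require.
    '/' is kept so the key's "directories" stay as path separators."""
    return f"s3://{bucket}/{quote(key, safe='/')}"


bucket = "npg_cloud_realign_wip"
key = "SC_WES_INT5823952/17626_5#1/17626_5#1.cram"  # literal key, as s3cmd uses it

print(htslib_s3_url(bucket, key))
# s3://npg_cloud_realign_wip/SC_WES_INT5823952/17626_5%231/17626_5%231.cram
```

The printed URL is the one that worked with `samtools view -H`; s3cmd should still be given the literal key.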

daviesrob commented 7 years ago

I suspect s3cmd may contain code to do the escaping for you. We will take a look and see if something similar can be done in htslib.
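Doing "something similar" inside a library means escaping the key itself before building the request. One well-documented rule is the `UriEncode` step from AWS Signature Version 4: encode every byte except the unreserved characters `A-Z a-z 0-9 - . _ ~` as `%XX` with uppercase hex, leaving "/" alone in object key names. The sketch below applies that rule byte by byte, the way a C library without a URL helper would have to. `encode_s3_key` is a hypothetical name, and this is not the change made in #839.

```python
# Bytes that RFC 3986 (and AWS SigV4 UriEncode) leave unescaped.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def encode_s3_key(key: str) -> str:
    """Percent-encode an S3 object key one UTF-8 byte at a time.

    Unreserved bytes and '/' pass through; everything else becomes %XX
    with uppercase hex digits.
    """
    out = []
    for byte in key.encode("utf-8"):
        if byte in UNRESERVED or byte == ord("/"):
            out.append(chr(byte))
        else:
            out.append(f"%{byte:02X}")
    return "".join(out)


print(encode_s3_key("SC_WES_INT5823952/17626_5#1/17626_5#1.cram"))
# SC_WES_INT5823952/17626_5%231/17626_5%231.cram
```

A design consequence worth noting: if the library escapes internally, callers should pass the literal key (as with s3cmd), because a key the user has already escaped would be encoded twice, e.g. `17626_5%231` becomes `17626_5%25231`.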

daviesrob commented 5 years ago

Fixed by #839