agilly closed this issue 1 year ago.
As far as I'm aware it shouldn't need any special permissions. If `aws s3 cp` works but tabix doesn't, then I'd suspect a mismatch between how the two programs obtain the credentials needed to access your S3 bucket. Depending on how your AWS instance was set up, you may have to give HTSlib some hints about where to find them. The files it looks at, and the environment variables that can influence it, are documented in the htslib-s3-plugin manual page.
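As a minimal sketch of the environment-variable route mentioned above (the key values here are placeholders, not real credentials; check the htslib-s3-plugin manual page for the authoritative list of variables):

```shell
# Placeholder credentials exported the way HTSlib's S3 plugin can pick them up.
export AWS_ACCESS_KEY_ID="AKIAEXAMPLE"
export AWS_SECRET_ACCESS_KEY="secretExample"
# Only needed for temporary (IAM role / STS) credentials:
export AWS_SESSION_TOKEN="tokenExample"

# With these set, a remote query would then be attempted as, e.g.:
#   tabix -l s3://path/to/file.vcf.gz
```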
Thank you for your reply @daviesrob. We figured it was some kind of credentials issue, so we tried a few things with colleagues. We used this curl command:

`curl http://169.254.169.254/latest/meta-data/iam/security-credentials/team-name`

and added the key ID, access key and token to the `.aws/credentials` file, which is one of the files HTSlib reads. However, that only got us one step further, to another error. This time the file seems to be stat-able but not actually readable:
[E::test_and_fetch] Failed to close remote file s3://path/to/vcf.gz.tbi
[E::bgzf_read] Read block operation failed with error 4 after 0 of 4 bytes
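For reference, here is a sketch of the credentials file format we populated (profile name, path and values are placeholders, written to a temp file for illustration; whether the plugin honours `AWS_SHARED_CREDENTIALS_FILE` should be checked against your htslib-s3-plugin manual page):

```shell
# Write an AWS-style shared credentials file to a temporary location.
creds="$(mktemp)"
cat > "$creds" <<'EOF'
[default]
aws_access_key_id = AKIAEXAMPLE
aws_secret_access_key = secretExample
aws_session_token = tokenExample
EOF

# Optionally point tools at this file instead of ~/.aws/credentials:
export AWS_SHARED_CREDENTIALS_FILE="$creds"
```

The `aws_session_token` line is the part that is easy to forget when the credentials come from an IAM role rather than a long-lived key pair.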
Any pointers on what to investigate next?
It's a bit difficult to say on the information available. You might want to try boosting the verbosity again to see if it gives any hints. Also, have you switched to an old version of HTSlib? The "[E::test_and_fetch] Failed to close remote file" message only existed in that form between releases 1.5 and 1.10, after which the function was renamed to `idx_test_and_fetch`.
It looks like you're using IAM credentials. Could they have expired while your process was running (it would have been going for quite a long time)? If that's the case, you could try the script in the short-lived credentials section of the htslib-s3-plugin manual page. The idea is that you run it in the background, where it wakes up occasionally and downloads a new set of credentials before the old ones have expired. HTSlib's S3 plugin will then refresh its stored credentials from the file when they are about to expire (note that this only works from version 1.16).
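The shape of such a keepalive refresher can be sketched as below. This is only an illustration, not the official script from the manual page: the role name `team-name`, the refresh interval, and the `write_credentials` helper are all assumptions.

```shell
# Where to write the refreshed credentials (placeholder default).
CREDS_FILE="${CREDS_FILE:-$HOME/.aws/credentials}"

write_credentials() {
  # Read the instance-metadata JSON on stdin and rewrite the
  # shared-credentials file in the [default] profile format.
  python3 -c '
import json, sys
d = json.load(sys.stdin)
print("[default]")
print("aws_access_key_id =", d["AccessKeyId"])
print("aws_secret_access_key =", d["SecretAccessKey"])
print("aws_session_token =", d["Token"])
' > "$CREDS_FILE"
}

refresh_loop() {
  # Periodically fetch fresh role credentials from the metadata service
  # and rewrite the file before the old credentials expire.
  while true; do
    curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/team-name \
      | write_credentials
    sleep 1800   # refresh well inside the typical credential lifetime
  done
}

# Run in the background alongside the long-running tabix job:
#   refresh_loop &
```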
Thanks @daviesrob, there was indeed a hiccup where we inadvertently switched to 1.12 while using your script. Running the keepalive script you mentioned solves the issue when using 1.16. Thanks! Closing issue.
This issue arises for both the latest release (1.16) and today's pull from the development branch.
Situation: both a `vcf.gz` and its `vcf.gz.tbi` index file are stored in an S3 bucket. The code below is being run from an AWS instance running Ubuntu.

Behavior: tabix queries, for example

`./tabix -l s3://path/to/file.vcf.gz`

fail with error:

This error does not occur when a local copy of the .tbi exists in the cwd (after fetching it with `aws s3 cp`). This makes sense, since the index is now local and `-l` triggers `tbx_seqnames(...)`.

When a range is provided, like:
an error is triggered, which would suggest S3 access is not possible at all. Here is what I get with `--verbosity 9`:

I can confirm that I have access to both files via e.g. `aws s3 cp`. Does tabix require special permissions to be enabled?