samtools / htslib

C library for high-throughput sequencing data formats

GCE implementation with user pays buckets #1313

Closed migrau closed 3 years ago

migrau commented 3 years ago

I am trying to remotely read a CRAM file stored in a GCE bucket that is configured as user pays:

(google-cloud) mgrau@z25:~$ samtools view -H gs://bucket_name/file.cram
[E::hts_open_format] Failed to open file "gs://bucket_name/file.cram" : Invalid argument
samtools view: failed to open "gs://bucket_name/file.cram" for reading: Invalid argument

And with --verbosity, these are the last lines:

authorization: Bearer < my GCS_OAUTH_TOKEN >

* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
< HTTP/2 400 
< x-guploader-uploadid: ADPycdu-ouZqyc1jYHhl__xxxxx
< content-type: application/xml; charset=UTF-8
< content-length: 266
< date: Tue, 03 Aug 2021 16:25:01 GMT
< expires: Tue, 03 Aug 2021 16:25:01 GMT
< cache-control: private, max-age=0
< server: UploadServer
< alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
<
* Connection #0 to host bucket_name.storage-download.googleapis.com left intact
[main_samview] fail to read the header from "gs://bucket_name/file.cram".

I tried setting GCS_REQUESTER_PAYS_PROJECT=my-project-name (as explained here and here), but I end up with the same error: Invalid argument. I tried both the project name and the project ID, with the same result.

whitwham commented 3 years ago

Just to confirm, you are using samtools 1.13?

migrau commented 3 years ago

Yes

migrau commented 3 years ago

@whitwham, I appreciate your help. Is it actually possible to remotely read a CRAM file stored in a GCE bucket that is configured as user pays?

whitwham commented 3 years ago

@migrau It should be.

Give it a try with a simpler program and see what kind of result you get with htsfile, e.g. htsfile --copy gs://my_bucket/myfile.file myfile_local.file

daviesrob commented 3 years ago

You might want to try htsfile -vvvvvvvv --copy gs://my_bucket/myfile.file myfile_local.file

Turning the verbosity up high enough disables fail-on-error in hfile_libcurl, which means there's a chance that the error response text from Google will be saved in the local file. That might give a few clues about what the problem is.

hfile_s3 always disables fail-on-error so it can better handle 400 responses. It might be worth doing something similar in hfile_gcs.
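After a high-verbosity copy like the one above, it can help to check whether the local file holds real data or the saved error text. The sketch below is only an illustration: looks_like_error_body is a hypothetical helper, not part of htslib or samtools. It assumes a saved error body starts with "<", which seems plausible given the content-type: application/xml header in the 400 response logged earlier. A genuine BAM file starts with the gzip magic bytes and a CRAM file starts with "CRAM", so neither should match.

```shell
# Hypothetical helper (not part of htslib or samtools): succeed when the
# file's first byte is "<", i.e. it looks like saved XML/HTML error text
# rather than BAM (gzip magic) or CRAM ("CRAM" magic) data.
looks_like_error_body() {
    first_byte=$(head -c 1 "$1")
    [ "$first_byte" = "<" ]
}

# Possible usage after the htsfile --copy command above:
#   if looks_like_error_body myfile_local.file; then
#       cat myfile_local.file   # print the error text from the server
#   fi
```

If the helper succeeds, printing the file should show whatever message the server returned, which may name the actual problem.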

whitwham commented 3 years ago

@migrau Any update on this?

daviesrob commented 3 years ago

Hopefully you got this to work. Please re-open if not.

rlorigro commented 11 months ago

I am also trying to get this to work and getting the same "invalid argument" error. However, I am attempting to use the C library as part of my own C++ project. I found a "log_level" setting but I could not find a verbosity setting like the one above. I am using the GCS_OAUTH_TOKEN env variable to provide authentication. Here is my output:

no token set:

[D::init_add_plugin] Loaded "mem"
[D::init_add_plugin] Loaded "crypt4gh-needed"
[D::init_add_plugin] Loaded "libcurl"
[D::init_add_plugin] Loaded "gcs"
[E::hts_open_format] Failed to open file "gs:/[REDACTED].bam" : Permission denied
terminate called after throwing an instance of 'std::runtime_error'
  what():  ERROR: Cannot open bam file: gs:/[REDACTED].bam
Aborted (core dumped)

token set:

[D::init_add_plugin] Loaded "mem"
[D::init_add_plugin] Loaded "crypt4gh-needed"
[D::init_add_plugin] Loaded "libcurl"
[D::init_add_plugin] Loaded "gcs"
[E::hts_open_format] Failed to open file "gs:/[REDACTED].bam" : Invalid argument
terminate called after throwing an instance of 'std::runtime_error'
  what():  ERROR: Cannot open bam file: gs:/[REDACTED].bam
Aborted (core dumped)

Also, for some reason the error message omits one of the slashes in gs://.

SHuang-Broad commented 7 months ago

If anyone finds this ticket while looking for a solution, just run the following exports before your usual samtools command (if you are coming from samtools):

export GCS_OAUTH_TOKEN=`gcloud auth application-default print-access-token`
export GCS_REQUESTER_PAYS_PROJECT="<fill_in_your_gcp_project_here>"

The source is this line of code: https://github.com/samtools/htslib/blob/a6a6350ec24c043dad6d12d213e7e62d8f2d93fe/hfile_gcs.c#L85
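Since a missing or empty variable only surfaces as "Permission denied" or "Invalid argument" in the logs earlier in this thread, a small guard before the samtools call may save some guessing. This is a minimal sketch: check_gcs_env is a hypothetical helper, not something samtools or htslib provides. It only checks that the two variables named above are non-empty, not that the token is valid or that the project is correct.

```shell
# Hypothetical helper (not part of samtools or htslib): return non-zero
# when either environment variable named in this thread is empty or unset.
check_gcs_env() {
    missing=0
    for name in GCS_OAUTH_TOKEN GCS_REQUESTER_PAYS_PROJECT; do
        # POSIX-compatible indirect lookup of the variable called $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "check_gcs_env: $name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible usage, after the two exports above:
#   check_gcs_env && samtools view -H gs://bucket_name/file.cram
```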