googleapis / google-cloud-ruby

Google Cloud Client Library for Ruby
https://googleapis.github.io/google-cloud-ruby/
Apache License 2.0

BigQuery: Errno::EPIPE on loading csv file #266

Closed: vitaliel closed this issue 9 years ago

vitaliel commented 9 years ago

Hi,

I'm trying to upload a 100 MB CSV file to BigQuery, but I get Errno::EPIPE errors.

Snippet:

gcloud = Gcloud.new project_id, key_file
bigquery = gcloud.bigquery
dataset = bigquery.dataset 'logging'
table = dataset.table table_name
load_job = table.load 'site_access.csv', chunk_size: 10 * 1024 * 1024

I get the error after 10 seconds, but if I do not pass chunk_size, it fails after 50 seconds.

Exception:

$ time ./bin/bq_uploader
/home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/openssl/buffering.rb:326:in `syswrite': Broken pipe (Errno::EPIPE)
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/openssl/buffering.rb:326:in `do_write'
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/openssl/buffering.rb:344:in `write'
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/net/http/generic_request.rb:205:in `copy_stream'
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/net/http/generic_request.rb:205:in `send_request_with_body_stream'
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/net/http/generic_request.rb:122:in `exec'
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/net/http.rb:1412:in `block in transport_request'
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/net/http.rb:1411:in `catch'
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/net/http.rb:1411:in `transport_request'
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/net/http.rb:1384:in `request'
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/net/http.rb:1377:in `block in request'
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/net/http.rb:853:in `start'
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/net/http.rb:1375:in `request'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/faraday-0.9.1/lib/faraday/adapter/net_http.rb:82:in `perform_request'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/faraday-0.9.1/lib/faraday/adapter/net_http.rb:40:in `block in call'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/faraday-0.9.1/lib/faraday/adapter/net_http.rb:87:in `with_net_http_connection'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/faraday-0.9.1/lib/faraday/adapter/net_http.rb:32:in `call'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/faraday-0.9.1/lib/faraday/response.rb:8:in `call'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/google-api-client-0.8.6/lib/google/api_client/request.rb:163:in `send'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/google-api-client-0.8.6/lib/google/api_client/request.rb:174:in `send'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/google-api-client-0.8.6/lib/google/api_client.rb:648:in `block (2 levels) in execute!'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/retriable-1.4.1/lib/retriable/retry.rb:27:in `perform'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/retriable-1.4.1/lib/retriable.rb:15:in `retriable'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/google-api-client-0.8.6/lib/google/api_client.rb:645:in `block in execute!'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/retriable-1.4.1/lib/retriable/retry.rb:27:in `perform'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/retriable-1.4.1/lib/retriable.rb:15:in `retriable'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/google-api-client-0.8.6/lib/google/api_client.rb:636:in `execute!'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/google-api-client-0.8.6/lib/google/api_client.rb:679:in `execute'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/gcloud-0.3.0/lib/gcloud/bigquery/connection.rb:307:in `load_resumable'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/gcloud-0.3.0/lib/gcloud/bigquery/table.rb:758:in `load_resumable'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/gcloud-0.3.0/lib/gcloud/bigquery/table.rb:750:in `load_local'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/gcloud-0.3.0/lib/gcloud/bigquery/table.rb:613:in `load'
    from /home/lz/projects/assembla/bq_uploader/lib/assembla/bq_uploader.rb:41:in `initialize'
    from ./bin/bq_uploader:10:in `new'
    from ./bin/bq_uploader:10:in `<main>'
./bin/bq_uploader  1,21s user 0,09s system 12% cpu 10,104 total

blowmage commented 9 years ago

Thanks again for opening the issue. We'll get right on it.

vitaliel commented 9 years ago

It's strange: I succeeded only with a CSV file of size < 5_000_000 bytes and 39250 rows.

quartzmo commented 9 years ago

Hi @vitaliel, and thank you for reporting this!

The Broken pipe (Errno::EPIPE) error that you appear to have encountered is a known issue. It is the root cause of three of the 39 currently open issues in google-api-ruby-client (upon which Gcloud depends). They are:

A solution, documented in two of the issues above as well as in this Stack Overflow answer, is to add the following line before your code (right after requiring gcloud). You will also need to add httpclient as a dependency in your project.

Faraday.default_adapter = :httpclient
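
For anyone applying this workaround, here is a minimal sketch of where the adapter line might sit relative to the original snippet. The helper names `use_httpclient_adapter` and `load_csv` are hypothetical and not part of Gcloud. The Gcloud calls are copied from the snippet in the original report, and the sketch assumes the httpclient gem has already been added to the project (for example in the Gemfile).

```ruby
# Hypothetical helper (not part of Gcloud): switch Faraday's default adapter
# from Net::HTTP to HTTPClient. Returns true if the switch was made, false if
# a required gem is not installed.
def use_httpclient_adapter
  require "faraday"
  require "httpclient" # assumes `gem "httpclient"` is in your Gemfile
  Faraday.default_adapter = :httpclient
  true
rescue LoadError => e
  warn "httpclient workaround not applied: #{e.message}"
  false
end

# Hypothetical wrapper around the calls from the original report.
def load_csv(project_id, key_file, table_name, csv_path)
  require "gcloud"
  use_httpclient_adapter # runs after requiring gcloud and before the upload

  gcloud = Gcloud.new project_id, key_file
  bigquery = gcloud.bigquery
  dataset = bigquery.dataset "logging"
  table = dataset.table table_name
  table.load csv_path
end
```

Calling `load_csv project_id, key_file, table_name, "site_access.csv"` would then send the upload through HTTPClient instead of Net::HTTP, which is the adapter shown in the stack trace above.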

Can you give this a try and let us know if it solves the problem? If so, I will add the solution to the documentation for Table#load, and close this issue.

Thank you @blowmage for providing the background story on this.

vitaliel commented 9 years ago

@quartzmo Thanks, it worked.

quartzmo commented 9 years ago

@vitaliel Great. I will document this issue in the API doc for Table#load, and for a Cloud Storage method where the same error can also occur. Then I will close this issue. Thanks again.

blowmage commented 9 years ago

FYI, the updated docs will be included in the next point release (0.3.1), but the release after that (0.4.0) will most likely switch dependencies from Faraday to Hurley, meaning this guidance will change. Hopefully Hurley will be an improvement on Faraday and not have this issue in the default provider. :)