Hi, do you have some job IDs for the load jobs that we could look into?
Original comment by thomasp...@google.com
on 4 Jun 2014 at 9:48
Thanks for getting back to me. Here are two recent long-running load jobs:
bqjob_r3093f8ddda320493_00000146685decfd_1 load SUCCESS 04 Jun 12:32:20 0:20:49
bqjob_r54ff203f8b39c4b_000001466907c091_1 load SUCCESS 04 Jun 15:37:43 0:16:22
Perhaps there is a way to use the `bq` command to list failed jobs. My ETL
code just killed a job that had exceeded a 30-minute duration. (It's hard to
capture the load ID while the `bq` command is running during an ETL, as the
command assumes an interactive console and spams the log file with ^Ms and
progress updates, necessitating the --quiet flag.)
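For what it's worth, recent jobs, including failed ones, can apparently be
listed non-interactively with something like the following; a sketch, assuming
the default project is already configured:

```
% bq ls -j -n 20         # list the 20 most recent jobs, with job ID and state
% bq show -j <job_id>    # show the details (and any error) for a single job
```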
If you look into these jobs and see something suboptimal in the way the data
are being prepared, let me know.
Original comment by ewal...@pagerduty.com
on 4 Jun 2014 at 11:24
s/load ID/job ID/
Original comment by ewal...@pagerduty.com
on 4 Jun 2014 at 11:26
Looking at the recent history for your load jobs that were configured
similarly to the two above (a ~60 MB compressed file from Cloud Storage), the
average execution time has been over 10 minutes. I have not observed any
throttling for these load jobs, nor any extraordinary resource contention.
Although the source file is ~60 MB compressed, this expands to several GB
uncompressed. One suggestion is to split your source file into smaller chunks
if possible; you can specify multiple sourceUris in a single load job:
https://developers.google.com/bigquery/docs/reference/v2/jobs#configuration.load.sourceUris
This will help parallelize the load processing. However, I should
note that splitting the source file into n files will not guarantee an n-times
speed up due to a number of factors, such as fixed setup/teardown costs, load
in the system, etc.
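For illustration, a sketch of such a load job using comma-separated source
URIs with the `bq` tool (the dataset, table, and bucket paths here are
hypothetical):

```
% bq load --field_delimiter tab --source_format CSV <dataset>.<table> \
    gs://<path>/event_extra_data-part1.tsv.gz,gs://<path>/event_extra_data-part2.tsv.gz \
    google/bigquery/schemas/raw/event_extra_data.json
```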
Original comment by bch...@google.com
on 6 Jun 2014 at 12:43
Thank you for the pointer. What is a good size for the individual chunks
(uncompressed)?
Original comment by ewal...@pagerduty.com
on 6 Jun 2014 at 12:52
I suggest splitting the file in half first to see whether that brings the load
time into your target range.
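A sketch of one way to produce the chunks locally before uploading (the line
count per chunk is a placeholder to tune, and the file names are hypothetical):

```
% zcat event_extra_data.tsv.gz | split -l 5000000 - event_extra_data-part-
% gzip event_extra_data-part-*
% gsutil cp event_extra_data-part-*.gz gs://<path>/
```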
Original comment by bch...@google.com
on 6 Jun 2014 at 8:34
Just following up to see if you have any additional questions.
I'm going to resolve this ticket, but please feel free to re-open if there is
anything outstanding.
Original comment by bch...@google.com
on 11 Jun 2014 at 10:51
[deleted comment]
Hi again -- just a ping regarding our recent load times. Our production
deployment has been unmodified for the last few days, but as of today we have
not had a successful load:
http://i.imgur.com/P4YXO75.png
Here is a typical command that we're running that does not complete:
% .python-env/bin/bq load --field_delimiter tab --source_format CSV --job_id load_1000102 <table> gs://<path>/event_extra_data-1-copy.tsv.gz google/bigquery/schemas/raw/event_extra_data.json
Waiting on load_1000102 ... (1855s) Current status: RUNNING
(There's a job ID that you can use to look up the actual details.) I'm not
sure if our account is being throttled or if there is a general service
disruption (or maybe something else). The running load time for the command
above is 1855 s (~ 30 min.). I've seen times get as high as 4080 s before a
command is canceled. Hopefully we're not doing something wrong. Any help you
can provide in troubleshooting this is appreciated. (I posted a comment
earlier today and then took it down in order to avoid causing a wild goose
chase while I collected data.)
Original comment by ewal...@pagerduty.com
on 9 Jul 2014 at 1:32
There was an incident today where some load jobs were progressing very slowly,
creating a large backlog. We've diagnosed and fixed the root cause and the
backlog has mostly cleared by now: it should be fully gone within the next few
hours. New load jobs should be unaffected and progress at their normal pace.
Thanks for reporting the issue, and apologies if the slowness negatively
affected you.
Original comment by dk...@google.com
on 9 Jul 2014 at 4:06
Thank you for getting back to me. Glad to know we weren't doing something
wrong.
Original comment by ewal...@pagerduty.com
on 9 Jul 2014 at 9:58
Hi -- we're seeing failing loads again. Just wanted to check in with you to
make sure we're not doing something wrong on our side (I assume we aren't, but
I wanted to double-check).
http://i.imgur.com/7FedCNG.png
Original comment by ewal...@pagerduty.com
on 31 Jul 2014 at 12:50
Yep, we have a backlog of load jobs that should be starting to clear. Sorry for
the inconvenience, but hopefully this will be wrapped up in an hour or two.
By the way, feel free to create new bugs for new issues, since adding comments
to an already-fixed bug might cause your issue report to be overlooked.
Original comment by jcon...@google.com
on 31 Jul 2014 at 6:39
> Sorry for the inconvenience, but hopefully this will be wrapped up in an hour
or two.
We love BigQuery, not least because it makes so many things possible. But the
service disruptions are pretty inconvenient. They put my team in an awkward
position when the freshness of the data in our dashboards falls behind by
several hours, especially during the last few days of the month, when there's a
big push to close deals. Are there any plans to add additional capacity to
accommodate the periodic spikes in heavy load jobs?
> By the way, feel free to create new bugs for new issues
I will create a new issue, then, the next time we want a status check.
Thanks for replying to this one. Btw, do you guys have a status page somewhere?
Original comment by ewal...@pagerduty.com
on 31 Jul 2014 at 3:53
Understood. We know that these service disruptions are painful, and we're doing
our best to keep things running smoothly. Apologies for the trouble!
In this case, the root cause turned out to be a misconfiguration that gave our
service unexpectedly low priority for network bandwidth. Unfortunately we
didn't discover the misconfiguration until network bandwidth became scarce
yesterday.
We don't yet have a status page, but we'll look into getting one set up. In the
meantime, don't hesitate to reach out to us proactively by filing a bug here
when you think something's wrong. (Obviously we'll do our best to fix problems
before you notice!)
By the way, have you considered using our streaming API? Bulk data loads don't
have any strict latency guarantees, so if freshness is important to you, you
might find the streaming API useful.
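For context, the streaming path is tabledata.insertAll, where each row can
carry an insertId that BigQuery uses for best-effort deduplication over a
short window. A sketch with curl (the project, dataset, table, row contents,
and $TOKEN are all hypothetical):

```
% curl -X POST \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"kind": "bigquery#tableDataInsertAllRequest",
         "rows": [{"insertId": "evt-12345",
                   "json": {"event_id": 12345, "payload": "..."}}]}' \
    "https://www.googleapis.com/bigquery/v2/projects/<project>/datasets/<dataset>/tables/<table>/insertAll"
```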
Original comment by jcon...@google.com
on 31 Jul 2014 at 4:05
Thanks for the additional details. I'm sure you guys work hard to keep things
running smoothly.
> By the way, have you considered using our streaming API?
I'd love to use a low-latency API like the streaming API. But we have very
strict requirements on the accuracy of the numbers we show. I'm concerned
that if resource contention or retry logic on our side caused us to go beyond
the short dedup period offered by the streaming API, we'd end up inserting
duplicate rows.
Original comment by ewal...@pagerduty.com
on 31 Jul 2014 at 4:14
Got it. Thanks for the feedback!
Original comment by jcon...@google.com
on 31 Jul 2014 at 4:30
Also, sometimes we have to revise data that we have already published via the
API, and I believe the streaming API is append-only. I think
there are some clever things you can do to achieve the same effect as updates
to existing rows with an append-only table, but we're not there yet.
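For the record, one such trick might be to append revised rows with a version
timestamp and select only the latest row per key at query time; a sketch with
hypothetical table and column names:

```
% bq query "SELECT event_id, payload FROM (
    SELECT event_id, payload,
           ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY updated_at DESC) AS rn
    FROM mydataset.event_extra_data) WHERE rn = 1"
```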
Original comment by ewal...@pagerduty.com
on 31 Jul 2014 at 4:32
Original issue reported on code.google.com by
ewal...@pagerduty.com
on 3 Jun 2014 at 9:15