'bq load' fails, load job succeeds

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?

It only appears sometimes, and I don't know a reliable way to reproduce it. What
I do is run a lot of load jobs using the 'bq' command-line tool. The command
looks like:

bq load --debug_mode --headless -F '\t' --quote '' --max_bad_records 100 \
  project:dataset.table gs://bucket/file1.tsv.gz,gs://bucket/file2.tsv.gz

What is the expected output? What do you see instead?

Rarely (roughly one run in 1000), the command completes with something like:

Waiting on bqjob_r7820b015e3a865de_000001495c5e5eb8_1 ... (0s) Current status: PENDING
BigQuery error in load operation: Unexpected. Please try again.

But the load job itself is still working, and later completes successfully:

$ bq show -j bqjob_r7820b015e3a865de_000001495c5e5eb8_1
Job iow-rnd:bqjob_r7820b015e3a865de_000001495c5e5eb8_1

  Job Type    State      Start Time      Duration   Bytes Processed  
 ---------- --------- ----------------- ---------- ----------------- 
  load       SUCCESS   29 Oct 14:46:02   0:00:37                     

The job ID above is the real one I encountered the problem with. The command
was run at Wed Oct 29 14:45:21 UTC 2014.

What version of the product are you using? On what operating system?

BigQuery CLI 2.0.22
Ubuntu 12.04.3 LTS
Python 2.7.3

Please provide any additional information below.

The --debug_mode and --headless flags do not seem to affect this behavior.
The number of files loaded ranges from several to several thousand; sometimes
wildcards are used (like 'gs://bucket/2014-10-29/*').

Because of this bug I can't practically append to tables from the command line,
because there is no easy way to check whether the data was added to the table.
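
A rough Python sketch of the manual check described above: pull the job ID out of the failed command's output, then ask 'bq show -j' about that job. The helper names (extract_job_id, job_outcome, fetch_job) are made up for illustration. It also assumes that 'bq --format=json show -j <id>' prints the job resource as JSON with the usual status.state and status.errorResult fields, so treat it as a sketch rather than a tested recipe.

```python
import json
import re
import subprocess

# Job IDs generated by bq look like bqjob_r7820b015e3a865de_000001495c5e5eb8_1.
JOB_ID_RE = re.compile(r"\bbqjob_\w+")


def extract_job_id(bq_output):
    """Return the first bqjob_... ID found in bq's output, or None."""
    match = JOB_ID_RE.search(bq_output)
    return match.group(0) if match else None


def job_outcome(job):
    """Classify a job resource (dict) as 'success', 'failed' or 'running'."""
    status = job.get("status", {})
    if status.get("state") != "DONE":
        return "running"  # PENDING or RUNNING: nothing is known yet
    return "failed" if status.get("errorResult") else "success"


def fetch_job(job_id):
    """Assumption: this prints the job resource as a single JSON document."""
    out = subprocess.check_output(["bq", "--format=json", "show", "-j", job_id])
    return json.loads(out)
```

With these, a wrapper script could call job_outcome(fetch_job(extract_job_id(output))) after a failed 'bq load' instead of assuming the load itself failed.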

Original issue reported on code.google.com by victor.g...@gmail.com on 29 Oct 2014 at 3:13

GoogleCodeExporter commented 9 years ago

Your "bq show -j" is a correct way to check for this -- can you help me 
understand why this doesn't end up letting you append loads?  (If this strategy 
works out, it might be simplified as "bq --nosync load" and "bq wait".)

Another thing you can do is generate your own job_id and use that: if your 
job_id is the same for a retry, then the retry will know not to double-add the 
data.

(See http://stackoverflow.com/questions/11017729/making-sure-data-is-loaded for 
what sounds like a similar situation.)
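
As an illustration of the "generate your own job_id" idea, here is a minimal Python sketch that derives the ID from the destination table and the source URIs, so a retry of the same load reuses the same ID. The function names and the "load_" prefix are hypothetical. The flags in load_command are copied from the command in the original report, plus bq's global --job_id flag.

```python
import hashlib


def stable_job_id(table, uris, prefix="load"):
    """Derive a deterministic job ID from the table and the source URIs.

    The URIs are sorted so that listing the same files in a different
    order still yields the same ID.
    """
    key = "\n".join([table] + sorted(uris)).encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return "%s_%s" % (prefix, digest[:32])


def load_command(table, uris):
    """Build the argv for a load that carries the deterministic job ID."""
    return [
        "bq", "--job_id=" + stable_job_id(table, uris), "load",
        "-F", r"\t", "--quote", "", "--max_bad_records", "100",
        table, ",".join(uris),
    ]
```

One caveat with this design: loading the same files into the same table twice on purpose would produce the same ID, so a script that needs that should mix something like a batch label into the key.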

Now, all that said, I'll look at whether this is a case where the code could 
have known to wait longer.  The bq command could be more strict about waiting 
until the job has either succeeded or failed, so you would never have to check. 
The flip side is that to be really 100% sure, it would have to be willing to 
wait for 24+ hours.  I think people might find that surprising as the default 
behavior, but I will consider it.

Original comment by e...@google.com on 19 Nov 2014 at 6:11

GoogleCodeExporter commented 9 years ago

Thanks for your answer, and sorry for not finding the Stack Overflow question.

Yes, I have changed to an async load plus a regular status check, and it seems
to work now.
What I expected, from the (probably misread) bq documentation and my previous
experience with command-line tools, is:

1. "bq load" and "bq --nosync load; bq wait" are exactly the same thing. So, if 
the former does not work for me, the latter will not either. 

2. Every synchronous command (like "bq load" without "nosync") does its best 
to wait for the operation to complete, including making reasonable retries when 
it fails to fetch some data due to temporary network/backend problems. (Yes, I 
am OK with this kind of tool running for 24+ hours.)

3. When there are clearly distinguishable error classes (like "we know the load 
operation failed" and "we failed and know nothing about the load operation"), it 
is possible to get this information from the command's exit code, and from its 
output as well.
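
To make point 3 concrete, a wrapper could map what it knows about the job to distinct exit codes. The sketch below is purely illustrative: the exit code values and the function name are invented here and are not what bq itself returns. The input is assumed to be a BigQuery job resource (a dict with status.state and status.errorResult), or None when the status could not be fetched at all.

```python
EXIT_OK = 0          # the load job is known to have succeeded
EXIT_JOB_FAILED = 1  # the load job is known to have failed
EXIT_UNKNOWN = 2     # the client failed; the job's fate is not known


def exit_code_for(job):
    """Map a job resource (or None if it could not be fetched) to an exit code."""
    if job is None:
        return EXIT_UNKNOWN
    status = job.get("status", {})
    if status.get("state") != "DONE":
        # PENDING or RUNNING when we gave up: still unknown, not a failure.
        return EXIT_UNKNOWN
    return EXIT_JOB_FAILED if status.get("errorResult") else EXIT_OK
```

A script would end with something like sys.exit(exit_code_for(job)), so callers can tell "retry the check" apart from "the data really did not load".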

It is probably worth fixing either the code or the documentation (by 
highlighting the way it works now).

Original comment by victor.g...@gmail.com on 20 Nov 2014 at 3:26

GoogleCodeExporter commented 9 years ago

Oh, no problem, an issue report is a perfectly fine place to file this too.

Yep, everything you say looks valid to me.  What I meant by doing the "bq 
wait" yourself is that you then have control over a top-level retry loop.  But 
granted, it's not obvious why you *should* have to add your own retry handling 
here.  I'll see if I can catch what would be escaping here.
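
For reference, a top-level retry loop of the kind mentioned above might look like the following Python sketch. Everything here is hypothetical: check stands for whatever status check the script uses (for example a wrapper around "bq wait" or "bq show -j"), TransientError is an exception that wrapper would raise when it cannot reach the service, and the backoff numbers are arbitrary.

```python
import time


class TransientError(Exception):
    """Raised by a status check that could not reach the service."""


def wait_with_retries(check, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call check() until it reports 'success' or 'failed'.

    A 'running' result and a TransientError both lead to another attempt
    after an exponentially growing delay. Returns 'unknown' when the
    attempts run out without a definite answer.
    """
    for attempt in range(max_attempts):
        try:
            outcome = check()
        except TransientError:
            outcome = "running"  # could not tell; treat as not finished yet
        if outcome in ("success", "failed"):
            return outcome
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return "unknown"
```

The important property is that a failed status check is never reported as a failed load: the loop only returns 'failed' when the check itself said so.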

Original comment by e...@google.com on 20 Nov 2014 at 6:31

GoogleCodeExporter commented 9 years ago

Thank you for your patience.  It was not retrying certain conditions that it 
should have.  This issue should be fixed in the current version of bq from the 
Cloud SDK.

Original comment by e...@google.com on 27 Feb 2015 at 6:36

GoogleCodeExporter commented 9 years ago

Original comment by e...@google.com on 27 Feb 2015 at 6:37

Changed state: Fixed
