Unable to get failure reason for jobs that have failed

jewer commented 9 years ago

Happy to write this myself, but just wanted to validate direction:

Use Case: I've submitted a query asynchronously to BQ. The query fails for any number of reasons. The job will be marked as "done" but I do not have the ability to get the reason for the failure and the current check_job implementation will just say 0 rows.

Solutions, in no particular order:

Leave it alone, it's good enough
Hijack check_job and have it return richer job status information.
Add a new get_job_status method that's responsible for returning the status, including failure reasons if any.

I'm guessing number 2 is probably the best option, but wanted to pass it by before making a pull request.

Whatdya think?

tylertreat commented 9 years ago

I'm in favor of 2 as well. It might make sense to have check_job return a "job status" object which encapsulates the various job properties: id, complete, total rows, failure reason, etc. It's a breaking API change obviously, but I think it's probably justified here.

jewer commented 9 years ago

One super annoying thing about the BQ API is that the status code returned for getQueryResults on failed jobs represents the reason why the job failed, not for the getQueryResults request.

For instance, if I try to submit a query against a table that doesn't exist, getQueryResults returns a 404. If I submit syntactically incorrect SQL to a batch job, then getQueryResults return a 400. Personally, I think that's an odd API design as I expect the status code to be in context of the current request, not the job I submitted and am now checking on the status. Anyways, not a big deal; it just means that the logic in check_job will have to catch HttpError and just involve a lot more logic than I think it should have to.

I'm guessing we probably want our own shape for that job status object, just to reduce the number of breaking changes? Probably model off the current structure? If it changes in the future, we can just continue to use the same shape or decide to make another breaking API change. Something like this?

{
  "job_id": "ABCD123",
  "state": "DONE", //done, pending, running
  "total_rows": 0, //valid values 0-n, will be missing if in error state, mutually exclusive with 'error'
  "error": {  //if errors, otherwise missing
    "message": "Encountered bad SQL at line 1, column 53. Was expecting: <EOF>", //top-level message as reported by BQ
    "errors": [
       {
         "reason": "invalidQuery",
         "message": "Encountered bad SQL at line 1, column 53. Was expecting: <EOF>"
       },
       {
         "reason": "invalidQuery",
         "message": "Encountered bad SQL at line 4, column 22. Was expecting: SELECT"
       }
    ]
  }
}

tylertreat commented 9 years ago

Yeah, that seems reasonable. I think I would be in favor of making that an actual class instead of using a dict. Otherwise the user has to know what keys are set as opposed to operating on a rigid object. This also makes it a little easier to change things without breaking the API down the road.

tylertreat / BigQuery-Python

Unable to get failure reason for jobs that have failed #39