googledatalab / pydatalab

Google Datalab Library
Apache License 2.0

Datalab BigQuery examples fail due to library not specifying job location #732

Open m0ar opened 4 years ago

m0ar commented 4 years ago

We are trying to use a Datalab instance, but even the simple examples fail because the library cannot find the results of our queries. The BigQuery job itself does get executed (we have verified this manually), but the result cannot be fetched back into Datalab, which errors out with:

HTTP request failed: Not found: Job acme-267909:job_qMlsf2_2kOPHKAIBrZlMLl9vKtpZ

I have faced the exact same issue when using the Elixir API, where the solution was to add the optional argument location: "europe-north1". The same problem exists in the datalab bigquery library, since the location is not passed with the args here:

https://github.com/googledatalab/pydatalab/blob/1d6865237fdc8d184123d1e89193578da56d73b3/datalab/bigquery/_api.py#L237

We have verified this is the problem, as this little snippet successfully gets the status of the job:

import google.datalab.utils as utils
import datalab.context as context

# Job status URL for the query job that the library reports as "Not found".
url = "https://bigquery.googleapis.com/bigquery/v2/projects/acme-267909/queries/job_qMlsf2_2kOPHKAIBrZlMLl9vKtpZ"
credentials = context.Context.default().credentials

# Passing the job's location explicitly is what makes the lookup succeed.
args = {
    "location": "europe-north1"
}
utils.Http.request(url, args=args, credentials=credentials)
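
On the library side, the fix would presumably amount to including the job's location in the args dict built for that request in _api.py. A minimal sketch under that assumption, reusing the names from the snippet above (the parameter names other than location follow the public jobs.getQueryResults REST API, not pydatalab's internals):

# Hypothetical sketch, not the actual pydatalab code: the job status/result
# request needs the location as a query parameter whenever the job runs
# outside the US/EU multi-regions.
args = {
    "timeoutMs": 30000,            # illustrative value
    "maxResults": 1000,            # illustrative value
    "location": "europe-north1",   # the missing piece
}
utils.Http.request(url, args=args, credentials=credentials)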

It is also worth mentioning that the docs explicitly list location as optional for the EU and US, which is clearly not true in practice. This has confused us before.

m0ar commented 4 years ago

Please reach out if you need any more information to fix this :pray:

m0ar commented 4 years ago

A suggested fix would be to let google.datalab.bigquery.Query.__init__ also take an optional location argument, store it as an instance attribute, and include it in the args map for this call, and possibly for other relevant calls.
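
A minimal sketch of what that could look like, assuming the existing constructor parameters stay as they are (everything below except location is illustrative and not pydatalab's actual internals):

# Hypothetical sketch of the suggested change, not the actual pydatalab code.
class Query(object):

    def __init__(self, sql, location=None, **kwargs):
        # ... existing initialization kept as-is ...
        self._sql = sql
        self._location = location  # e.g. "europe-north1"; None keeps today's behaviour

    def execute(self, **kwargs):
        request_args = {}
        # Thread the location into the jobs.* requests built in _api.py so that
        # job status and results can be fetched for non-US/EU jobs:
        if self._location:
            request_args["location"] = self._location
        # ... issue the insert/getQueryResults calls with request_args ...

Callers in affected regions could then write Query(sql, location="europe-north1"), while existing code keeps working unchanged.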