runt18 / google-bigquery

Automatically exported from code.google.com/p/google-bigquery
0 stars 0 forks source link

error response from Bigquery to very long queries is not sufficiently clear #327

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Submit a bigquery through node js or any other api. Pad the query with lots 
and lots of white space so you don't have to construct a complex query. Need 
query string length > 4k

What is the expected output? What do you see instead?

I expect an error response stating that the the query is longer than the length 
limit of XXX.

Instead what I see is an http 400 response simply stating: "Bad Request" 

What version of the product are you using? On what operating system?

gcloud 1.4.28

Please provide any additional information below.

Please not that the query is failing when the length exceeds 4k, even though 
the query length limit should be 256k.

Original issue reported on code.google.com by gquino...@brightcove.com on 19 Sep 2015 at 2:52

GoogleCodeExporter commented 8 years ago
I couldn't reproduce the problem. I tried running the following query:

  [5k spaces] select 17 [5k spaces]

And it worked fine. Can you be more specific about how to trigger the error?

Original comment by jcon...@google.com on 23 Sep 2015 at 11:29

GoogleCodeExporter commented 8 years ago
Issue 328 has been merged into this issue.

Original comment by jcon...@google.com on 23 Sep 2015 at 11:29

GoogleCodeExporter commented 8 years ago
Please see this stack overflow item, which is the original report of the issue:

http://stackoverflow.com/questions/32632489/bigquery-api-intermittently-returns-
http-error-400-bad-request/32634425#32634425

The pathname to a job that shows this behavior is:

"pathname":"/bigquery/v2/projects/rising-ocean-426/queries/job_aSR9OCO4U_P51gYZ2
xdRb145YEA"

Original comment by gquino...@brightcove.com on 24 Sep 2015 at 9:30

GoogleCodeExporter commented 8 years ago
The text of the query is:

      SELECT 
        time_stamp, 
        time, 
        platform, 
        platform_version, 
        SUM(impressions) AS impressions, 
        SUM(views) AS views, 
        SUM(errors) AS errors, 
        SUM(seconds_viewed) AS seconds_viewed, 
        SUM(live_seconds_viewed) AS live_seconds_viewed, 
        SUM(ad_errors) AS ad_errors, 
        SUM(ad_pod_starts) AS ad_pod_starts, 
        SUM(ad_pod_ends) AS ad_pod_ends, 
        COUNT(1) AS sessions_count, 
        SUM(CASE WHEN impressions > 0 THEN 1 ELSE 0 END) AS sessions_with_impressions, 
        SUM(CASE WHEN views > 0 THEN 1 ELSE 0 END) AS sessions_with_views, 
        SUM(CASE WHEN errors > 0 THEN 1 ELSE 0 END) AS sessions_with_errors, 
        SUM(CASE WHEN seconds_viewed > 0 THEN 1 ELSE 0 END) AS sessions_with_seconds_viewed, 
        SUM(CASE WHEN live_seconds_viewed > 0 THEN 1 ELSE 0 END) AS sessions_with_live_seconds_viewed, 
        SUM(CASE WHEN ad_errors > 0 THEN 1 ELSE 0 END) AS sessions_with_ad_errors, 
        SUM(CASE WHEN ad_pod_starts > 0 THEN 1 ELSE 0 END) AS sessions_with_ad_pod_starts, 
        SUM(CASE WHEN ad_pod_ends > 0 THEN 1 ELSE 0 END) AS sessions_with_ad_pod_ends 
      FROM 
      (SELECT 
        time_stamp, 
        time, 
        session, 
        platform, 
        platform_version, 
        SUM(impressions) AS impressions, 
        SUM(views) AS views, 
        SUM(errors) AS errors, 
        SUM(seconds_viewed) AS seconds_viewed, 
        SUM(live_seconds_viewed) AS live_seconds_viewed, 
        SUM(ad_errors) AS ad_errors, 
        SUM(ad_pod_starts) AS ad_pod_starts, 
        SUM(ad_pod_ends) AS ad_pod_ends 
      FROM 
         (SELECT 
            STRFTIME_UTC_USEC(time*1000,'%Y-%m-%d %H:%M') AS time_stamp, 
            time - (time % 60000) AS time, 
            session, 
            CASE 
      WHEN platform = 'as3'                THEN 'smart-player-flash' 
      WHEN platform = 'html5'              THEN 'smart-player-html' 
      WHEN platform = 'html5-desktop'      THEN 'smart-player-html-desktop' 
      WHEN platform = 'video-js'           THEN 'video-js' 
      WHEN platform = 'video-js-4'         THEN 'custom-video-js' 
      WHEN platform = 'custom'             THEN 'custom-player' 
      WHEN platform = 'android-native-sdk' THEN 'android-native-sdk' 
      WHEN platform = 'ios-native-sdk'     THEN 'ios-native-sdk' 
      ELSE 'undefined' END AS platform, 
            REGEXP_EXTRACT(CONCAT(CASE WHEN platform_version IS NULL THEN '_' ELSE platform_version END, 
                                  '._._._._'), 
                           r'^([^.]+(?:\.[^.]+){3})') 
              AS platform_version, 
            CASE WHEN event = 'video_impression' THEN 1 ELSE 0 END AS impressions, 
            CASE WHEN event = 'video_view' THEN 1 ELSE 0 END AS views, 
            CASE WHEN event IN ('error', 'media_connect_error', '/viewer/media_error') THEN 1 ELSE 0 END AS errors, 
            CASE WHEN event = 'video_engagement' AND range IS NOT NULL 
                 AND  INTEGER(REGEXP_EXTRACT(range, r'(\d+)$')) - INTEGER(REGEXP_EXTRACT(range, r'^(\d+)')) < 20 
                 AND  INTEGER(REGEXP_EXTRACT(range, r'(\d+)$')) - INTEGER(REGEXP_EXTRACT(range, r'^(\d+)')) >= 0 
                 THEN INTEGER(REGEXP_EXTRACT(range, r'(\d+)$')) - INTEGER(REGEXP_EXTRACT(range, r'^(\d+)')) + 1 
                 ELSE 0 
                 END AS seconds_viewed, 
            CASE WHEN event = 'video_engagement' 
                  AND range IS NULL 
                  AND INTEGER(video_seconds_viewed) > 0 
                 THEN 10 
                 ELSE 0 END AS live_seconds_viewed, 
            CASE WHEN event = 'qos.events.ad.error' THEN 1 ELSE 0 END as ad_errors, 
            CASE WHEN event = 'ad_start' THEN 1 ELSE 0 END as ad_pod_starts, 
            CASE WHEN event = 'ad_end' THEN 1 ELSE 0 END as ad_pod_ends 
          FROM [metrics_bucket.metrics_2015_09_17@1442494017000-1442498172536] 
          WHERE event IN ('ad_start',
                          'ad_end', 
                          'qos.events.ad.error', 
                          'video_view', 
                          'video_impression', 
                          'error', 
                          'media_connect_error', 
                          '/viewer/media_error', 
                          'video_engagement') 

         )
      GROUP  BY time, time_stamp, session, platform, platform_version) 
      GROUP  BY time, time_stamp, platform, platform_version 
      HAVING time_stamp >= STRFTIME_UTC_USEC(now()- (90*60*1000*1000) ,'%Y-%m-%d %H:%M') 
      ORDER BY time_stamp

Original comment by gquino...@brightcove.com on 24 Sep 2015 at 9:36

GoogleCodeExporter commented 8 years ago
I am seeing something curious: The response that I received from the query, 
which is recorded in my logs, does not match the response that I see when I 
examine the Bigquery job using Google's  restful api. For example:

GET 
https://www.googleapis.com/bigquery/v2/projects/rising-ocean-426/jobs/job_4yHjBL
ABpaUY_XkC6d7YTlscV6Y

Returns a response that indicates the query succeeded. Yet, when I examine my 
logs I see a response that the query failed wit the aforementioned bad request 
message. In my logs, the response says (in part):

error: Thu, 17 Sep 2015 16:20:12 GMT Error encountered for bigquery 
totalsByPlatformVersion: 
{"errors":[],"code":400,"message":"<HTML>\n<HEAD>\n<TITLE>Bad 
Request</TITLE>\n</HEAD>\n<BODY BGCOLOR=\"#FFFFFF\" TEXT=\"#000000\">\n<H1>Bad 
Request</H1>\n<H2>Error 
400</H2>\n</BODY>\n</HTML>\n","response":{"statusCode":400,"body":"<HTML>\n<HEAD
>\n<TITLE>Bad Request</TITLE>\n</HEAD>\n<BODY BGCOLOR=\"#FFFFFF\" 
TEXT=\"#000000\">\n<H1>Bad Request</H1>\n<H2>Error 
400</H2>\n</BODY>\n</HTML>\n","headers":{"date":"Thu, 17 Sep 2015 16:20:12 
GMT","server":"GFE/2.0","content-type":"text/html; 
charset=UTF-8","alternate-protocol":"443:quic,p=1","alt-svc":"quic=\":443\"; 
p=\"1\"; 
ma=604800","transfer-encoding":"chunked"},"request":{"uri":{"protocol":"https:",
"slashes":true,"auth":null,"host":"www.googleapis.com","port":null,"hostname":"w
ww.googleapis.com","hash":null,"search":"?query=....

I can post the entire response if you wish, but it is large and url encoded.

Original comment by gquino...@brightcove.com on 24 Sep 2015 at 11:03

GoogleCodeExporter commented 8 years ago
Yeah, if you're willing, you could post the whole thing as an attachment.

I'm guessing that we're creating the query job, and somewhere before the 
response gets back to you, we hit an error.

One "best practice" that you might consider following is to generate job IDs on 
your side, rather than allowing BigQuery to generate them. Then you can retry 
on the same job ID when you hit an error, and you'll be assured that the job 
will be created at most once.

Original comment by jcon...@google.com on 24 Sep 2015 at 5:21

GoogleCodeExporter commented 8 years ago

Original comment by gquino...@brightcove.com on 24 Sep 2015 at 6:38

GoogleCodeExporter commented 8 years ago
The above attachment is the complete error response as you requested.

One concern I have over this is that we're getting billed for these queries 
since, from the internal perspective, the query completed successfully even 
though we received an error response and no data was delivered.

Although your suggestion of generating job ids on our side would take care of 
(eventually) receiving the correct data, it does not address the issue of 
double billing.

Original comment by gquino...@brightcove.com on 25 Sep 2015 at 2:15

GoogleCodeExporter commented 8 years ago
It looks like you're adding the query text to the URL as a query parameter. 
Instead, I'd recommend putting it only in the request body. The request 
parameter is likely ignored, and the long URL might be contributing to the "bad 
request" error.

If you generate your own job IDs, it'll handle the issue of receiving the 
correct data *and* double billing. Since you're only allowed to use a given job 
ID once, you can retry as many times as you want with that job ID, and you'll 
be guaranteed that the query will run at most once (and bill you at most once).

Also, I noticed that the attachment you posted contained an OAuth bearer token, 
which can be sent in other requests to authenticate you. I deleted the 
attachment. To be on the safe side, you might consider revoking any 
BigQuery-related tokens here:

https://security.google.com/settings/security/permissions

You can then reissue new ones by running the corresponding client tools again.

Original comment by jcon...@google.com on 25 Sep 2015 at 4:34

GoogleCodeExporter commented 8 years ago
Hi,

The query was generated by Google's node.js package gcloud 1.4.28, so this 
looks like a bug in that library then.

The query was constructed as follows:

...
      var original_query = formattedQuery;

      if (queryTiemeout){
        formattedQuery = {
          query : formattedQuery,
          timeoutMs : queryTiemeout
        };
      }

      gcloud.bigquery()
      .query(formattedQuery)
...

Thanks for deleting the attachment. I realized there was a risk in posting but 
did not want to alter the response in any way that might affect the analysis. 
There should be some way to send sensitive issue related information to Google 
without having to take these risks.

Original comment by gquino...@brightcove.com on 25 Sep 2015 at 4:45

GoogleCodeExporter commented 8 years ago
Gotcha. I'll follow up with them.

An alternative to the attachment is to email me directly (jcondit@google.com). 
I'll suggest that in the future.

Original comment by jcon...@google.com on 25 Sep 2015 at 4:53

GoogleCodeExporter commented 8 years ago

Original comment by thomasp...@google.com on 24 Nov 2015 at 7:13