GoogleCodeExporter opened this issue 8 years ago
I couldn't reproduce the problem. I tried running the following query:
[5k spaces] select 17 [5k spaces]
And it worked fine. Can you be more specific about how to trigger the error?
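For reference, that padded test query can be reconstructed like this (a sketch; the 5,000-space figure is taken from the comment above):

```javascript
// Rebuild the test query: "select 17" with 5,000 spaces of padding on
// each side, as described in the comment above.
const padding = ' '.repeat(5000);
const paddedQuery = padding + 'select 17' + padding;

console.log(paddedQuery.length);  // 10009 characters in total
console.log(paddedQuery.trim());  // the query itself: "select 17"
```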
Original comment by jcon...@google.com
on 23 Sep 2015 at 11:29
Issue 328 has been merged into this issue.
Original comment by jcon...@google.com
on 23 Sep 2015 at 11:29
Please see this Stack Overflow item, which is the original report of the issue:
http://stackoverflow.com/questions/32632489/bigquery-api-intermittently-returns-http-error-400-bad-request/32634425#32634425
The pathname to a job that shows this behavior is:
"pathname":"/bigquery/v2/projects/rising-ocean-426/queries/job_aSR9OCO4U_P51gYZ2xdRb145YEA"
Original comment by gquino...@brightcove.com
on 24 Sep 2015 at 9:30
The text of the query is:
SELECT
  time_stamp,
  time,
  platform,
  platform_version,
  SUM(impressions) AS impressions,
  SUM(views) AS views,
  SUM(errors) AS errors,
  SUM(seconds_viewed) AS seconds_viewed,
  SUM(live_seconds_viewed) AS live_seconds_viewed,
  SUM(ad_errors) AS ad_errors,
  SUM(ad_pod_starts) AS ad_pod_starts,
  SUM(ad_pod_ends) AS ad_pod_ends,
  COUNT(1) AS sessions_count,
  SUM(CASE WHEN impressions > 0 THEN 1 ELSE 0 END) AS sessions_with_impressions,
  SUM(CASE WHEN views > 0 THEN 1 ELSE 0 END) AS sessions_with_views,
  SUM(CASE WHEN errors > 0 THEN 1 ELSE 0 END) AS sessions_with_errors,
  SUM(CASE WHEN seconds_viewed > 0 THEN 1 ELSE 0 END) AS sessions_with_seconds_viewed,
  SUM(CASE WHEN live_seconds_viewed > 0 THEN 1 ELSE 0 END) AS sessions_with_live_seconds_viewed,
  SUM(CASE WHEN ad_errors > 0 THEN 1 ELSE 0 END) AS sessions_with_ad_errors,
  SUM(CASE WHEN ad_pod_starts > 0 THEN 1 ELSE 0 END) AS sessions_with_ad_pod_starts,
  SUM(CASE WHEN ad_pod_ends > 0 THEN 1 ELSE 0 END) AS sessions_with_ad_pod_ends
FROM (
  SELECT
    time_stamp,
    time,
    session,
    platform,
    platform_version,
    SUM(impressions) AS impressions,
    SUM(views) AS views,
    SUM(errors) AS errors,
    SUM(seconds_viewed) AS seconds_viewed,
    SUM(live_seconds_viewed) AS live_seconds_viewed,
    SUM(ad_errors) AS ad_errors,
    SUM(ad_pod_starts) AS ad_pod_starts,
    SUM(ad_pod_ends) AS ad_pod_ends
  FROM (
    SELECT
      STRFTIME_UTC_USEC(time*1000, '%Y-%m-%d %H:%M') AS time_stamp,
      time - (time % 60000) AS time,
      session,
      CASE
        WHEN platform = 'as3' THEN 'smart-player-flash'
        WHEN platform = 'html5' THEN 'smart-player-html'
        WHEN platform = 'html5-desktop' THEN 'smart-player-html-desktop'
        WHEN platform = 'video-js' THEN 'video-js'
        WHEN platform = 'video-js-4' THEN 'custom-video-js'
        WHEN platform = 'custom' THEN 'custom-player'
        WHEN platform = 'android-native-sdk' THEN 'android-native-sdk'
        WHEN platform = 'ios-native-sdk' THEN 'ios-native-sdk'
        ELSE 'undefined' END AS platform,
      REGEXP_EXTRACT(
        CONCAT(CASE WHEN platform_version IS NULL THEN '_' ELSE platform_version END, '._._._._'),
        r'^([^.]+(?:\.[^.]+){3})') AS platform_version,
      CASE WHEN event = 'video_impression' THEN 1 ELSE 0 END AS impressions,
      CASE WHEN event = 'video_view' THEN 1 ELSE 0 END AS views,
      CASE WHEN event IN ('error', 'media_connect_error', '/viewer/media_error') THEN 1 ELSE 0 END AS errors,
      CASE WHEN event = 'video_engagement' AND range IS NOT NULL
        AND INTEGER(REGEXP_EXTRACT(range, r'(\d+)$')) - INTEGER(REGEXP_EXTRACT(range, r'^(\d+)')) < 20
        AND INTEGER(REGEXP_EXTRACT(range, r'(\d+)$')) - INTEGER(REGEXP_EXTRACT(range, r'^(\d+)')) >= 0
        THEN INTEGER(REGEXP_EXTRACT(range, r'(\d+)$')) - INTEGER(REGEXP_EXTRACT(range, r'^(\d+)')) + 1
        ELSE 0
      END AS seconds_viewed,
      CASE WHEN event = 'video_engagement'
        AND range IS NULL
        AND INTEGER(video_seconds_viewed) > 0
        THEN 10
        ELSE 0 END AS live_seconds_viewed,
      CASE WHEN event = 'qos.events.ad.error' THEN 1 ELSE 0 END AS ad_errors,
      CASE WHEN event = 'ad_start' THEN 1 ELSE 0 END AS ad_pod_starts,
      CASE WHEN event = 'ad_end' THEN 1 ELSE 0 END AS ad_pod_ends
    FROM [metrics_bucket.metrics_2015_09_17@1442494017000-1442498172536]
    WHERE event IN (
      'ad_start',
      'ad_end',
      'qos.events.ad.error',
      'video_view',
      'video_impression',
      'error',
      'media_connect_error',
      '/viewer/media_error',
      'video_engagement')
  )
  GROUP BY time, time_stamp, session, platform, platform_version
)
GROUP BY time, time_stamp, platform, platform_version
HAVING time_stamp >= STRFTIME_UTC_USEC(now() - (90*60*1000*1000), '%Y-%m-%d %H:%M')
ORDER BY time_stamp
Original comment by gquino...@brightcove.com
on 24 Sep 2015 at 9:36
I am seeing something curious: the response that I received from the query, which is recorded in my logs, does not match the response that I see when I examine the BigQuery job through Google's REST API. For example:
GET https://www.googleapis.com/bigquery/v2/projects/rising-ocean-426/jobs/job_4yHjBLABpaUY_XkC6d7YTlscV6Y
returns a response indicating that the query succeeded. Yet when I examine my logs, I see a response saying the query failed with the aforementioned bad request message. In my logs, the response says (in part):
error: Thu, 17 Sep 2015 16:20:12 GMT Error encountered for bigquery totalsByPlatformVersion:
{"errors":[],"code":400,"message":"<HTML>\n<HEAD>\n<TITLE>Bad Request</TITLE>\n</HEAD>\n<BODY BGCOLOR=\"#FFFFFF\" TEXT=\"#000000\">\n<H1>Bad Request</H1>\n<H2>Error 400</H2>\n</BODY>\n</HTML>\n",
"response":{"statusCode":400,"body":"<HTML>\n<HEAD>\n<TITLE>Bad Request</TITLE>\n</HEAD>\n<BODY BGCOLOR=\"#FFFFFF\" TEXT=\"#000000\">\n<H1>Bad Request</H1>\n<H2>Error 400</H2>\n</BODY>\n</HTML>\n",
"headers":{"date":"Thu, 17 Sep 2015 16:20:12 GMT","server":"GFE/2.0","content-type":"text/html; charset=UTF-8","alternate-protocol":"443:quic,p=1","alt-svc":"quic=\":443\"; p=\"1\"; ma=604800","transfer-encoding":"chunked"},
"request":{"uri":{"protocol":"https:","slashes":true,"auth":null,"host":"www.googleapis.com","port":null,"hostname":"www.googleapis.com","hash":null,"search":"?query=....
I can post the entire response if you wish, but it is large and url encoded.
Original comment by gquino...@brightcove.com
on 24 Sep 2015 at 11:03
Yeah, if you're willing, you could post the whole thing as an attachment.
I'm guessing that we're creating the query job, and somewhere before the
response gets back to you, we hit an error.
One "best practice" that you might consider following is to generate job IDs on
your side, rather than allowing BigQuery to generate them. Then you can retry
on the same job ID when you hit an error, and you'll be assured that the job
will be created at most once.
Original comment by jcon...@google.com
on 24 Sep 2015 at 5:21
Original comment by gquino...@brightcove.com
on 24 Sep 2015 at 6:38
The above attachment is the complete error response as you requested.
One concern I have about this is that we're being billed for these queries since, from BigQuery's internal perspective, the query completed successfully, even though we received an error response and no data was delivered.
Although your suggestion of generating job IDs on our side would take care of (eventually) receiving the correct data, it does not address the issue of double billing.
Original comment by gquino...@brightcove.com
on 25 Sep 2015 at 2:15
It looks like you're adding the query text to the URL as a query parameter. Instead, I'd recommend putting it only in the request body. The URL query parameter is likely ignored, and the long URL might be contributing to the "bad request" error.
If you generate your own job IDs, it'll handle the issue of receiving the
correct data *and* double billing. Since you're only allowed to use a given job
ID once, you can retry as many times as you want with that job ID, and you'll
be guaranteed that the query will run at most once (and bill you at most once).
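A minimal sketch of a body-only request against the v2 jobs.query endpoint (the buildQueryRequest helper is hypothetical):

```javascript
// Build a BigQuery jobs.query request with the SQL carried only in the
// POST body -- the URL stays short no matter how long the query is.
function buildQueryRequest(projectId, sql, timeoutMs) {
  return {
    method: 'POST',
    uri: 'https://www.googleapis.com/bigquery/v2/projects/' +
         encodeURIComponent(projectId) + '/queries',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: sql, timeoutMs: timeoutMs })
  };
}

const req = buildQueryRequest('rising-ocean-426', 'select 17', 10000);
// req.uri contains no "?query=..." parameter; the SQL travels in req.body.
```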
Also, I noticed that the attachment you posted contained an OAuth bearer token,
which can be sent in other requests to authenticate you. I deleted the
attachment. To be on the safe side, you might consider revoking any
BigQuery-related tokens here:
https://security.google.com/settings/security/permissions
You can then reissue new ones by running the corresponding client tools again.
Original comment by jcon...@google.com
on 25 Sep 2015 at 4:34
Hi,
The query was generated by Google's node.js package gcloud 1.4.28, so this looks like a bug in that library.
The query was constructed as follows:
...
var original_query = formattedQuery;
if (queryTimeout) {
  formattedQuery = {
    query: formattedQuery,
    timeoutMs: queryTimeout
  };
}
gcloud.bigquery()
  .query(formattedQuery)
...
Thanks for deleting the attachment. I realized there was a risk in posting it but did not want to alter the response in any way that might affect the analysis.
There should be some way to send sensitive, issue-related information to Google without having to take these risks.
Original comment by gquino...@brightcove.com
on 25 Sep 2015 at 4:45
Gotcha. I'll follow up with them.
An alternative to the attachment is to email me directly (jcondit@google.com).
I'll suggest that in the future.
Original comment by jcon...@google.com
on 25 Sep 2015 at 4:53
Original comment by thomasp...@google.com
on 24 Nov 2015 at 7:13
Original issue reported on code.google.com by
gquino...@brightcove.com
on 19 Sep 2015 at 2:52