Closed GoogleCodeExporter closed 9 years ago
If it is related to load, we should be able to repro on the Canary by just
running that script a couple of times (maybe adding a sleep to the script).
Original comment by csharp@chromium.org
on 14 Mar 2014 at 3:19
I tried reproducing the problem on the Canary server, since that is much less
disruptive. I added a --repeat flag to run_on_bots.py so I can generate 10x the
load:
./tools/run_on_bots.py --swarming https://chromium-swarm-dev.appspot.com \
--isolate-server https://isolateserver-dev.appspot.com --priority 5 \
--repeat 10 fine.py
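A minimal sketch of how such a repeat option could work, for illustration only.
This is not the actual run_on_bots.py change: the --sleep option and the
parse_args, run_repeated and trigger names are hypothetical, and only --repeat
appears in this thread.

```python
import argparse
import time


def parse_args(argv):
    """Parses a hypothetical command line; only --repeat is from the thread."""
    parser = argparse.ArgumentParser(
        description='Trigger a script several times.')
    parser.add_argument(
        '--repeat', type=int, default=1,
        help='number of times to trigger the script')
    parser.add_argument(
        '--sleep', type=float, default=0.0,
        help='seconds to wait between triggers (assumed option)')
    parser.add_argument('script')
    return parser.parse_args(argv)


def run_repeated(trigger, script, repeat, sleep_secs=0.0, sleep=time.sleep):
    """Calls trigger(script) `repeat` times, pausing between calls.

    `trigger` stands in for whatever actually sends the script to the bots.
    """
    results = []
    for i in range(repeat):
        results.append(trigger(script))
        # No pause after the last trigger.
        if sleep_secs and i < repeat - 1:
            sleep(sleep_secs)
    return results
```

Passing the sleep function as a parameter keeps the loop testable without
actually waiting.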
Sadly, I was not able to reproduce the AbortRunner() failure, but I got a fair
number of HTTP 503s and this traceback a few times:
--- CUT HERE ---
File "/base/data/home/apps/s~chromium-swarm-dev/560-0d2f4af.374373064579888318/components/auth/handler.py", line 100, in dispatch
identity = method_func(self.request)
File "/base/data/home/apps/s~chromium-swarm-dev/560-0d2f4af.374373064579888318/components/auth/handler.py", line 268, in oauth_authentication
client_id = oauth.get_client_id(oauth_scope)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/oauth/oauth_api.py", line 165, in get_client_id
_maybe_call_get_oauth_user(_scope)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/oauth/oauth_api.py", line 215, in _maybe_call_get_oauth_user
apiproxy_stub_map.MakeSyncCall('user', 'GetOAuthUser', req, resp)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 94, in MakeSyncCall
return stubmap.MakeSyncCall(service, call, request, response)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 328, in MakeSyncCall
rpc.CheckSuccess()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 133, in CheckSuccess
raise self.exception
DeadlineExceededError: The API call user.GetOAuthUser() took too long to
respond and was cancelled.
--- CUT HERE ---
Original comment by maruel@chromium.org
on 14 Mar 2014 at 6:13
I'm sad that the GetOAuthUser API is at the same level of reliability as other
GAE APIs :(
Though I think it might not be a problem in the long term: bots will be using
our own auth implementation (IP whitelist for now), which may be more reliable.
Original comment by vadimsh@chromium.org
on 14 Mar 2014 at 6:18
Issue chromium:354263 has been merged into this issue.
Original comment by maruel@chromium.org
on 20 Mar 2014 at 12:13
Seems like it's related to the missing index I fixed in
fa07bcbb4c1a7f02f847422c70a21e3961a9bb35. I had deployed it to the canary
server but not to prod yet. I just deployed it a few minutes ago and will
monitor the ereporter2 report over the next hour (which has, btw, been sending
error reports hourly for a while now).
Original comment by maruel@chromium.org
on 20 Mar 2014 at 1:24
Disabled the cron job on the prod server while I'm debugging the problem in
redaf98aa0ed876469b5ddef20b421c05b4c9e51e. The problem should not be visible on
the chromium try server starting now.
Original comment by maruel@chromium.org
on 20 Mar 2014 at 3:22
Issue 93 has been merged into this issue.
Original comment by maruel@chromium.org
on 8 Apr 2014 at 4:54
Issue 52 has been merged into this issue.
Original comment by maruel@chromium.org
on 8 Apr 2014 at 7:35
It's mostly fixed but I'll run a few tests on the prod instance to confirm.
Original comment by maruel@chromium.org
on 29 May 2014 at 1:53
It's not perfect, but it works well with our current load (~350 bots) and was
tested under a much higher load.
Original comment by maruel@chromium.org
on 5 Jun 2014 at 4:09
Original issue reported on code.google.com by
maruel@chromium.org
on 14 Mar 2014 at 3:11