Open GoogleCodeExporter opened 9 years ago
Looks like it gets logged out during upload. Note that this has been running
for 2 hours, which is probably not a use case that had been tested yet.
Updating description accordingly.
Original comment by maruel@chromium.org
on 18 Mar 2014 at 12:38
it is unusual for this run to take 2 hrs, but the normal isn't much lower than
that
i can tweak the logic to try & recreate the connection from scratch a few times
on our side
Original comment by vapier@chromium.org
on 18 Mar 2014 at 1:57
hmm, would i go about forcing a refresh ? i tried the obvious -- just calling
storage.get_server_api() again:
https://chromium-review.googlesource.com/#/c/190076/1/scripts/upload_symbols.py
but that didn't work. i get the same error back:
https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/atom-release-group/builds/5/steps/DebugSymbols%20%5Bx86-mario%5D/logs/stdio
just search for "Authentication is required".
Original comment by vapier@chromium.org
on 18 Mar 2014 at 9:37
Ah... it's not only XSRF token that can expire. There's another expirable
signature in upload URL. Basically when you call storage.contains(...) it
generates signed URL that expires after ~1h (this URL is stored internally in
Item instances). Calling storage.get_server_api() will force XSRF token
refresh, but wouldn't do anything to upload URL signature.
I think the easiest fix is to change URL signature expiration time to something
like 6h on a server side.
You still would hit XSRF token expiration though. I'm not sure that extending
that expiration time is a good idea. Client should transparently refetch XSRF
token. It's not implemented yet since we never actually had this problem
before.
Original comment by vadimsh@chromium.org
on 18 Mar 2014 at 10:47
adding retries on the client side for the server communication sounds fine. if
isolateserver did it itself, that'd be awesome, but i don't mind implementing
it myself (as i already did as can be seen in that CL).
increasing the expiration of the URL signature would help a lot as that isn't
nearly as easy to regenerate. 6h would be plenty of time for us.
i double checked our history, and it does seem that 2hrs is extremely unusual.
normally it's like half that. i guess i picked a bad time to try and deploy
this :).
Original comment by vapier@chromium.org
on 18 Mar 2014 at 10:51
Sorry, upload URL signature expiration is not an issue. It's set to 4 hours
already. XSRF token expires after 2 hours though.
I think you need to recreate |storage_query| instance of Storage in
UploadSymbols outter retry loop. Log says "Authentication is required for
https://isolateserver.appspot.com/content-gs/pre-upload/". /pre-upload is used
by 'contains(...)' method. contains(...) is used in SymbolDeduplicator(...) and
it gets its instance of Storage from UploadSymbols function.
Also creating new instance of Storage in SymbolDeduplicatorNotify all the time
is not effective. It will always do two HTTP requests (one to get XSRF token,
another to actually push the data).
All problems can be solved by smart XSRF token refetch in Storage. But I don't
have cycles to implement it right now :(
Original comment by vadimsh@chromium.org
on 18 Mar 2014 at 11:05
[deleted comment]
so the code might be a little confusing because there's actually two different
long lived storage connections. there's the storage_query that is used only to
call .contains(), and that is initialized once in UploadSymbols and then used
in SymbolDeduplicator. expiration there is not a problem because we use that
in the symbol finder and that runs fast (it is used in the first ~5 minutes of
runtime). it may also get used at the very end in the retry code path, but i'm
not worried about that case right now because we rarely retry and when we do,
the count is low (<10 symbols).
the other one is fully contained in SymbolDeduplicatorNotify and that gets used
after every successful symbol upload. it's a parallel process ... the func
doesn't get called more than once (see storage_notify_proc). that's why we
pass in the namespace attribute and the func takes care of calling
storage_get_api() itself. so that is the part i want to workaround now as that
can impact 1000+ symbols.
note: the log contains two sets of auth errors. at the very end is the
.contains() which comes from the retry which i don't care about now, but in the
middle is the .push() (which can be seen in the traceback too).
Original comment by vapier@chromium.org
on 18 Mar 2014 at 11:27
I didn't notice first error.
So the root cause is explained in CL below. I think we can deploy a workaround
until I come up with a proper fix: https://codereview.appspot.com/78060043.
Original comment by vadimsh@chromium.org
on 20 Mar 2014 at 2:15
Original issue reported on code.google.com by
vapier@chromium.org
on 18 Mar 2014 at 5:26