madecoste / swarming

Automatically exported from code.google.com/p/swarming
Apache License 2.0
0 stars 1 forks source link

net.py HttpService doesn't renegociate xsrf token after expiration #91

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
upload log is here:
https://uberchromegw.corp.google.com/i/chromeos/builders/daisy%20canary/builds/2
551/steps/DebugSymbols/logs/stdio

but the key points:
 - the system starts off fine:
20:19:54: INFO: Checking existence of 100 files...
20:19:54: INFO: Starting new HTTPS connection (1): isolateserver.appspot.com
20:19:55: INFO: Protocol version: 1.0
20:19:55: INFO: Server version: 562-61d9774
20:19:59: INFO: Queried 100 files, 3 cache hit
20:20:00: INFO: Checking existence of 100 files...
20:20:00: INFO: Queried 100 files, 5 cache hit
20:20:00: INFO: Checking existence of 100 files...
20:20:02: INFO: Queried 100 files, 6 cache hit
20:20:03: INFO: Starting new HTTPS connection (1): isolateserver.appspot.com
20:20:03: INFO: Checking existence of 100 files...
...

 - every line like below means the system has posted that file to the isolateserver (btw, the timing info is relative to the crash server, not isolate, so don't be surprised by those):
20:21:20: INFO: upload of       5210 bytes took 0:00:15.530764: libxt_tcp.so.sym

 - after a while, this shows up:
22:19:54: INFO: upload of      19506 bytes took 0:00:00.536612: setresuid01.sym
22:19:55: WARNING: Authentication is required for 
https://isolateserver.appspot.com/content-gs/store/chromium-os-upload-symbols/f0
927155e9b4487d844a7686eb13325c2f0178c8?... on attempt 0.
403 Client Error: Forbidden
Cookie authentication requires interactive login
22:19:55: ERROR: Unable to authenticate to https://isolateserver.appspot.com 
(403 Client Error: Forbidden). Use auth.py to login: python auth.py login 
--service=https://isolateserver.appspot.com
22:19:55: WARNING: posting fc-match.sym to dedupe server failed
Traceback (most recent call last):
  File "/b/cbuild/daisy-canary-master/chromite/scripts/upload_symbols.py", line 370, in SymbolDeduplicatorNotify
    storage.push(item, item.content(0))
  File "/b/cbuild/daisy-canary-master/chromite/third_party/swarming.client/isolateserver.py", line 995, in push
    item.digest, item.push_state.upload_url))
IOError: Failed to upload a file f0927155e9b4487d844a7686eb13325c2f0178c8 to 
https://isolateserver.appspot.com/content-gs/store/chromium-os-upload-symbols/f0
927155e9b4487d844a7686eb13325c2f0178c8?...
22:19:55: INFO: upload of       2248 bytes took 0:00:00.674036: fc-match.sym

Original issue reported on code.google.com by vapier@chromium.org on 18 Mar 2014 at 5:26

GoogleCodeExporter commented 9 years ago
Looks like it gets logged out during upload. Note that this has been running 
for 2 hours, which is probably not a use case that had been tested yet. 
Updating description accordingly.

Original comment by maruel@chromium.org on 18 Mar 2014 at 12:38

GoogleCodeExporter commented 9 years ago
it is unusual for this run to take 2 hrs, but the normal isn't much lower than 
that

i can tweak the logic to try & recreate the connection from scratch a few times 
on our side

Original comment by vapier@chromium.org on 18 Mar 2014 at 1:57

GoogleCodeExporter commented 9 years ago
hmm, would i go about forcing a refresh ?  i tried the obvious -- just calling 
storage.get_server_api() again:
  https://chromium-review.googlesource.com/#/c/190076/1/scripts/upload_symbols.py

but that didn't work.  i get the same error back:
  https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/atom-release-group/builds/5/steps/DebugSymbols%20%5Bx86-mario%5D/logs/stdio

just search for "Authentication is required".

Original comment by vapier@chromium.org on 18 Mar 2014 at 9:37

GoogleCodeExporter commented 9 years ago
Ah... it's not only XSRF token that can expire. There's another expirable 
signature in upload URL. Basically when you call storage.contains(...) it 
generates signed URL that expires after ~1h (this URL is stored internally in 
Item instances). Calling storage.get_server_api() will force XSRF token 
refresh, but wouldn't do anything to upload URL signature.

I think the easiest fix is to change URL signature expiration time to something 
like 6h on a server side.

You still would hit XSRF token expiration though. I'm not sure that extending 
that expiration time is a good idea. Client should transparently refetch XSRF 
token. It's not implemented yet since we never actually had this problem 
before. 

Original comment by vadimsh@chromium.org on 18 Mar 2014 at 10:47

GoogleCodeExporter commented 9 years ago
adding retries on the client side for the server communication sounds fine.  if 
isolateserver did it itself, that'd be awesome, but i don't mind implementing 
it myself (as i already did as can be seen in that CL).

increasing the expiration of the URL signature would help a lot as that isn't 
nearly as easy to regenerate.  6h would be plenty of time for us.

i double checked our history, and it does seem that 2hrs is extremely unusual.  
normally it's like half that.  i guess i picked a bad time to try and deploy 
this :).

Original comment by vapier@chromium.org on 18 Mar 2014 at 10:51

GoogleCodeExporter commented 9 years ago
Sorry, upload URL signature expiration is not an issue. It's set to 4 hours 
already. XSRF token expires after 2 hours though. 

I think you need to recreate |storage_query| instance of Storage in 
UploadSymbols outter retry loop. Log says "Authentication is required for 
https://isolateserver.appspot.com/content-gs/pre-upload/". /pre-upload is used 
by 'contains(...)' method. contains(...) is used in SymbolDeduplicator(...) and 
it gets its instance of Storage from UploadSymbols function. 

Also creating new instance of Storage in SymbolDeduplicatorNotify all the time 
is not effective. It will always do two HTTP requests (one to get XSRF token, 
another to actually push the data).

All problems can be solved by smart XSRF token refetch in Storage. But I don't 
have cycles to implement it right now :(

Original comment by vadimsh@chromium.org on 18 Mar 2014 at 11:05

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
so the code might be a little confusing because there's actually two different 
long lived storage connections.  there's the storage_query that is used only to 
call .contains(), and that is initialized once in UploadSymbols and then used 
in SymbolDeduplicator.  expiration there is not a problem because we use that 
in the symbol finder and that runs fast (it is used in the first ~5 minutes of 
runtime).  it may also get used at the very end in the retry code path, but i'm 
not worried about that case right now because we rarely retry and when we do, 
the count is low (<10 symbols).

the other one is fully contained in SymbolDeduplicatorNotify and that gets used 
after every successful symbol upload.  it's a parallel process ... the func 
doesn't get called more than once (see storage_notify_proc).  that's why we 
pass in the namespace attribute and the func takes care of calling 
storage_get_api() itself.  so that is the part i want to workaround now as that 
can impact 1000+ symbols.

note: the log contains two sets of auth errors.  at the very end is the 
.contains() which comes from the retry which i don't care about now, but in the 
middle is the .push() (which can be seen in the traceback too).

Original comment by vapier@chromium.org on 18 Mar 2014 at 11:27

GoogleCodeExporter commented 9 years ago
I didn't notice first error. 

So the root cause is explained in CL below. I think we can deploy a workaround 
until I come up with a proper fix: https://codereview.appspot.com/78060043.

Original comment by vadimsh@chromium.org on 20 Mar 2014 at 2:15