pombreda / swarming

Automatically exported from code.google.com/p/swarming
Apache License 2.0

Reduce /pre-upload requests by caching entry presence information locally #58

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
vadimsh:
Currently, each time isolateserver.py isolates a test, it asks the server (via 
a /pre-upload call) for presence information on all files being isolated.

If the server can promise that it won't delete a file until it expires, the 
uploading side can cache the expiration time locally and skip consulting the 
server every time.

This should significantly speed up the case where almost exactly the same set 
of files is isolated repeatedly from the same machine. That's exactly what 
happens on the compiling buildbot bots.

Some complications and intended solutions:
1) Sometimes files get deleted before their expiration time, mostly in 
exceptional cases such as corrupted uploads or manual intervention. Solution: 
/handshake returns a timestamp of the last time this happened. If the uploading 
client notices that this timestamp differs from the one it cached on the 
previous run, it nukes its entire presence cache. Better safe than sorry.
2) Expiration time precision is one day (entry.last_access is a DateProperty), 
and the exact time when the cleanup cron runs is not known (and should not be 
relied upon anyway). Solution: the client should be pessimistic about 
expiration time and a) remove presence information from the cache one day 
before the file expires, b) not cache a file's presence information if it 
expires in less than a day.
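
To make the two rules above concrete, here is a minimal sketch of what such a 
client-side cache might look like. It is not code from isolateserver.py: the 
class name, method names and the in-memory dict are all hypothetical, and it 
assumes the client gets the "last early deletion" timestamp from /handshake 
and an expiration timestamp (in seconds since the epoch) for each entry.

```python
import time

DAY = 24 * 60 * 60  # pessimistic safety margin, in seconds


class PresenceCache:
    """Hypothetical local cache mapping a file digest to its expiration time."""

    def __init__(self):
        self.invalidation_ts = None  # last early-deletion timestamp seen
        self.entries = {}            # digest -> expiration timestamp

    def sync(self, server_invalidation_ts):
        """Nuke the whole cache if the server reports a new early deletion."""
        if server_invalidation_ts != self.invalidation_ts:
            self.entries.clear()
            self.invalidation_ts = server_invalidation_ts

    def add(self, digest, expiration_ts, now=None):
        """Rule b): do not cache entries that expire within a day."""
        now = time.time() if now is None else now
        if expiration_ts - now <= DAY:
            return False
        self.entries[digest] = expiration_ts
        return True

    def is_present(self, digest, now=None):
        """Rule a): forget an entry one day before it expires."""
        now = time.time() if now is None else now
        expiration_ts = self.entries.get(digest)
        if expiration_ts is None:
            return False
        if expiration_ts - now <= DAY:
            del self.entries[digest]
            return False
        return True
```

Only digests for which is_present() returns False would still need to go 
through /pre-upload. A real cache would also have to be persisted between 
runs, which this sketch leaves out.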

maruel:
I'd recommend fixing 2) by changing last_access to be a DateTimeProperty 
instead of a DateProperty, letting it live on all servers for a few days so 
that all ContentEntry entities are tagged with the new member, then moving on 
to the new response and client-side code.

Original issue reported on code.google.com by maruel@chromium.org on 7 Jan 2014 at 8:38

GoogleCodeExporter commented 9 years ago
It will have a multiplicative effect with issue 125.

1) Jitter is now added to ContentEntry.expiration_ts. Returning a timestamp is 
still a good idea.
2) is no longer true: ContentEntry.expiration_ts was changed to a 
DateTimeProperty, which fixes the precision.

There's a 3rd solution:

When a bot knows the server has a file, it adds it to a list of files that are 
known to be valid. Instead of doing /pre-upload requests with 100 items at a 
time, it streams the whole list in one shot at the end into a new API, so that 
the stamping is done asynchronously. The idea is to reduce the number of HTTP 
requests while still stamping items as needed. I'm not even sure it's 
necessary in practice.
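
A rough sketch of the client-side split this implies, with hypothetical names 
throughout: the new bulk stamping API does not exist yet, so only the planning 
step is shown, and the batch size of 100 is taken from the current /pre-upload 
behavior described above.

```python
def plan_requests(digests, known_present, batch_size=100):
    """Split digests into /pre-upload batches and one bulk stamp list.

    digests: every digest being isolated, in order.
    known_present: set of digests the bot already knows the server has.
    Returns (pre_upload_batches, to_stamp).
    """
    # Files known to be on the server only need their timestamp refreshed;
    # they would be sent in one shot at the end to the (hypothetical) new API.
    to_stamp = [d for d in digests if d in known_present]
    # Everything else still goes through /pre-upload, batch_size at a time.
    unknown = [d for d in digests if d not in known_present]
    batches = [
        unknown[i:i + batch_size] for i in range(0, len(unknown), batch_size)
    ]
    return batches, to_stamp
```

For example, with 250 digests of which 30 are known to be present, this gives 
three /pre-upload batches (100, 100 and 20 items) plus a single stamp list of 
30 items.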

Original comment by maruel@chromium.org on 28 Aug 2014 at 1:53

GoogleCodeExporter commented 9 years ago

Original comment by maruel@chromium.org on 28 Aug 2014 at 1:56

GoogleCodeExporter commented 9 years ago

Original comment by maruel@chromium.org on 28 Aug 2014 at 1:57

GoogleCodeExporter commented 9 years ago
It's about time to fix it now.

One big saving would be to tag a ".isolated" file, so that the server could 
start a task queue to add each of the inner files when the .isolated was 
already cached. It's useful when using split .isolated files, one for binaries, 
one for the test data.

Original comment by maruel@chromium.org on 30 Sep 2014 at 5:26

GoogleCodeExporter commented 9 years ago

Original comment by maruel@chromium.org on 30 Sep 2014 at 5:35

GoogleCodeExporter commented 9 years ago
Not worth doing in the current implementation. Blocking on the client 
rewrite.

Original comment by maruel@chromium.org on 8 Jan 2015 at 4:05