madecoste / swarming

Automatically exported from code.google.com/p/swarming

Move data to nearline storage as phase 1, delete as phase 2 #217

Open · GoogleCodeExporter opened 9 years ago
Leverage https://cloud.google.com/storage/docs/nearline-storage

At the moment, there is a single expiration value (generally 30 days) after which the data is expunged from the cache.

Proposal:
- Define 2 phases:
  - Phase 1 moves the data from online storage to nearline storage. Proposed default: 21 days, instance configurable.
  - Phase 2 deletes the data. Proposed default: 368 days, instance configurable.

- Add GlobalConfig.migrate_to_nearline_secs and GlobalConfig.delete_after_secs.
- Remove GlobalConfig.default_expiration.
- Add GlobalConfig.gs_nearline_bucket, .gs_nearline_client_id_email, .gs_nearline_private_key.
- Add UI to modify these. A sketch of the resulting config model follows below.
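
A minimal sketch of what the resulting config model could look like, assuming an ndb-based GlobalConfig; the existing gs_bucket / gs_client_id_email / gs_private_key fields, the base class, and the defaults expressed in seconds are assumptions, only the new field names come from this proposal:

```python
from google.appengine.ext import ndb


class GlobalConfig(ndb.Model):
  # Existing online (hot) storage settings; names assumed to mirror the
  # nearline ones proposed below.
  gs_bucket = ndb.StringProperty()
  gs_client_id_email = ndb.StringProperty()
  gs_private_key = ndb.StringProperty()

  # Replaces default_expiration with two independently configurable delays.
  migrate_to_nearline_secs = ndb.IntegerProperty(default=21 * 24 * 60 * 60)
  delete_after_secs = ndb.IntegerProperty(default=368 * 24 * 60 * 60)

  # Nearline (cold) bucket and the credentials used to access it.
  gs_nearline_bucket = ndb.StringProperty()
  gs_nearline_client_id_email = ndb.StringProperty()
  gs_nearline_private_key = ndb.StringProperty()
```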

- Add ContentEntry.in_nearline = ndb.BooleanProperty().
- Change the logic that sets ContentEntry.expiration_ts to account for the location: when .in_nearline is set, its value is relative to .delete_after_secs; otherwise it is relative to .migrate_to_nearline_secs. See the sketch below.
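
A sketch of that expiration logic; the helper names and the trimmed-down ContentEntry model are illustrative, only in_nearline, expiration_ts and the two config delays come from the proposal:

```python
import datetime

from google.appengine.ext import ndb


class ContentEntry(ndb.Model):
  # Only the fields relevant here; the real model has more.
  expiration_ts = ndb.DateTimeProperty()
  # Proposed: set once the object has been moved to the nearline bucket.
  in_nearline = ndb.BooleanProperty(default=False)


def expiration_delay_secs(entry, config):
  """Returns the delay to use when (re)setting entry.expiration_ts.

  While the entry is in online storage, expiration_ts marks when it should be
  migrated to nearline; once in nearline, it marks when it should be deleted.
  """
  if entry.in_nearline:
    return config.delete_after_secs
  return config.migrate_to_nearline_secs


def set_expiration(entry, config, now=None):
  now = now or datetime.datetime.utcnow()
  entry.expiration_ts = now + datetime.timedelta(
      seconds=expiration_delay_secs(entry, config))
```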

- In InternalCleanupOldEntriesWorkerHandler in handlers_backend.py, migrate the data from .gs_bucket to .gs_nearline_bucket instead of calling incremental_delete(), and update ContentEntry.expiration_ts accordingly.
- Create a second cleanup cron job handler to do the actual deletion. Both handlers are sketched below.
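
A rough sketch of the two-phase cleanup, assuming webapp2 cron handlers and a hypothetical gcs wrapper module with copy() and delete() helpers; the phase 2 handler name, the config.settings() accessor and the model module layout are also assumptions:

```python
import datetime

import webapp2
from google.appengine.ext import ndb

import config  # hypothetical module exposing settings() -> GlobalConfig
import gcs     # hypothetical wrapper around Google Cloud Storage
import model   # hypothetical module holding ContentEntry


class InternalCleanupOldEntriesWorkerHandler(webapp2.RequestHandler):
  """Phase 1: moves expired online entries to the nearline bucket."""
  def post(self):
    cfg = config.settings()
    now = datetime.datetime.utcnow()
    q = model.ContentEntry.query(
        model.ContentEntry.in_nearline == False,
        model.ContentEntry.expiration_ts < now)
    for entry in q:
      name = entry.key.id()
      gcs.copy(cfg.gs_bucket, cfg.gs_nearline_bucket, name)
      gcs.delete(cfg.gs_bucket, name)
      entry.in_nearline = True
      entry.expiration_ts = now + datetime.timedelta(
          seconds=cfg.delete_after_secs)
      entry.put()


class InternalDeleteNearlineEntriesWorkerHandler(webapp2.RequestHandler):
  """Phase 2: deletes nearline entries whose grace period has elapsed."""
  def post(self):
    cfg = config.settings()
    now = datetime.datetime.utcnow()
    q = model.ContentEntry.query(
        model.ContentEntry.in_nearline == True,
        model.ContentEntry.expiration_ts < now)
    for entry in q:
      gcs.delete(cfg.gs_nearline_bucket, entry.key.id())
      entry.key.delete()
```

The phase 2 handler would presumably get its own entry in cron.yaml, next to the existing cleanup cron.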

- Add logic to handle files that are frequently accessed but no longer uploaded, e.g. content that keeps being fetched from cold storage with no automatic migration back to online storage. This could be handled via a ContentEntry.last_accessed_ts that would be updated only once every 24 hours or so. This increases the number of writes in the datastore, so it must be done carefully and was left out of this proposal on purpose; a throttled-update sketch is below anyway.
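
A sketch of that throttled bookkeeping, assuming the proposed ContentEntry.last_accessed_ts property exists; the helper name and the 24 h constant are illustrative:

```python
import datetime

# Write last_accessed_ts at most once per interval to limit datastore writes.
_LAST_ACCESS_UPDATE_INTERVAL = datetime.timedelta(hours=24)


def maybe_update_last_accessed(entry, now=None):
  """Records an access on a ContentEntry, skipping most datastore writes.

  Returns True if a write was actually issued.
  """
  now = now or datetime.datetime.utcnow()
  if (entry.last_accessed_ts and
      now - entry.last_accessed_ts < _LAST_ACCESS_UPDATE_INTERVAL):
    return False
  entry.last_accessed_ts = now
  entry.put()
  return True
```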

Original issue reported on code.google.com by maruel@chromium.org on 11 Mar 2015 at 6:06