Manage disk usage for Elastic Cloud (post-GA)

ElizabethStirling commented 3 years ago

Centralized logging takes up a lot of storage! We should establish retention policies and configure Elastic's hot-warm architecture. This will require re-creating the Elastic cluster, but that shouldn't be too problematic this early on.
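To make the retention idea concrete, here is a rough sketch of what an index lifecycle management (ILM) policy body with hot, warm, and delete phases could look like. The function name, the 7-day and 30-day ages, the rollover thresholds, and the `data: warm` node attribute are all illustrative assumptions, not values agreed on in this issue; the real numbers would come out of the RFC.

```python
import json


def build_ilm_policy(warm_after_days: int, delete_after_days: int) -> dict:
    """Build an ILM policy body with hot, warm, and delete phases.

    All thresholds are placeholders (assumptions), not decided values.
    """
    if not 0 < warm_after_days < delete_after_days:
        raise ValueError("expected 0 < warm_after_days < delete_after_days")
    return {
        "policy": {
            "phases": {
                # New indices start on hot nodes and roll over daily or at 50 GB.
                "hot": {
                    "min_age": "0ms",
                    "actions": {"rollover": {"max_age": "1d", "max_size": "50gb"}},
                },
                # Older indices move to nodes tagged with the (assumed) attribute data=warm.
                "warm": {
                    "min_age": f"{warm_after_days}d",
                    "actions": {"allocate": {"require": {"data": "warm"}}},
                },
                # Retention policy: drop indices once they pass the cutoff.
                "delete": {
                    "min_age": f"{delete_after_days}d",
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_ilm_policy(7, 30), indent=2))
```

The resulting JSON is the kind of body that would be sent with `PUT _ilm/policy/<policy-name>`; the sketch only builds the document and does not talk to a cluster.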

Additionally, if we re-create the deployment, we should consider creating it in GCP rather than AWS. Not only would this keep our infrastructure on one cloud provider, but GCP high-storage nodes are also more performant for Elastic's hot-warm architecture than AWS nodes (x).

I estimate this will be expensive and time-consuming, since it'll require:

An RFC
Configuring a new Elastic deployment
Updating our logging infrastructure to send data to the new deployment
Moving data from our current cluster to our new deployment
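For the last step in the list above, one possible approach (an assumption on my part; snapshot and restore would be another) is Elasticsearch's reindex-from-remote feature, where the new deployment pulls documents from the old cluster. The sketch below only builds the request body; the host, credentials, and index name are hypothetical placeholders.

```python
def build_remote_reindex_body(remote_host: str, username: str, password: str, index: str) -> dict:
    """Build a body for POST _reindex that copies `index` from a remote cluster.

    The request would be sent to the NEW deployment, which pulls from the old one.
    """
    return {
        "source": {
            "remote": {
                "host": remote_host,  # e.g. "https://old-cluster.example.com:9243" (hypothetical)
                "username": username,
                "password": password,
            },
            "index": index,
        },
        # Keep the same index name on the destination (a choice, not a requirement).
        "dest": {"index": index},
    }


if __name__ == "__main__":
    body = build_remote_reindex_body(
        "https://old-cluster.example.com:9243", "elastic", "changeme", "logs-2021.01.01"
    )
    print(body)
```

Note that the destination cluster generally has to allow the remote host explicitly (the `reindex.remote.whitelist` setting on self-managed clusters), so this would need checking against whatever the new deployment permits.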

Unless #16962 reveals we're burning disk at an absurd rate, we should do this after cloud GA, since we can avoid doing the work for now by throwing more money at Elastic. As long as our log ingestion stays under 1 TB/day, I'd say we can put this off.
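As a back-of-the-envelope check on that threshold, steady-state disk usage is roughly daily ingest times retention days times the number of copies of each shard. The sketch below assumes a 30-day retention window and one replica purely for illustration (neither is decided here), and it ignores compression and indexing overhead.

```python
def estimated_storage_tb(ingest_tb_per_day: float, retention_days: int, replicas: int = 1) -> float:
    """Rough steady-state storage in TB: ingest * retention * (primary + replicas).

    Ignores compression and index overhead, so treat the result as an order of magnitude.
    """
    if ingest_tb_per_day < 0 or retention_days < 0 or replicas < 0:
        raise ValueError("inputs must be non-negative")
    return ingest_tb_per_day * retention_days * (1 + replicas)


if __name__ == "__main__":
    # At the 1 TB/day threshold with an assumed 30-day retention and one replica:
    print(estimated_storage_tb(1.0, 30, replicas=1))  # 60.0
```

Under those assumptions, ingesting at the 1 TB/day ceiling would mean holding about 60 TB at any time, which is the figure a retention policy and warm tier would be trying to shrink or make cheaper.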

ElizabethStirling commented 3 years ago

Based on the results of #16962, it looks like we can do this post-GA.

ElizabethStirling commented 3 years ago

I'm currently reconsidering whether this is more urgent, since we're now ingesting significantly more logs per day.

chayim commented 3 years ago

As part of this work, see https://github.com/sourcegraph/sourcegraph/issues/17242

sourcegraph / sourcegraph-public-snapshot

Manage disk usage for elastic cloud (post - GA) #16963