sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

Manage disk usage for elastic cloud (post - GA) #16963

Open ElizabethStirling opened 3 years ago

ElizabethStirling commented 3 years ago

Centralized logging takes up a lot of storage! We should establish retention policies and configure Elastic's hot-warm architecture. This will require re-creating the Elastic cluster, but that shouldn't be too problematic this early on.
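
For illustration, a retention policy combined with hot-warm tiering could be expressed as an Elasticsearch index lifecycle management (ILM) policy. The sketch below only builds the policy body; every duration and size in it is a placeholder, not a value from this issue, and the `data: warm` allocation rule assumes warm nodes are tagged with a `data` node attribute. The helper name `build_ilm_policy` is hypothetical.

```python
import json


def build_ilm_policy(hot_max_age="1d", hot_max_size="50gb",
                     warm_after="7d", delete_after="30d"):
    """Build an ILM policy body with hot, warm, and delete phases.

    All defaults are illustrative assumptions, not agreed retention values.
    """
    return {
        "policy": {
            "phases": {
                # Hot phase: roll over to a new index once it is old or large enough.
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_age": hot_max_age,
                            "max_size": hot_max_size,
                        }
                    }
                },
                # Warm phase: move shards to nodes tagged data=warm
                # (assumes that node attribute exists on the warm tier).
                "warm": {
                    "min_age": warm_after,
                    "actions": {"allocate": {"require": {"data": "warm"}}},
                },
                # Delete phase: this is the retention limit.
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent when creating the policy.
    print(json.dumps(build_ilm_policy(), indent=2))
```

The resulting JSON is the kind of body one would send when creating a lifecycle policy; the actual phase timings would come out of the RFC.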

Additionally, if we re-create the deployment, we should consider creating it in GCP rather than AWS. Not only would this keep our infrastructure on one cloud provider, but GCP high-storage nodes are also more performant for Elastic's hot-warm architecture than AWS nodes (x).

Estimating this to be expensive and time-consuming, since it'll require:

  1. An RFC
  2. Configuring a new Elastic deployment
  3. Updating our logging infrastructure to send data to the new deployment
  4. Moving data from our current cluster to our new deployment

Unless #16962 reveals we're burning disk at an absurd rate, we should do this post cloud GA, since until then we can avoid doing the work by throwing more money at Elastic. As long as our log ingestion stays under 1TB/day, I'd say we can put this off.
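
As a rough way to reason about that threshold, the sketch below checks a measured daily ingestion figure against the 1TB/day limit and estimates steady-state disk usage for a given retention window. The function names, the example ingestion figures, and the replica count are assumptions for illustration; the estimate ignores compression and index overhead.

```python
TB = 10**12  # decimal terabyte, in bytes


def should_prioritize(daily_ingest_bytes, threshold_bytes=TB):
    """Return True once daily ingestion reaches the 1TB/day threshold."""
    return daily_ingest_bytes >= threshold_bytes


def steady_state_bytes(daily_ingest_bytes, retention_days, replicas=1):
    """Estimate disk held once retention is in effect.

    Each day's data is stored once as a primary plus once per replica.
    Compression and index overhead are deliberately ignored.
    """
    return daily_ingest_bytes * retention_days * (1 + replicas)


if __name__ == "__main__":
    daily = 400 * 10**9  # hypothetical 400GB/day, not a measured value
    print("prioritize now:", should_prioritize(daily))
    print("disk at 30d retention (TB):", steady_state_bytes(daily, 30) / TB)
```

Real numbers for `daily` would come from the measurement in #16962.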

ElizabethStirling commented 3 years ago

WRT the results of #16962, it looks like we can do this post-GA.

ElizabethStirling commented 3 years ago

Currently reconsidering whether this is more urgent, since we're ingesting significantly more logs per day now.

chayim commented 3 years ago

As part of this work, see https://github.com/sourcegraph/sourcegraph/issues/17242