sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

Prevent services from running out of ephemeral storage space and being evicted #9604

Closed (beyang closed this issue 3 years ago)

beyang commented 4 years ago

Set ephemeral storage resource requests and limits in deploy-sourcegraph. A customer noticed their sourcegraph-frontend pods were getting evicted because the node ran out of disk space. sourcegraph-frontend currently requests 0 ephemeral storage and was using ~50 MB; on k8s.sgdev.org it uses ~600 MB.
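For reference, a minimal sketch of what such requests / limits could look like on the sourcegraph-frontend deployment; the container name and the values are illustrative assumptions, not copied from the deploy-sourcegraph manifests:

```yaml
# Fragment of a Deployment spec; values are illustrative and would need tuning per deployment.
spec:
  template:
    spec:
      containers:
        - name: frontend
          resources:
            requests:
              ephemeral-storage: "512Mi" # same ballpark as the ~600 MB observed on k8s.sgdev.org
            limits:
              ephemeral-storage: "2Gi"   # the kubelet evicts the pod once usage exceeds this
```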

uwedeportivo commented 4 years ago

Dear all,

This is your release captain speaking. 🚂🚂🚂

Branch cut for the 3.15 release is scheduled for tomorrow.

Is this issue / PR going to make it in time? Please change the milestone accordingly. When in doubt, reach out!

Thank you

uwedeportivo commented 4 years ago

Dear all,

This is your release captain speaking. 🚂🚂🚂

Branch cut for the 3.16 release is scheduled for tomorrow.

Is this issue / PR going to make it in time? Please change the milestone accordingly. When in doubt, reach out!

Thank you

beyang commented 4 years ago

@ggilmore please make this a top priority in 3.17 as this just caused a sitewide outage at an important customer (https://sourcegraph.slack.com/archives/CTGBWCKM0/p1590044563076400)

ggilmore commented 4 years ago

It's difficult to have a one-size-fits-all solution for this using ephemeral storage limits / requests alone. Ephemeral storage covers both logs and all scratch space used by the container. My educated guess is that https://github.com/sourcegraph/sourcegraph/issues/8308 is the main culprit here (although this affects every service that needs a local cache for code, e.g. searcher, symbols, lang-go, etc.). It's highly likely that the amount of scratch space each service uses varies with instance size and traffic patterns.

If we pick limits / requests that are too small, we run the risk of service degradation due to constant pod evictions (this can happen if frontend needs to clone a large repository, say more than 1 GB). If we pick limits / requests that are too large, we are in effect enforcing minimum disk space requirements on each node, and node disk size may not be something we have any control over.

In addition, there doesn't seem to be any way to get a real-time reading of the current ephemeral storage usage for a given container.

I think that fixing the unbounded cache issue (tracked in https://github.com/sourcegraph/sourcegraph/issues/8308) for each of these services (perhaps through a shared package) is the correct long-term solution.
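As a rough illustration of what such a shared bounded-cache package could do (a hypothetical sketch, not Sourcegraph's actual cache code): periodically scan the cache directory and delete the least recently used entries until the total size is back under a configured budget.

```go
// Package boundedcache is a hypothetical sketch of a size-bounded on-disk cache
// janitor; it is not Sourcegraph's actual cache code.
package boundedcache

import (
	"os"
	"path/filepath"
	"sort"
)

// EvictToBudget deletes the least recently modified files under dir until the
// total size of the cache drops below maxBytes.
func EvictToBudget(dir string, maxBytes int64) error {
	type entry struct {
		path string
		info os.FileInfo
	}
	var (
		files []entry
		total int64
	)

	// Collect every regular file in the cache along with its size.
	err := filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() {
			return err
		}
		files = append(files, entry{path, info})
		total += info.Size()
		return nil
	})
	if err != nil {
		return err
	}

	// Oldest entries first: a rough LRU based on modification time.
	sort.Slice(files, func(i, j int) bool {
		return files[i].info.ModTime().Before(files[j].info.ModTime())
	})

	// Delete old entries until the cache fits within the budget.
	for _, f := range files {
		if total <= maxBytes {
			break
		}
		if err := os.Remove(f.path); err == nil {
			total -= f.info.Size()
		}
	}
	return nil
}
```

A service could call something like EvictToBudget(cacheDir, budget) from a background goroutine; with a budget safely below the pod's ephemeral-storage limit, kubelet evictions should become much rarer.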

aisbaa commented 4 years ago

> It's difficult to have a one-size-fits-all solution for this using ephemeral storage limits / requests alone.

We migrated from nodes with 50-100 GB disks to nodes with 500-1000 GB disks in the hope that this issue would be alleviated, but we still see it happen. Judging from the billing report, disk is not that expensive, and all those gigabytes would be wasted if we don't use them. To sum up, it is okay to have high limits for ephemeral storage.

uwedeportivo commented 4 years ago

Dear all,

This is your release captain speaking. 🚂🚂🚂

Branch cut for the 3.17 release is scheduled for tomorrow.

Is this issue / PR going to make it in time? Please change the milestone accordingly. When in doubt, reach out!

Thank you

slimsag commented 4 years ago

See https://github.com/sourcegraph/sourcegraph/issues/8308 for next steps here.

uwedeportivo commented 4 years ago

Got a report from another customer whose symbols pod was getting evicted because the ephemeral-storage limit was reached; it was set to 8Gi.

pecigonzalo commented 3 years ago

@davejrt I saw the linked PR. As far as I recall from the planning meeting, we were also unsure whether this was actually required: the reported issue talks about logs (I suspect logs written to files), and as far as I understand, STDOUT logs do not use ephemeral storage. So it must be some services storing cache or other content outside of the mounted volumes, which I believe is unexpected.

daxmc99 commented 3 years ago

@pecigonzalo I believe local ephemeral storage is also used for node-level container logs. Container engines log stdout & stderr to ephemeral storage, according to https://kubernetes.io/docs/concepts/cluster-administration/logging/#logging-at-the-node-level

pecigonzalo commented 3 years ago

@daxmc99 I believe that is separate from pod ephemeral storage: it's ephemeral in the sense that it's not persisted if the node fails, but I don't think it counts towards the pod-specific ephemeral storage. I'm trying to find some docs to validate/refute that.

pecigonzalo commented 3 years ago

As an additional note: if the problem is logs, IMO we should not allocate ephemeral storage for them by default. Instead, they should be managed by the native container log rotation (daemon config for Docker) or by the end user, as we can't predict the log size given that it depends on user activity.
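For reference, a minimal example of that kind of native rotation, assuming Docker's default json-file log driver; the sizes are illustrative and this would live in /etc/docker/daemon.json on each node:

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
```

With settings like these, each container's stdout/stderr logs are capped at roughly 3 x 100 MB regardless of user activity.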

pecigonzalo commented 3 years ago

Another note :). As an example, we never ran into this problem on sourcegraph.com, and we don't have ephemeral mounts there. Maybe it is used for logs, but the container engine should in most cases rotate those logs, which would keep them in check.

pecigonzalo commented 3 years ago

This https://access.redhat.com/solutions/4367311 validates what you said, that ephemeral storage is indeed used for the STDOUT/STDERR logs, as well as my assumption that the container engine is expected to keep them in check.

daxmc99 commented 3 years ago

According to the GKE docs this is also a fairly new feature, so not every cluster would respect ephemeral storage limits if we set them. On Cloud today we allocate 500 GB boot disks, with about 39 GB currently used for everything under /var/ (where logs are stored). Since we roll pods fairly often on Cloud, I'm not sure that is a good proxy for a long-running customer environment.

pecigonzalo commented 3 years ago

@daxmc99 But shouldn't CRI-O/Docker take care of rotating those and keeping them in check (for logs specifically)? I don't know; I can't see any other public manifest that includes ephemeral storage either.

As said, my understanding here is that if we need scratch space, we should allocate ephemeral storage, probably through emptyDir, but not for logs.
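For illustration, a minimal sketch of that emptyDir approach; the volume name, mount path, and size limit here are hypothetical:

```yaml
# Hypothetical fragment of a pod spec; names, mount path, and sizeLimit are illustrative.
spec:
  containers:
    - name: symbols
      volumeMounts:
        - name: cache
          mountPath: /mnt/cache
  volumes:
    - name: cache
      emptyDir:
        sizeLimit: "10Gi" # the kubelet evicts the pod if the emptyDir grows past this
```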

daxmc99 commented 3 years ago

Agreed. It should be the job of the container runtime to rotate logs and prevent them from growing too large, but ultimately they are still factored into eviction decisions.

Do we know why the frontend needs this cache space?