sillsdev / serval

A REST API for natural language processing services
MIT License

Logs make a forever growing file system usage #128

Closed: johnml1135 closed this 1 year ago

johnml1135 commented 1 year ago

We did load testing and saw big jumps in log data, along with an ever-growing Mongo log. These logs are stored in Loki/Grafana/Prometheus and shouldn't be stored on the container fs. We should both:

Monitor disk usage: min(container_fs_usage_bytes{namespace=~"nlp|serval"}) by (container, namespace) / on (container, namespace) min(container_fs_limit_bytes{namespace=~"nlp|serval"}) by (container, namespace)
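For anyone reading the query without a Prometheus instance at hand, here is a rough Python sketch of what it computes: the minimum usage and the minimum limit per (container, namespace) pair, divided where both sides have a matching pair. The sample values and container names are made up for illustration, and `fs_usage_ratio` is a hypothetical helper, not part of Serval.

```python
def fs_usage_ratio(usage_samples, limit_samples):
    """Approximate the query: min(usage) / min(limit) per (container, namespace)."""

    def min_by_key(samples):
        # Equivalent of: min(...) by (container, namespace)
        mins = {}
        for s in samples:
            key = (s["container"], s["namespace"])
            mins[key] = min(s["value"], mins.get(key, float("inf")))
        return mins

    usage = min_by_key(usage_samples)
    limit = min_by_key(limit_samples)
    # "/ on (container, namespace)" keeps only label sets present on both sides.
    # Simplification: zero limits are skipped here rather than yielding Inf/NaN.
    return {k: usage[k] / limit[k] for k in usage if k in limit and limit[k] > 0}


# Hypothetical samples: two series for the same container, one orphan series.
usage = [
    {"container": "mongo", "namespace": "serval", "value": 500_000_000},
    {"container": "mongo", "namespace": "serval", "value": 700_000_000},
    {"container": "orphan", "namespace": "nlp", "value": 1_000},
]
limit = [
    {"container": "mongo", "namespace": "serval", "value": 100_000_000_000},
]

ratios = fs_usage_ratio(usage, limit)
print(ratios)
```

The result is a fraction between 0 and 1 per container, which is the form an alert threshold can be compared against.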

johnml1135 commented 1 year ago

This one is for making the logging no longer write to the hard drive. Monitoring the hard drive is here: #129

Enkidu93 commented 1 year ago

@johnml1135 I'm having issues with Rancher as usual (intermittently unavailable), so I can't verify this, but I'm wondering if the growth you're seeing isn't log-related. For example, MongoDB might be taking up more room on disk just from the new documents (which does include the oplog, yes). After going to Rancher and looking closer, it looks like Mongo is the only one consistently growing (?).

Plus, it looks like k8s has a default limit on log-file size of 10 MB per container (see here, under the heading "Logging at the node level"), so this shouldn't be an issue unless something has been altered, which I don't see evidence of in the configs. As for the Mongo oplog, it's a capped collection, so it should also have a reasonable (and configurable) limit associated with it. Am I missing something?
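To put a number on the log cap, here is a back-of-the-envelope sketch. It assumes the kubelet defaults as I understand them (`containerLogMaxSize` of 10Mi and `containerLogMaxFiles` of 5); I haven't confirmed these against our cluster config, and rotation is checked periodically, so treat the result as an approximate bound rather than a hard guarantee.

```python
MIB = 1024 * 1024

# Assumed kubelet defaults, not values read from our cluster:
ASSUMED_MAX_SIZE_BYTES = 10 * MIB  # containerLogMaxSize
ASSUMED_MAX_FILES = 5              # containerLogMaxFiles


def max_log_bytes(max_size_bytes=ASSUMED_MAX_SIZE_BYTES, max_files=ASSUMED_MAX_FILES):
    """Approximate upper bound on rotated log storage for one container."""
    return max_size_bytes * max_files


print(max_log_bytes() / MIB)  # 50.0 (MiB per container under the assumed defaults)
```

Under those assumptions, container logs alone would be bounded at roughly 50 MiB each, which is consistent with Mongo's data files being the thing that actually grows.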

Enkidu93 commented 1 year ago

@johnml1135 Did you see the above? Thoughts?

johnml1135 commented 1 year ago

Hmmm, if you found that the limits are capped, and given that disk usage is only around 0.5% of the hard drive, I am inclined to close this for now. We should have plenty of time if an issue does creep up, and we will also be alerting when the hard drive is 80% full.
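As a rough illustration of the headroom, here is a small sketch of the 80% check plus a linear estimate of how long it would take to reach it. The 0.5% usage and 80% threshold come from this thread; the daily growth rate is a purely hypothetical number, and real growth may not be linear.

```python
ALERT_THRESHOLD = 0.80  # alert when 80% of the disk is used


def should_alert(ratio, threshold=ALERT_THRESHOLD):
    """True once the usage fraction reaches the alert threshold."""
    return ratio >= threshold


def days_until_threshold(ratio, daily_growth, threshold=ALERT_THRESHOLD):
    """Linear extrapolation; daily_growth is the fraction of disk added per day."""
    if ratio >= threshold:
        return 0.0
    if daily_growth <= 0:
        return float("inf")  # not growing, so the threshold is never reached
    return (threshold - ratio) / daily_growth


current_ratio = 0.005           # about 0.5%, as observed
hypothetical_growth = 0.001     # assumed 0.1% of the disk per day, for illustration

print(should_alert(current_ratio))
print(round(days_until_threshold(current_ratio, hypothetical_growth)))
```

Even with a fairly pessimistic growth figure, the estimate comes out in the range of years, not days.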