Logs make a forever growing file system usage

johnml1135 commented 1 year ago

We did load testing and can see big jumps in log data, and we can also see the ever-growing Mongo log. These logs are stored in Loki/Grafana/Prometheus and shouldn't be stored on the container fs. We should both:

Monitor the containers to see if they take up too much disk space
Change the logging of the containers to print output only to stdout, not to a log file

Monitor disk usage: min(container_fs_usage_bytes{namespace=~"nlp|serval"}) by (container, namespace) / on (container, namespace) min(container_fs_limit_bytes{namespace=~"nlp|serval"}) by (container, namespace)
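
To make the query above easier to reason about, here is a minimal Python sketch of the same arithmetic: take the minimum usage and the minimum limit per (container, namespace) pair, then divide one by the other. The sample dictionaries and the `fs_usage_ratio` helper are hypothetical stand-ins for Prometheus series, not part of Serval or of any Prometheus client library.

```python
import re

# Mirrors the namespace=~"nlp|serval" matcher; PromQL regex matchers are fully anchored.
NAMESPACE_PATTERN = re.compile(r"nlp|serval")


def min_by_container(samples):
    """Group samples by (container, namespace) and keep the minimum value of each group."""
    grouped = {}
    for sample in samples:
        if not NAMESPACE_PATTERN.fullmatch(sample["namespace"]):
            continue
        key = (sample["container"], sample["namespace"])
        grouped[key] = min(grouped.get(key, sample["value"]), sample["value"])
    return grouped


def fs_usage_ratio(usage_samples, limit_samples):
    """Return usage / limit for every (container, namespace) present on both sides."""
    usage = min_by_container(usage_samples)
    limit = min_by_container(limit_samples)
    return {
        key: usage[key] / limit[key]
        for key in usage
        if key in limit and limit[key] > 0
    }


# Hypothetical sample data, for illustration only.
usage_samples = [
    {"container": "mongo", "namespace": "serval", "value": 5.0},
    {"container": "mongo", "namespace": "serval", "value": 7.0},
    {"container": "other", "namespace": "kube-system", "value": 1.0},
]
limit_samples = [
    {"container": "mongo", "namespace": "serval", "value": 1000.0},
]

ratios = fs_usage_ratio(usage_samples, limit_samples)
```

A ratio near 1.0 would mean the container is close to its filesystem limit. Containers outside the two namespaces, or with no matching limit series, are dropped, much as the `on (container, namespace)` match drops unmatched series.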

johnml1135 commented 1 year ago

This issue is for making the logging stop writing to the hard drive. Monitoring the hard drive is tracked in #129.

Enkidu93 commented 1 year ago

@johnml1135 I'm having issues with Rancher as usual (intermittently unavailable), so I can't verify this, but I'm wondering if the growth you're seeing isn't log-related. For example, MongoDB might be taking up more room on disk just from the new documents (which does include the oplog, yes). After going to Rancher and looking closer, it looks like Mongo is the only one consistently growing (?). Plus, it looks like k8s has a default limit on log-file size of 10MB per container (see the Kubernetes docs, under the heading "Logging at the node level"), so this shouldn't be an issue unless something has been altered, which I don't see evidence of in the configs. As for the Mongo oplog, it's a capped collection, so it should also have a reasonable (and configurable) limit associated with it. Am I missing something?
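
As a rough illustration of why capped container logs cannot grow forever, the sketch below estimates an upper bound on log disk usage per container. It assumes the kubelet rotates logs by a maximum file size and a maximum number of retained files (the `containerLogMaxSize` and `containerLogMaxFiles` settings, which to my knowledge default to 10Mi and 5). The function name is hypothetical, and the actual values for this cluster have not been checked.

```python
MIB = 1024 * 1024


def max_log_bytes_per_container(max_size_bytes=10 * MIB, max_files=5):
    """Approximate upper bound on rotated log storage for one container.

    Assumes each retained log file is at most max_size_bytes and that at most
    max_files files are kept; both defaults are assumptions, not verified
    against the cluster configuration.
    """
    return max_size_bytes * max_files


bound = max_log_bytes_per_container()
print(f"Approximate log cap per container: {bound / MIB:.0f} MiB")
```

Under those assumed defaults, each container's logs stay in the tens of megabytes. That fits the observation that Mongo's data, not its logs, is what keeps growing.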

Enkidu93 commented 1 year ago

@johnml1135 Did you see the above? Thoughts?

johnml1135 commented 1 year ago

Hmmm - if you found that the limits are capped, and given that hard drive usage is around 0.5%, I am inclined to close this for now. We should have a lot of time if an issue does creep up, and we will also be alerting when the hard drive is 80% full.

sillsdev / serval

Logs make a forever growing file system usage #128