weaviate / contextionary

Weaviate's own language vectorizer, which allows for semantic context-based searches in Weaviate
https://weaviate.io/developers/weaviate/modules/retriever-vectorizer-modules/text2vec-contextionary
BSD 3-Clause "New" or "Revised" License

Investigate potential memory issue #25

Closed · etiennedi closed this issue 4 years ago

etiennedi commented 4 years ago

The c11y container should not grow substantially larger than the size of the memory-mapped file. If it does so, it should start freeing up memory when it gets scarce.

However, we seem to be seeing a pattern in the wild where, instead of the memory usage being reduced, the c11y keeps on growing until it eventually gets OOM-killed (e.g. exit code 137).

Todos

jettro commented 4 years ago

Would it be possible to limit it, just like with Elasticsearch? Or is that not possible with Go?

etiennedi commented 4 years ago

There are no direct means to limit it, as there is no fixed heap size or a "container" such as the JVM whose parameters could be tuned. However, Go applications respond very well to OS/container-level restrictions, such as memory limits in Kubernetes manifests (see the sketch below).
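As a rough illustration (not taken from this repo; the image name and values are assumptions for the example), a memory limit on the contextionary container in a Kubernetes Deployment could look like this:

```yaml
# Illustrative sketch only: a memory limit on the contextionary container.
# If the container exceeds the limit, the kernel OOM-kills it (exit code 137)
# instead of letting it grow unbounded on the node.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: contextionary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: contextionary
  template:
    metadata:
      labels:
        app: contextionary
    spec:
      containers:
        - name: contextionary
          image: semitechnologies/contextionary # image name assumed for illustration
          resources:
            requests:
              memory: "500Mi"
            limits:
              memory: "1Gi"
```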

There is also a docker flag - probably this is what Kubernetes uses internally - but I think it's a bit less common in the wild. At least I haven't seen it used. This might also be due to my bias of preferring K8s for production cases, where fine-grained resource restrictions are more common.

As of now I'm not sure whether these options can also be set through docker-compose, but my initial impression is that this should be possible (a sketch follows below).
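For what it's worth, here is a hedged sketch of what this could look like in docker-compose. The exact key depends on the compose file format version (`mem_limit` for 2.x; under 3.x the equivalent lives in `deploy.resources.limits` and is honored by swarm), and the image name is again just an assumption:

```yaml
# Illustrative sketch only: capping the contextionary's memory via docker-compose.
# `mem_limit` applies to compose file format 2.x; with 3.x the equivalent setting
# is `deploy.resources.limits.memory`, which is honored by `docker stack deploy`.
version: "2.4"
services:
  contextionary:
    image: semitechnologies/contextionary # image name assumed for illustration
    mem_limit: 1g
```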

Nevertheless, I'm still investigating the root cause, as I currently have a suspicion that the contextionary actually grows larger than it should. (This is more of a gut feeling at the moment, but I'll hopefully have some hard data for that soon).

etiennedi commented 4 years ago

So far, I've been able to do some memory profiling for the regular code. There is definitely no memory leak in the classical sense. Even as the overall consumption grows to several GBs, the Go profiler measures only about 2MB of heap.

So, my initial assumption that this has to do with the memory-mapped file (which is explicitly not included in the heap profiling) seems to be confirmed. But as of now I don't know yet what, if anything at all, is going wrong there. I'll keep on investigating.
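For context, this is a minimal sketch (not the actual c11y code) of how such a heap profile is commonly exposed in a Go service via `net/http/pprof`. Memory-mapped regions do not appear in this profile, which is why the mmap usage has to be investigated separately:

```go
// Minimal sketch, assuming a standalone service: expose Go's built-in profiler
// so a heap profile can be fetched with
//   go tool pprof http://localhost:6060/debug/pprof/heap
// Note that memory-mapped files are not accounted for in the heap profile.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Serve the pprof endpoints; port and address are arbitrary for this example.
	log.Println(http.ListenAndServe("localhost:6060", nil))
}
```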

etiennedi commented 4 years ago

Fixed in #26