vespa-engine / vespa

AI + Data, online. https://vespa.ai
Apache License 2.0

Memory summed up from metrics API doesn't add up to memory_rss #22442

Closed nehajatav closed 2 years ago

nehajatav commented 2 years ago

Describe the bug Summing up allocated_bytes for all schemas on a given content node gives far less than memory_rss. What else occupies the remaining memory? How can we bring down memory consumption?

To Reproduce Steps to reproduce the behavior:

  1. Start feeding data and note the memory footprint
  2. Memory consumption at the content node level is very high compared to allocated_bytes summed across index, attributes, and document_store

Expected behavior allocated_bytes across index, attributes, document_store should add up to memory_rss

Screenshots top command from within pod:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    465 nobody    20   0   48.8g  39.8g  76780 S  26.6  7.9 251:08.13 vespa-proton-bi

metrics/v2/values

hostname: "vespa-content-0.vespa-internal.mynamespace.svc.cluster.local",
name: "vespa.searchnode",
memory_virt: 52383350784,
memory_rss: 42819596288,

state/v1/metrics: summing up all the memory_usage.allocated_bytes values across schemas (converted to GB), I don't see more than 11 GB consumed on a content node:

    Component                  Metric                          GB
    attribute                  memory_usage allocated_bytes     6.259003234
    index                      memory_usage allocated_bytes     3.187027048
    notready document_store    memory_usage allocated_bytes     0.079706368
    ready document_store       memory_usage allocated_bytes     2.00707792
    removed document_store     memory_usage allocated_bytes     0.080017664
    Total                      memory_usage allocated_bytes    11.61285981
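The size of the gap can be made explicit with a little arithmetic. The sketch below only reuses the figures quoted in this issue. Whether the "GB" column means 10^9 or 2^30 bytes is not stated, so both conversions are computed; the variable and function names are illustrative, not part of any Vespa API.

```python
# Figures copied from this issue; all names here are illustrative only.
MEMORY_RSS_BYTES = 42_819_596_288  # memory_rss from metrics/v2/values

# allocated_bytes per component, already converted to GB by the reporter.
allocated_gb = {
    "attribute": 6.259003234,
    "index": 3.187027048,
    "notready document_store": 0.079706368,
    "ready document_store": 2.00707792,
    "removed document_store": 0.080017664,
}


def unaccounted(rss_bytes: int, components_gb: dict, bytes_per_gb: int) -> float:
    """Return RSS minus the summed component sizes, in the chosen GB unit."""
    return rss_bytes / bytes_per_gb - sum(components_gb.values())


total_gb = sum(allocated_gb.values())
# Assumption: it is unknown which GB definition the reporter used, so try both.
gap_decimal = unaccounted(MEMORY_RSS_BYTES, allocated_gb, 10**9)
gap_binary = unaccounted(MEMORY_RSS_BYTES, allocated_gb, 2**30)

print(f"sum of allocated_bytes: {total_gb:.2f} GB")
print(f"unaccounted (GB = 10^9 bytes): {gap_decimal:.2f}")
print(f"unaccounted (GB = 2^30 bytes): {gap_binary:.2f}")
```

Under either conversion, roughly 28 to 31 GB of resident memory is not covered by these allocated_bytes metrics.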

Environment (please complete the following information):

OS: Docker image vespa:7.559.12
Infrastructure: Kubernetes
Versions: Major:"1", Minor:"21", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"

Content pod:
  Limits:
    cpu: 8
    memory: 64G
  Requests:
    cpu: 4
    memory: 32G
Container pod:
  Limits:
    cpu: 2
    memory: 8G
  Requests:
    cpu: 2
    memory: 8G
Config pod:
  Limits:
    cpu: 3
    memory: 16G
  Requests:
    cpu: 3
    memory: 16G

Host and services.xml: same as in this comment: https://github.com/vespa-engine/vespa/issues/22315#issuecomment-1113074776

Vespa version 7.559.12 compiled with go1.16.13 on linux/amd64

Additional context The memory footprint used to be much lower. The only schema changes made since then are: (i) adding a fieldset; (ii) adding fast-search to fields that the feeding client does not populate, for which state/v1/metrics shows no allocated_bytes; (iii) converting one field from string to array, with an overall memory footprint of 130 MB at the field level as per state/v1/metrics.

kkraune commented 2 years ago

Please refer to https://docs.vespa.ai/en/performance/ for an understanding of memory use. This is a large, complex problem, and it is generally hard to make such metrics add up to the RSS total.

Commercial offerings like https://cloud.vespa.ai/en/memory-visualizer let you more easily analyse memory usage per schema / data. Alternatively, use the metrics, see https://docs.vespa.ai/en/reference/metrics.html
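For anyone following the metrics route, a minimal sketch of summing allocated_bytes per document type from an already-parsed state/v1/metrics response might look like this. The payload shape (a `metrics.values` list whose entries carry `name`, `values.last`, and a `documenttype` dimension) and the metric names in the sample are assumptions to verify against your own node's output; the sample data is made up.

```python
from collections import defaultdict


def sum_allocated_bytes(payload: dict) -> dict:
    """Sum the 'last' value of every *memory_usage.allocated_bytes metric,
    grouped by the 'documenttype' dimension (assumed payload shape)."""
    totals = defaultdict(float)
    for entry in payload.get("metrics", {}).get("values", []):
        if not entry.get("name", "").endswith("memory_usage.allocated_bytes"):
            continue  # skip unrelated metrics
        doc_type = entry.get("dimensions", {}).get("documenttype", "unknown")
        totals[doc_type] += entry.get("values", {}).get("last", 0)
    return dict(totals)


# Made-up sample for illustration; not real output from any node.
sample = {
    "metrics": {
        "values": [
            {"name": "content.proton.documentdb.attribute.memory_usage.allocated_bytes",
             "values": {"last": 100}, "dimensions": {"documenttype": "music"}},
            {"name": "content.proton.documentdb.index.memory_usage.allocated_bytes",
             "values": {"last": 50}, "dimensions": {"documenttype": "music"}},
            {"name": "content.proton.documentdb.attribute.memory_usage.allocated_bytes",
             "values": {"last": 25}, "dimensions": {"documenttype": "books"}},
            {"name": "content.proton.documentdb.documents.total",
             "values": {"last": 999}, "dimensions": {"documenttype": "music"}},
        ]
    }
}

print(sum_allocated_bytes(sample))
```

As the comment above notes, even a complete sum of this kind is not expected to match memory_rss.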

nehajatav commented 2 years ago

Undoing the fieldset changes and removing fast-search on the unused field (no document has this field populated) brought down the memory consumption. Any idea whether one of these two is memory heavy?

kkraune commented 2 years ago

@geirst correct me if I am wrong

Adding/removing a fieldset should have no impact on either disk or memory footprint.

fast-search builds an index structure to improve query performance. This index structure points to posting lists, which will be empty for unused fields. I therefore think adding fast-search increases memory usage, even for empty fields. https://docs.vespa.ai/en/attributes.html#index-structures

Changing attribute settings normally requires restarts, see https://docs.vespa.ai/en/reference/schema-reference.html#modifying-schemas - so changes in memory footprint can also be a result of this (structures are flushed, transaction log etc).

geirst commented 2 years ago

Yes, @kkraune you are right. fast-search leads to a slight memory increase even for empty fields due to the extra index structures created. E.g. single value fields will have the "undefined value" when empty, and there is a posting list for this value.
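To illustrate why an unpopulated field can still cost memory, here is a toy model. It is not Vespa's actual data structure, only a sketch of the idea described above: a dictionary maps each value to a posting list of document ids, and documents without a value are filed under an "undefined" sentinel. All names are hypothetical.

```python
UNDEFINED = object()  # hypothetical stand-in for the attribute's "undefined value"


def build_postings(doc_values: list) -> dict:
    """Map each value to the list of document ids holding it.
    Documents with no value (None) are filed under UNDEFINED."""
    postings = {}
    for doc_id, value in enumerate(doc_values):
        key = UNDEFINED if value is None else value
        postings.setdefault(key, []).append(doc_id)
    return postings


# A field that no document populates still yields one posting list,
# with one entry per document.
empty_field = build_postings([None] * 5)
print(len(empty_field[UNDEFINED]))
```

In this toy model the cost follows the number of documents rather than the number of documents that actually have a value.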

nehajatav commented 2 years ago

single value fields will have the "undefined value" when empty, and there is a posting list for this value

That explains why, in another attempt where we merged the schemas for 28 document types into a single document type, the memory footprint grew as much with a quarter of the documents as it would have in the 28-schema scenario.