vespa-engine / vespa

AI + Data, online. https://vespa.ai
Apache License 2.0

Vespa content nodes are throwing exceptions while bringing up a new cluster #22085

Closed 107dipan closed 2 years ago

107dipan commented 2 years ago

Describe the bug
Vespa content nodes are throwing exceptions while bringing up a new cluster. We are bringing up a new cluster, but we are getting the following exceptions at the content node level.

bash-4.2$ /opt/vespa/bin/vespa-logfmt | grep stderr
[2022-04-11 12:33:53.597] WARNING : searchnode stderr vespa-proton-bin: /builddir/build/BUILD/vespa-7.559.12/vespalib/src/vespa/vespalib/util/thread.cpp:42: vespalib::Thread::Thread(vespalib::Runnable&, vespalib::Thread::init_fun_t): Assertion `thread != nullptr' failed.
[2022-04-11 12:34:06.608] WARNING : searchnode stderr vespa-proton-bin: /builddir/build/BUILD/vespa-7.559.12/vespalib/src/vespa/vespalib/util/thread.cpp:42: vespalib::Thread::Thread(vespalib::Runnable&, vespalib::Thread::init_fun_t): Assertion `thread != nullptr' failed.
[2022-04-11 12:34:36.719] WARNING : searchnode stderr vespa-proton-bin: /builddir/build/BUILD/vespa-7.559.12/vespalib/src/vespa/vespalib/util/threadstackexecutorbase.cpp:183: void vespalib::ThreadStackExecutorBase::start(uint32_t): Assertion `thread != nullptr' failed.
[2022-04-11 12:36:43.115] WARNING : searchnode stderr vespa-proton-bin: /builddir/build/BUILD/vespa-7.559.12/vespalib/src/vespa/vespalib/util/threadstackexecutorbase.cpp:183: void vespalib::ThreadStackExecutorBase::start(uint32_t): Assertion `thread != nullptr' failed.

Environment (please complete the following information):

Can someone please help us understand what this issue is?

107dipan commented 2 years ago

If this issue is related to not being able to create threads, can you tell us how we can configure this?

bratseth commented 2 years ago

That is something you need to do on the K8S side.

I think you have configured a very large number of threads in services.xml? As a workaround you could configure far fewer; that would also give you higher query throughput.

jobergum commented 2 years ago

Yes, this is running into a ulimit or max-threads-per-process setting. What are your settings, and what do ulimit -a and ulimit -Ha show inside the container/pod?
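For reference, one way to capture both outputs without opening a shell in the pod first is kubectl exec; the pod name below is a placeholder, and on OpenShift the oc CLI works the same way:

```sh
# Hypothetical pod name; adjust to your own StatefulSet/Deployment
kubectl exec -it vespa-content-0 -- bash -lc 'ulimit -a; echo "--- hard limits ---"; ulimit -Ha'
```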

107dipan commented 2 years ago

At the content level we have only configured these values. Are there any other thread settings we can configure at the content level from services.xml? We have configured the search, persearch and summary threads as 128, 4 and 64 respectively. We have defined around 30 schemas in this content cluster.
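For context, these three values correspond to the requestthreads tuning element under the content cluster in services.xml (see the services-content reference linked later in this thread). A sketch of the configuration as described, with the cluster id and the surrounding elements abbreviated:

```xml
<content version="1.0" id="mycluster"> <!-- id is a placeholder -->
  <engine>
    <proton>
      <tuning>
        <searchnode>
          <requestthreads>
            <search>128</search>      <!-- global search thread pool -->
            <persearch>4</persearch>  <!-- threads used per query -->
            <summary>64</summary>     <!-- summary/docsum thread pool -->
          </requestthreads>
        </searchnode>
      </tuning>
    </proton>
  </engine>
  <!-- documents, redundancy, nodes, ... omitted -->
</content>
```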

107dipan commented 2 years ago

We ran the command ulimit -a and this was the output: "max user processes (-u) 4194304". We are running these on OpenShift bare-metal clusters with good capacity. Could this error be because it is not honoring the cgroup limits that are set?

jobergum commented 2 years ago

The Vespa content node process (vespa-proton-bin) uses nproc to determine the number of logical processors, which should be cgroup-limits aware.
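A quick way to sanity-check this inside the pod is to compare what nproc reports with the cgroup CPU quota; which of the paths below exists depends on whether the node uses cgroup v1 (as the pids path later in this thread suggests) or cgroup v2:

```sh
nproc
# cgroup v1: quota and period in microseconds, -1 means no quota
cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us /sys/fs/cgroup/cpu/cpu.cfs_period_us
# cgroup v2 equivalent
cat /sys/fs/cgroup/cpu.max
```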

baldersheim commented 2 years ago

How much memory do the nodes have? How many cores? There are other limits besides "max user processes" that matter. Could you give the whole output?

107dipan commented 2 years ago

bash-4.2$ ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 2061952
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4194304
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
bash-4.2$ ulimit -Ha
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 2061952
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 4194304
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

107dipan commented 2 years ago

content:
  requests:
    memory: 32G
    cpu: 4
  limits:
    memory: 64G
    cpu: 10

baldersheim commented 2 years ago

Hmm, none of those should kick in here. Then it might be some cgroups stuff. What does 'cat /sys/fs/cgroup/pids/pids.max' produce?

107dipan commented 2 years ago

cat /sys/fs/cgroup/pids/pids.max
1024
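That 1024 is the per-pod/container PID limit imposed by the container runtime rather than by ulimit. A hedged sketch of where it typically comes from, and how it can be raised, on a CRI-O based OpenShift/Kubernetes node; exact mechanisms vary by version, so verify against your cluster's documentation:

```sh
# Inside the pod: the effective limit (cgroup v1 path, as above)
cat /sys/fs/cgroup/pids/pids.max

# On the node, with the CRI-O runtime, the default per-container limit is
# pids_limit in /etc/crio/crio.conf under [crio.runtime], e.g.:
#   pids_limit = 4096
#
# The kubelet's equivalent knob is podPidsLimit in its configuration.
# On OpenShift these are normally managed through ContainerRuntimeConfig /
# KubeletConfig custom resources rather than by editing files directly.
```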

baldersheim commented 2 years ago

That is most likely your limitation. The default is that there are threads allocated per schema. This will change soon, but you should increase the pids.max limit to a higher number, e.g. 4096. Also, what does 'nproc' say?

If you only have 4 cores I would also reduce the configured search, persearch and summary threads. I normally advise search to be 2x the number of cores, persearch should never be more than the number of cores, and summary should also be equal to the number of cores. However, for persearch I always advise starting with the default value of 1 and only increasing it if you need lower latency, as it will reduce throughput and efficiency.
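Applied to the 4-CPU request above, that rule of thumb would give values along these lines (a sketch of the same requestthreads element as earlier, to be validated against your own latency and throughput targets):

```xml
<requestthreads>
  <search>8</search>        <!-- 2x number of cores -->
  <persearch>1</persearch>  <!-- start at the default of 1 -->
  <summary>4</summary>      <!-- equal to number of cores -->
</requestthreads>
```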

107dipan commented 2 years ago

Do you mean the total threads created on the content nodes will be (number of schemas * configured search threads)? If we have 33 schemas defined for our content cluster, do we need to set the pids.max value to 33 * 128 (the search threads set in services.xml)?

107dipan commented 2 years ago

Can someone help us understand the above query? Thanks!

kkraune commented 2 years ago

Hi @107dipan. I am working on improving the documentation at https://docs.vespa.ai/en/performance/sizing-search.html and https://docs.vespa.ai/en/reference/services-content.html#requestthreads. In particular, @baldersheim details the values for search, persearch and summary exactly (I'll add this to the docs) - in your case, given 4 CPUs, search should be 8, not 128. @bratseth has also indicated above that it is too high, given your content node configuration.

So, please follow @baldersheim's suggestion for the three parameters to get the content node started. Then your next step will be to evaluate query performance and fine-tune, possibly changing the node configuration - the links above are good resources in that process.

baldersheim commented 2 years ago

Prior to 7.574 it was by default num_schema * X, where X depends on the number of cores on the node and some other settings. The summary threads and search threads are global and do not depend on the number of schemas. After 7.574 the number of schemas does not affect the number of threads needed.

But independent of that, 1024 is too low a number.
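To make the order of magnitude concrete: the per-schema factor X is not specified in this thread, so the figures below are purely illustrative, but they show why a low pids.max bites as the schema count grows:

```sh
# Illustrative only - X (threads per schema prior to 7.574) is assumed here
#   33 schemas x ~30 threads/schema ~ 990 threads
# plus the global search and summary pools and other proton threads,
# which already approaches pids.max = 1024, while 5 schemas stays far below it.
```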

107dipan commented 2 years ago

I followed @baldersheim's suggestion and tuned the content node params such that the search threads are twice the number of cores, and the persearch and summary threads are equal to the number of cores. The content nodes did not come up with 38 schemas, but came up with 5 schemas. We are currently using Vespa version 7.559.12. Just wanted to confirm: num_schema * X is the search thread count, right?

jobergum commented 2 years ago

> But independent of that, 1024 is too low a number.

This is the main point. ^

X refers to threads related to background jobs, indexing, and so on.

> The summary threads and search threads are global and do not depend on the number of schemas.