Closed 107dipan closed 2 years ago
If this issue is caused by the process being unable to create threads, can you tell us how to configure this?
That is something you need to do on the K8S side.
I think you have configured a very large number of threads in services.xml? As a workaround you could configure far fewer, which would also give you higher query throughput.
Yes, this is running into a ulimit or max-threads-per-process setting. What are your settings, and what do ulimit -a and ulimit -Ha show inside the container/pod?
At the content level we have only configured these values. Are there any other threads we can configure at the content level from services.xml? We have configured search, persearch and summary threads as 128, 4 and 64 respectively. We have defined around 30 schemas in this content cluster.
We ran the command ulimit -a and the output included "max user processes (-u) 4194304". We are running these on OpenShift bare-metal clusters with good capacity. Could this error be because it is not honoring the cgroup limits that are set?
Vespa content node process (proton-bin) uses nproc to determine number of logical processors which should be cgroup limits aware.
How much memory do the nodes have? How many cores? Limits other than "max user processes" matter as well. Could you give the whole output?
bash-4.2$ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 2061952
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1048576
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4194304
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
bash-4.2$ ulimit -Ha
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 2061952
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1048576
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4194304
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

content:
  requests:
    memory: 32G
    cpu: 4
  limits:
    memory: 64G
    cpu: 10
Hmm, none of those should kick in here. Then it might be a cgroups limit. What does 'cat /sys/fs/cgroup/pids/pids.max' produce?
bash-4.2$ cat /sys/fs/cgroup/pids/pids.max
1024
That is most likely your limitation. By default, threads are allocated per schema. This will change soon, but you should increase the pids.max limit to a higher number, e.g. 4096. Also, what does 'nproc' say?
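For anyone checking this from inside a pod, here is a minimal sketch of reading the pids cgroup limit and the current task count. It assumes the cgroup v1 path quoted above and falls back to the cgroup v2 location. The helper names are made up for illustration and are not part of Vespa. Note that the pids controller counts threads as well as processes, so every thread created by proton counts against pids.max.

```python
from pathlib import Path
from typing import Optional, Tuple

# Candidate directories: cgroup v1 first (the path used in this thread),
# then the cgroup v2 unified hierarchy. Both are assumptions about the pod layout.
CANDIDATE_DIRS = [Path("/sys/fs/cgroup/pids"), Path("/sys/fs/cgroup")]


def parse_pids_value(text: str) -> Optional[int]:
    """Parse the content of pids.max or pids.current; 'max' means unlimited (None)."""
    value = text.strip()
    return None if value == "max" else int(value)


def headroom(limit: Optional[int], current: int) -> Optional[int]:
    """Tasks that can still be created before the limit is hit (None = unlimited)."""
    return None if limit is None else max(limit - current, 0)


def read_pids_cgroup() -> Optional[Tuple[Optional[int], int]]:
    """Return (limit, current) from the first directory holding both files, else None."""
    for directory in CANDIDATE_DIRS:
        max_file = directory / "pids.max"
        cur_file = directory / "pids.current"
        if max_file.is_file() and cur_file.is_file():
            limit = parse_pids_value(max_file.read_text())
            current = int(cur_file.read_text().strip())
            return limit, current
    return None


if __name__ == "__main__":
    result = read_pids_cgroup()
    if result is None:
        print("no pids cgroup files found")
    else:
        limit, current = result
        print(f"pids.max={limit} pids.current={current} headroom={headroom(limit, current)}")
```

If the headroom is close to zero while the content node is starting, thread creation will fail regardless of what ulimit reports.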
If you only have 4 cores, I would also reduce the configured search, persearch and summary threads. I normally advise setting search to 2x the number of cores; persearch should never be more than the number of cores, and summary should equal the number of cores. However, for persearch I always advise starting with the default value of 1 and only increasing it if you need lower latency, as it will reduce throughput and efficiency.
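The rule of thumb above can be written down as a small helper. This is only a sketch of the advice in this comment; the function name and return shape are made up for illustration and are not a Vespa API.

```python
def recommended_request_threads(cores: int, persearch: int = 1) -> dict:
    """Apply the rule of thumb from this thread.

    search    = 2 x cores
    persearch = requested value (default 1), capped at the number of cores
    summary   = cores
    """
    if cores < 1:
        raise ValueError("cores must be >= 1")
    if persearch < 1:
        raise ValueError("persearch must be >= 1")
    return {
        "search": 2 * cores,
        "persearch": min(persearch, cores),
        "summary": cores,
    }


if __name__ == "__main__":
    # 4 cores, as in the pod's CPU request discussed above.
    print(recommended_request_threads(4))
```

For 4 cores this gives search 8, persearch 1 and summary 4, compared with the 128, 4 and 64 configured originally.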
Do you mean the total number of threads created on the content nodes will be (number of schemas * configured search threads)? If we have 33 schemas defined for our content cluster, do we need to set the pids.max value to 33 * 128 (the search threads set in services.xml)?
Can someone help us understand the above query? Thanks!
Hi @107dipan. I am working on improving the documentation at https://docs.vespa.ai/en/performance/sizing-search.html and https://docs.vespa.ai/en/reference/services-content.html#requestthreads. In particular, @baldersheim details the values for search, persearch and summary exactly (I'll add this to the docs) - in your case, given 4 CPUs, search should be 8, not 128. @bratseth has also indicated above that it is too high, given your content node configuration.
So, please follow @baldersheim's suggestion for the three parameters to get the content node started. Your next step will then be to evaluate query performance and fine-tune, possibly changing the node configuration. The links above are good resources in that process.
Prior to 7.574 the default was num_schema * X, where X depends on the number of cores on the node and some other settings. The summary threads and search threads are global and do not depend on the number of schemas. After 7.574 the number of schemas does not affect the number of threads needed.
But independent of that 1024 is a too low number.
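To make the arithmetic concrete, here is a rough sketch of the pre-7.574 estimate as num_schemas * X plus the global threads. The thread does not give X, so the numbers below (30 threads per schema, 12 global threads) are purely hypothetical placeholders, as are the function names. Other processes in the same pod also count against pids.max, so a real budget would be tighter.

```python
def estimated_threads(num_schemas: int, per_schema_threads: int, global_threads: int) -> int:
    """Rough estimate: per-schema threads (the X above) plus the global pools."""
    return num_schemas * per_schema_threads + global_threads


def max_schemas_within_limit(pids_limit: int, per_schema_threads: int, global_threads: int) -> int:
    """Largest schema count whose estimated thread total stays within pids_limit."""
    available = pids_limit - global_threads
    if available <= 0:
        return 0
    return available // per_schema_threads


if __name__ == "__main__":
    X = 30              # hypothetical per-schema thread count, not a Vespa constant
    GLOBAL_THREADS = 12  # hypothetical: search 8 + summary 4 on a 4-core node
    for schemas in (5, 38):
        total = estimated_threads(schemas, X, GLOBAL_THREADS)
        print(f"{schemas} schemas -> about {total} threads (limit 1024)")
    print("max schemas under 1024:", max_schemas_within_limit(1024, X, GLOBAL_THREADS))
```

Whatever the real value of X is on a given node, the shape of the estimate shows why a small schema count can fit under 1024 while a larger one may not.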
I followed @baldersheim's suggestion and tuned the content node parameters so that search threads are twice the number of cores, and persearch and summary threads are equal to the number of cores. The content nodes did not come up with 38 schemas but did come up with 5 schemas. We are currently using Vespa version 7.559.12. Just wanted to confirm: num_schema * X is the search thread count, right?
But independent of that 1024 is a too low number.
This is the main point. ^
X refers to threads related to background jobs, indexing, and so on.
The summary threads and search threads are global and do not depend on the number of schemas.
Describe the bug
Vespa content nodes are throwing exceptions while we bring up a new cluster. We are getting the following exceptions at the content node level.
bash-4.2$ /opt/vespa/bin/vespa-logfmt | grep stderr
[2022-04-11 12:33:53.597] WARNING : searchnode stderr vespa-proton-bin: /builddir/build/BUILD/vespa-7.559.12/vespalib/src/vespa/vespalib/util/thread.cpp:42: vespalib::Thread::Thread(vespalib::Runnable&, vespalib::Thread::init_fun_t): Assertion `thread != nullptr' failed.
[2022-04-11 12:34:06.608] WARNING : searchnode stderr vespa-proton-bin: /builddir/build/BUILD/vespa-7.559.12/vespalib/src/vespa/vespalib/util/thread.cpp:42: vespalib::Thread::Thread(vespalib::Runnable&, vespalib::Thread::init_fun_t): Assertion `thread != nullptr' failed.
[2022-04-11 12:34:36.719] WARNING : searchnode stderr vespa-proton-bin: /builddir/build/BUILD/vespa-7.559.12/vespalib/src/vespa/vespalib/util/threadstackexecutorbase.cpp:183: void vespalib::ThreadStackExecutorBase::start(uint32_t): Assertion `thread != nullptr' failed.
[2022-04-11 12:36:43.115] WARNING : searchnode stderr vespa-proton-bin: /builddir/build/BUILD/vespa-7.559.12/vespalib/src/vespa/vespalib/util/threadstackexecutorbase.cpp:183: void vespalib::ThreadStackExecutorBase::start(uint32_t): Assertion `thread != nullptr' failed.
Environment (please complete the following information):
Can someone please help us understand what this issue is?