[DOCS] Improve worker_threads documentation

qcpeter commented 2 years ago

Description: I'm struggling to get a feel for how to tune the worker_threads value for a multi-syndic/multi-master setup. I'm seeing the syndic processes fail to call fire_master and fail to publish results to the master; however, our monitoring does not indicate any excess load or CPU utilization on any of the masters. Currently this is set to 4 times the number of cores of the master host. Figuring this out by trial and error is quite difficult on a production cluster.

Suggested Fix: Some indication of how this parameter should scale with the number of cores, and how to determine whether increasing the number of worker threads is appropriate, would be really helpful. The worker_threads parameter isn't even mentioned on your Salt at Scale page.

Type of documentation: Salt documentation

Location or format of documentation: https://docs.saltproject.io/en/latest/ref/configuration/master.html#worker-threads

waynew commented 2 years ago

Hey @qcpeter thanks for the suggestion!

FWIW, we've discussed deprecating syndics, though I'm not sure if we have a specific replacement in place. In any case, it does seem like a good thing to add (in particular the worker_threads bit). If you have any suggestions of your own, PRs are always welcome and encouraged!

qcpeter commented 2 years ago

Hi @waynew,

Thanks for the reply. That's very intriguing; why would you plan to do that? It seems that syndics, or something very similar, are the only way to share the responsibility for pillar rendering between hosts while also allowing commands to be issued to a large number of hosts. Would the suggestion be to scale vertically rather than horizontally?

If there's no current guidance on the number of worker threads then I'll see what my experimentation brings up and report back.

OrangeDog commented 2 years ago

Can someone update the title to not be blank?

whytewolf commented 2 years ago

> Hi @waynew,
>
> Thanks for the reply. That's very intriguing; why would you plan to do that? It seems that syndics, or something very similar, are the only way to share the responsibility for pillar rendering between hosts while also allowing commands to be issued to a large number of hosts. Would the suggestion be to scale vertically rather than horizontally?

The problem with syndic is that it was a band-aid solution. It doesn't use most of the modern concepts in Salt, it can be rather buggy, and the bugs that do crop up are a pain to work on. There is a better solution in SaltStack Config; however, a better open source solution does not currently exist. The only reason we have not actually deprecated syndic yet is that there is no open source replacement for scaling.

> If there's no current guidance on the number of worker threads then I'll see what my experimentation brings up and report back.

So, in general, worker_threads should be no more than 1.5 times the number of CPUs. Setting it higher can cause more issues, because extra threads lead to CPU multiplexing problems: these threads want to be highly available, but with more threads than CPUs you end up with context-switching timeouts between worker threads. This can result in dropped queue entries, or simply no response to the queue at all.
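To make the 1.5x rule of thumb above concrete, here is a minimal Python sketch that computes the ceiling for a given host and checks a configured value against it. It assumes `os.cpu_count()` reflects the CPUs actually available to the master process; the helper names `max_worker_threads` and `check_worker_threads` are hypothetical and not part of Salt. The resulting number is what you would put in the master config's `worker_threads` option.

```python
import os


def max_worker_threads(cpu_count=None):
    """Hypothetical helper: upper bound for worker_threads under the 1.5x-CPU rule of thumb."""
    if cpu_count is None:
        # Assumption: os.cpu_count() matches the CPUs the master can use.
        cpu_count = os.cpu_count() or 1
    if cpu_count < 1:
        raise ValueError("cpu_count must be >= 1")
    # Integer arithmetic rounds down, so the result never exceeds 1.5x the CPU count.
    return max(1, (cpu_count * 3) // 2)


def check_worker_threads(configured, cpu_count=None):
    """Hypothetical helper: return (within_limit, limit) for a configured worker_threads value."""
    limit = max_worker_threads(cpu_count)
    return configured <= limit, limit


if __name__ == "__main__":
    # The setup described in this issue: 4x cores, here on an assumed 8-core master.
    ok, limit = check_worker_threads(32, cpu_count=8)
    print(f"worker_threads=32 within limit: {ok} (limit {limit})")
```

For the 4x-cores setting described in the issue, on an assumed 8-core master this reports a limit of 12, so a value of 32 would be well over the suggested ceiling. Treat the ratio as a starting point for experimentation, not a hard rule.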

saltstack/salt issue #60965