kawatea opened this issue 6 months ago
After reproducing this locally I can confirm your observations that `distributor-base-port` simply doesn't work at all. I believe this particular property is a remnant from an older version of the content cluster model, which probably shouldn't have been documented at all. It belongs in a museum! 🦖
You are also correct that the `searchnode` service does not appear to get its ports assigned as expected relative to the specified base port. I'm not sure of the reason behind this, but a lot of the port assignment logic has been refactored over the years. Since we run our nodes in separate containers (without specifying base ports) instead of co-located, we probably haven't noticed any regressions popping up here.
I'd strongly suggest using containers to avoid this problem altogether. This is where our development focus is concentrated and deployment friction can be expected to be minimal. Unless there are technical reasons that preclude using containers for this use case?
Thank you for taking care of this. We basically followed the sample setup (e.g. https://github.com/vespa-engine/sample-apps/blob/master/examples/operations/multinode-HA/services.xml). What do you mean by containers, and how can we set that up?
> What do you mean by containers and how can we set up?
In this context it would usually mean Docker/Podman containers or perhaps Kubernetes pods, depending on how you provision compute and storage resources.
You can observe that in the multinode HA example app, distinct content nodes also have distinct node aliases (`node8`, `node9` in this specific case) and are running in distinct Docker containers. The example app runs these on the same host; a production deployment would generally not do this (for availability reasons), but for testing that's perfectly fine.
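Roughly, the corresponding hosts.xml in that sample maps each alias to its own host/container, along these lines (a paraphrased excerpt; treat the exact hostnames as illustrative):

```xml
<!-- hosts.xml (paraphrased excerpt): each alias maps to its own container/host -->
<hosts>
  <host name="node8.vespanet">
    <alias>node8</alias>
  </host>
  <host name="node9.vespanet">
    <alias>node9</alias>
  </host>
</hosts>
```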
For your use case that would mean that instead of having both the `test1` and `test2` clusters use `node1`, you could instead have `test1` use `node1` and `test2` use `node2` (as an example), where these two would be running in separate Docker/Podman containers. If `node1` and `node2` are running on the same physical host, this would avoid port conflicts as well as help enforce privilege and resource usage separation between the two logical nodes.
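As a sketch (cluster names, document types and aliases here are only placeholders), the services.xml side of that split would look something like:

```xml
<!-- services.xml excerpt (inside <services>): each content cluster gets its own node/container -->
<content id="test1" version="1.0">
  <redundancy>1</redundancy>
  <documents>
    <document type="doc1" mode="index"/>
  </documents>
  <nodes>
    <node hostalias="node1" distribution-key="0"/>
  </nodes>
</content>

<content id="test2" version="1.0">
  <redundancy>1</redundancy>
  <documents>
    <document type="doc2" mode="index"/>
  </documents>
  <nodes>
    <node hostalias="node2" distribution-key="0"/>
  </nodes>
</content>
```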
I see, we are already using Kubernetes pods. We have many content clusters, and each of them uses grouped distribution with multiple nodes. If we assigned different nodes to each of them, we would need to manage hundreds of nodes, which is not realistic. So we want to use the same nodes in different content clusters.
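For context, each of our content clusters looks roughly like the sketch below (cluster, group and alias names are made up), and many such clusters reference the same host aliases:

```xml
<!-- services.xml excerpt (illustrative): one of many grouped content clusters -->
<content id="cluster-a" version="1.0">
  <redundancy>2</redundancy>
  <documents>
    <document type="doc-a" mode="index"/>
  </documents>
  <group>
    <distribution partitions="1|*"/>
    <group name="group0" distribution-key="0">
      <node hostalias="node1" distribution-key="0"/>
      <node hostalias="node2" distribution-key="1"/>
    </group>
    <group name="group1" distribution-key="1">
      <node hostalias="node3" distribution-key="2"/>
      <node hostalias="node4" distribution-key="3"/>
    </group>
  </group>
</content>
<!-- cluster-b, cluster-c, ... reuse the same node1..node4 aliases -->
```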
> we need to manage hundreds of nodes
Really, the right solution here is to have Vespa Cloud manage it, which it can do even though the nodes are in your account. I'm forgetting why you said this wasn't an option for you, but if it would help to speak to the right people on your side, we can try to do that for you. You can mail me, bratseth at vespa.ai.
Another member of our team is already talking about Vespa Cloud. Ultimately we want to use it, but we cannot do so soon. It would be great if there were a way to handle this issue through configuration.
We see that solution as inferior to using multiple containers, since containers provide uniform management of nodes and resource isolation between clusters, so we are unlikely to spend any time on this.
**Describe the bug**
When we set `baseport` for content cluster nodes, it works for storagenode but doesn't work well for distributor/searchnode. `distributor-base-port` also doesn't work at all.

**To Reproduce**
Steps to reproduce the behavior:
1. Set `baseport` for the content cluster nodes (see the sketch below)
2. Add another content cluster before the existing cluster
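For illustration, this is roughly the kind of configuration we mean, with two content clusters sharing the same host alias and `baseport` set on the nodes (cluster names, document types and port numbers are just examples):

```xml
<!-- services.xml excerpt (illustrative): two content clusters on the same host alias -->
<content id="test1" version="1.0">
  <redundancy>1</redundancy>
  <documents>
    <document type="doc1" mode="index"/>
  </documents>
  <nodes>
    <!-- baseport is honored by storagenode, but not by distributor/searchnode -->
    <node hostalias="node1" distribution-key="0" baseport="19200"/>
  </nodes>
</content>

<content id="test2" version="1.0">
  <redundancy>1</redundancy>
  <documents>
    <document type="doc2" mode="index"/>
  </documents>
  <nodes>
    <node hostalias="node1" distribution-key="0" baseport="19300"/>
  </nodes>
</content>
```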
**Expected behavior**
The `baseport` setting should work properly for distributor/searchnode, and we should be able to avoid restarting services.

**Environment (please complete the following information):**
Vespa version 8.339.15

**Additional context**
We want to avoid restarting unrelated distributor/storagenode/searchnode services when adding/removing/updating content clusters, because it causes an outage for a while. The `baseport` setting works for storagenode but doesn't work well for distributor (`distributor-base-port` also doesn't work) and doesn't affect searchnode.