
Configuration cluster not healthy after upgrading vespa #10928

jwachmann closed this issue 5 years ago

jwachmann commented 5 years ago

Hello,

We ran into an issue when testing an upgrade from vespaengine/vespa:7.11.13 to vespaengine/vespa:7.113.28. We followed the upgrade procedure (https://docs.vespa.ai/documentation/operations/live-upgrade.html), and after step 2 our config cluster appears to be in an unhealthy state.

Running "vespa-get-cluster-state" gives the following error:

Failed to fetch cluster state of content cluster 'forum':
500 Can't connect to vespa-configserver-2.vespa-configserver.testenv.svc.cluster.local:19050 (Connection refused)
Content-Type: text/plain
Client-Date: Mon, 07 Oct 2019 18:35:11 GMT
Client-Warning: Internal response

Can't connect to vespa-configserver-2.vespa-configserver.testenv.svc.cluster.local:19050 (Connection refused)

LWP::Protocol::http::Socket: connect: Connection refused at
/usr/share/perl5/LWP/Protocol/http.pm line 51.

For context: in our testing environment we're running Vespa with a 3-node configuration cluster and a 3-node container cluster. There's nothing particularly exciting about our services.xml setup, but I can supply it if needed.

Manually running /opt/vespa/bin/vespa-start-services on each node in the configuration cluster appears to have resolved the issue. That makes me suspect it might be related to this change to the Dockerfile: https://github.com/vespa-engine/docker-image/pull/11/files.

For now we can patch our deployment to run the above after each node starts up, but I'm wondering whether this is an actual issue in the Vespa Dockerfile or whether we're just doing something wrong.

hmusum commented 5 years ago

Configuration servers are meant to run on their own nodes, and the same goes for content nodes (it looks like you are using 3 nodes that run both a configuration server and the services of the content cluster). Running configuration servers and content services on their own nodes is beneficial for both operability and performance.

The change in https://github.com/vespa-engine/docker-image/pull/11 was done to avoid starting services on configuration servers, as that should not be done in production setups.

Look at https://docs.vespa.ai/documentation/vespa-quick-start-multinode-aws.html#configure-and-deploy if you want an example of how to map the different cluster types to nodes (ignore the AWS-specific stuff there).

jwachmann commented 5 years ago

Our config cluster nodes are set up to run on different hosts than our container and content cluster nodes. The hosts themselves are Kubernetes pods running in a Kubernetes cluster, so while they may run on the same physical machines, they should be isolated as though each were on its own box.

For reference here's our services.xml:

<services version="1.0">
  <admin version="2.0">
    <adminserver hostalias="config0"/>
    <configservers>
      <configserver hostalias="config0"/>
      <configserver hostalias="config1"/>
      <configserver hostalias="config2"/>
    </configservers>
    <cluster-controllers>
      <cluster-controller hostalias="config0"/>
      <cluster-controller hostalias="config1"/>
      <cluster-controller hostalias="config2"/>
    </cluster-controllers>
  </admin>

  <container id="container" version="1.0">
    <document-api />
    <search />
    <nodes>
      <node hostalias="content0" />
      <node hostalias="content1" />
      <node hostalias="content2" />
    </nodes>
  </container>

  <content id="forum" version="1.0">
    <redundancy>2</redundancy>
    <documents>
        <document type="post" mode="index" />
        <document type="thread" mode="index" />
    </documents>
    <nodes>
      <node hostalias="content0" distribution-key="0" />
      <node hostalias="content1" distribution-key="1" />
      <node hostalias="content2" distribution-key="2" />
    </nodes>
  </content>

</services>

And here's our hosts.xml:

<hosts>
  <host name="vespa-configserver-0.vespa-configserver.testenv.svc.cluster.local">
    <alias>config0</alias>
  </host>
  <host name="vespa-configserver-1.vespa-configserver.testenv.svc.cluster.local">
    <alias>config1</alias>
  </host>
  <host name="vespa-configserver-2.vespa-configserver.testenv.svc.cluster.local">
    <alias>config2</alias>
  </host>

  <host name="vespa-services-0.vespa-service.testenv.svc.cluster.local">
    <alias>content0</alias>
  </host>
  <host name="vespa-services-1.vespa-service.testenv.svc.cluster.local">
    <alias>content1</alias>
  </host>
  <host name="vespa-services-2.vespa-service.testenv.svc.cluster.local">
    <alias>content2</alias>
  </host>
</hosts>

jwachmann commented 5 years ago

One thing I noticed on the config cluster: after starting up on the new version (and before running vespa-start-services), it doesn't look like anything is listening on port 19050.

[root@vespa-configserver-2 bin]# ss -lnp
Netid  State      Recv-Q Send-Q Local Address:Port               Peer Address:Port              
nl     UNCONN     0      0              0:1084                        *                   
nl     UNCONN     0      0              0:0                           *                   
nl     UNCONN     4352   0              4:1036                        *                   
nl     UNCONN     768    0              4:0                           *                   
nl     UNCONN     0      0              6:0                           *                   
nl     UNCONN     0      0              9:0                           *                   
nl     UNCONN     0      0             10:0                           *                   
nl     UNCONN     0      0             12:0                           *                   
nl     UNCONN     0      0             15:0                           *                   
nl     UNCONN     0      0             16:0                           *                   
tcp    LISTEN     0      500            *:19070                      *:*                   users:(("java",pid=270,fd=385))
tcp    LISTEN     0      50             *:19071                      *:*                   users:(("java",pid=270,fd=184))
tcp    LISTEN     0      50             *:2181                       *:*                   users:(("java",pid=270,fd=352))
tcp    LISTEN     0      50     10.48.8.8:2182                       *:*                   users:(("java",pid=270,fd=353))
tcp    LISTEN     0      50     10.48.8.8:2183                       *:*                   users:(("java",pid=270,fd=355))

After running vespa-start-services I can see something listening:

[root@vespa-configserver-2 bin]# ss -lnp
Netid State      Recv-Q Send-Q  Local Address:Port                 Peer Address:Port              
nl    UNCONN     0      0                   0:1084                             *                   
nl    UNCONN     0      0                   0:0                                *                   
nl    UNCONN     768    0                   4:0                                *                   
nl    UNCONN     4352   0                   4:2096                             *                   
nl    UNCONN     0      0                   6:0                                *                   
nl    UNCONN     0      0                   9:0                                *                   
nl    UNCONN     0      0                  10:0                                *                   
nl    UNCONN     0      0                  12:0                                *                   
nl    UNCONN     0      0                  15:0                                *                   
nl    UNCONN     0      0                  16:0                                *                   
tcp   LISTEN     0      500                 *:19090                           *:*                   users:(("java",pid=1297,fd=29))
tcp   LISTEN     0      50                  *:19092                           *:*                   users:(("java",pid=1582,fd=184))
tcp   LISTEN     0      500                 *:19094                           *:*                   users:(("java",pid=1582,fd=124))
tcp   LISTEN     0      500                 *:19095                           *:*                   users:(("java",pid=1582,fd=182))
tcp   LISTEN     0      500                 *:19097                           *:*                   users:(("vespa-config-se",pid=1567,fd=15))
tcp   LISTEN     0      500                 *:19098                           *:*                   users:(("vespa-config-se",pid=1567,fd=11))
tcp   LISTEN     0      500                 *:19101                           *:*                   users:(("java",pid=1581,fd=124))
tcp   LISTEN     0      500                 *:19070                           *:*                   users:(("java",pid=270,fd=385))
tcp   LISTEN     0      50                  *:19071                           *:*                   users:(("java",pid=270,fd=184))
tcp   LISTEN     0      50                  *:2181                            *:*                   users:(("java",pid=270,fd=352))
tcp   LISTEN     0      50          10.48.8.8:2182                            *:*                   users:(("java",pid=270,fd=353))
tcp   LISTEN     0      50          10.48.8.8:2183                            *:*                   users:(("java",pid=270,fd=355))
tcp   LISTEN     0      50                  *:19050                           *:*                   users:(("java",pid=1581,fd=212))
hmusum commented 5 years ago

Yes, the cluster controllers are listening on port 19050. They are the controllers for the content cluster and are started when /opt/vespa/bin/vespa-start-services is run, so your observation that they don't run on the config servers until /opt/vespa/bin/vespa-start-services is executed is correct and expected. See the documentation for configuring these: https://docs.vespa.ai/documentation/reference/services-admin.html#cluster-controller. I would recommend running them on separate nodes, or on content nodes with standalone-zookeeper set to true.
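
As a sketch of the second option, reusing the content-node aliases from your hosts.xml (illustrative only; see the reference above for the exact standalone-zookeeper syntax):

<admin version="2.0">
  <adminserver hostalias="config0"/>
  <configservers>
    <configserver hostalias="config0"/>
    <configserver hostalias="config1"/>
    <configserver hostalias="config2"/>
  </configservers>
  <!-- cluster controllers moved off the config servers,
       with their own ZooKeeper ensemble -->
  <cluster-controllers standalone-zookeeper="true">
    <cluster-controller hostalias="content0"/>
    <cluster-controller hostalias="content1"/>
    <cluster-controller hostalias="content2"/>
  </cluster-controllers>
</admin>

The idea is that with standalone-zookeeper="true" the cluster controllers run their own ZooKeeper ensemble instead of sharing the one embedded in the config servers.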

jwachmann commented 5 years ago

That makes sense, thanks for clarifying.

One last question: any advice/guidelines/gotchas around moving our cluster controllers to separate nodes? Or is it as simple as updating hosts.xml & services.xml and then redeploying the model?

hmusum commented 5 years ago

You will lose whatever state (nodes being down, in maintenance, etc.) the cluster controllers have when you do the move, and it might take a little time before the new cluster controllers have done leader election and agreed on a stable system state after the change. I would recommend making sure everything is up and working as expected, and stopping feeding if possible, before doing the switch (which is as simple as updating hosts.xml & services.xml and deploying, as you say). You will want to run /opt/vespa/bin/vespa-stop-services on the config server nodes after making the change, to make sure nothing that shouldn't be running is left behind there; the cluster controllers should be started automatically on the nodes you are moving them to (provided /opt/vespa/bin/vespa-start-services has been run there).
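
For illustration, if you go with dedicated nodes the change could look like this (the hostnames and admin0-admin2 aliases below are made up for the sketch):

<!-- hosts.xml: add the new hosts -->
<host name="vespa-admin-0.vespa-admin.testenv.svc.cluster.local">
  <alias>admin0</alias>
</host>
<host name="vespa-admin-1.vespa-admin.testenv.svc.cluster.local">
  <alias>admin1</alias>
</host>
<host name="vespa-admin-2.vespa-admin.testenv.svc.cluster.local">
  <alias>admin2</alias>
</host>

<!-- services.xml: point the cluster controllers at them -->
<cluster-controllers standalone-zookeeper="true">
  <cluster-controller hostalias="admin0"/>
  <cluster-controller hostalias="admin1"/>
  <cluster-controller hostalias="admin2"/>
</cluster-controllers>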

jwachmann commented 5 years ago

That makes sense, thanks.

It unfortunately doesn't look realistic to stop feeding during the switchover. Should we expect any side effects as a result (e.g. items failing to index during the swap)?

bratseth commented 5 years ago

Yes, you might have a period when writes fail, although not if the cluster is otherwise stable at the time. You could avoid this by first adding the new controllers and then removing the old ones in a later deploy.
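
Concretely, the add-then-remove approach as a sketch, reusing the hypothetical admin0-admin2 aliases from above (other attributes omitted for brevity):

<!-- deploy 1: old and new controllers side by side (transitional) -->
<cluster-controllers>
  <cluster-controller hostalias="config0"/>
  <cluster-controller hostalias="config1"/>
  <cluster-controller hostalias="config2"/>
  <cluster-controller hostalias="admin0"/>
  <cluster-controller hostalias="admin1"/>
  <cluster-controller hostalias="admin2"/>
</cluster-controllers>

<!-- deploy 2: new controllers only -->
<cluster-controllers>
  <cluster-controller hostalias="admin0"/>
  <cluster-controller hostalias="admin1"/>
  <cluster-controller hostalias="admin2"/>
</cluster-controllers>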

jwachmann commented 5 years ago

Got it. Thanks for your help!