Closed sunil924 closed 2 years ago
Application re-deployments (changing the application package) are not atomic across all services. In your case, it could be that your feed containers are no longer in contact with the configuration server(s) and hence are running with an old application generation. The vespa-config-status tool can be used to check the application generation across the services in the cluster.
This sample app can be used to study a high-availability Vespa deployment: https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA. See also https://docs.vespa.ai/en/operations/configuration-server.html#troubleshooting
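The vespa-config-status check mentioned above can be sketched roughly as follows (a usage sketch, assuming the tool is on the PATH of a node with access to the config servers; the exact output format depends on your Vespa version):

```shell
# List the services in the cluster and the application generation
# each one is currently running on. A service that has not converged
# reports an older generation than the one just deployed.
vespa-config-status
```

Comparing the reported generations against the generation of the latest deployment shows which nodes are lagging behind.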
Thanks for your response, Jo. A couple of questions:
1) Not necessarily, but if this is a transient network split, then yes. 2) Applying the changes is asynchronous and not atomic.
Some more details: The deploy call succeeds when the new configuration is loaded and validated on the config server. Convergence to the new config on each node happens asynchronously and may take a long time (minutes). If nodes are down or unable to reach the config server, it may of course take even longer, so having the deploy command wait for this is not feasible. The nodes will keep trying to converge to the new config forever, so when missing connectivity is restored the system will self-heal. It is possible to discover the current config generation active on each node through the /state/v1 API, by reading the "generation" metric.
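The convergence check described above can be sketched in Python. This is an illustrative sketch only: the exact JSON shape returned by /state/v1 varies by Vespa version and endpoint, so the payload format below is an assumption, and fetching it from each node (e.g. with an HTTP GET) is left out.

```python
import json

def config_generation(state_payload: str) -> int:
    """Extract the active config generation from a /state/v1 response.

    The payload shape used here is an assumption for illustration;
    consult your Vespa version's /state/v1 documentation for the
    real schema.
    """
    doc = json.loads(state_payload)
    return doc["config"]["generation"]

def all_converged(generations, expected):
    """True when every node reports the expected generation."""
    return all(g == expected for g in generations)

# Example payload (illustrative only):
sample = '{"config": {"generation": 42}}'
assert config_generation(sample) == 42

# One node still on generation 41 means the cluster has not converged:
print(all_converged([42, 42, 41], 42))  # prints False
```

Polling each node until all_converged returns True is the manual equivalent of what the system does automatically as it self-heals.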
I'm resolving this, see the explanation above from @bratseth.
Schema fields are missing on some of the nodes

We have an existing Vespa cluster running. For a new requirement, we added new fields of type bool and array to the existing schema and redeployed the application. The deployment was successful; no issues or errors were seen in the console logs.
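For context, a schema change like the one described might look as follows. The schema name mySchema and the field name KEY1_Array_B come from the error message below; the bool field name, the array element type, and the indexing settings are assumptions for illustration:

```
schema mySchema {
    document mySchema {
        # ... existing fields ...

        # New fields added for the new requirement (hypothetical
        # names/types; bool fields must be attributes in Vespa)
        field myBoolFlag type bool {
            indexing: attribute | summary
        }
        field KEY1_Array_B type array<string> {
            indexing: attribute | summary
        }
    }
}
```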
While trying to ingest documents into Vespa using JMeter, some of the requests (2 out of 10) failed with the following error: {"pathId":"/document/v1/myNamespace/mySchema/docid/67b2c355-04a1-4732-8d06-1bcf6218d54d","message":"No field 'KEY1_Array_B' in the structure of type 'mySchema', which has the fields: ........}
The same payload was successfully persisted in subsequent calls. In the current state of the cluster, this issue appears frequently.
All the requests had the same fields in the payload.
The issue persisted even after redeploying the application.
Environment (please complete the following information):
Infrastructure: RedShift K8s cluster
Version: 7.559.12
Can someone please help us understand what the issue is and how to fix it?