nginxinc / nginx-gateway-fabric

NGINX Gateway Fabric provides an implementation for the Gateway API using NGINX as the data plane.
Apache License 2.0

Error updating status of gateway resource in NFR Scale test HTTPSListeners #2562

Closed: bjee19 closed this issue 3 days ago

bjee19 commented 4 days ago

In this pipeline run of the NFR tests on edge (https://github.com/nginxinc/nginx-gateway-fabric/actions/runs/10872403318), the HTTPSListeners scale test encountered an error when run on OSS.

Shown Error:

{
  "level": "debug",
  "ts": "2024-09-15T17:08:33Z",
  "logger": "statusUpdater",
  "msg": "Encountered error updating status",
  "error": "Operation cannot be fulfilled on gateways.gateway.networking.k8s.io \"gateway\": the object has been modified; please apply your changes to the latest version and try again",
  "namespace": "scale",
  "name": "gateway",
  "kind": "Gateway"
}

The full NFR test results can be found in this PR: https://github.com/nginxinc/nginx-gateway-fabric/pull/2554

kate-osborn commented 3 days ago

After some investigation, this does not appear to be a product bug. We retry on status update failures because the object may be modified between fetching the object and writing its status. On each retry, we fetch the latest version of the object. If multiple retries fail, we will log an error message saying, "Failed to update status." Since the logs do not contain that error message, we can assume that the status update eventually succeeded.
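For anyone unfamiliar with the pattern, here is a minimal sketch of retrying on a conflict. It is not the actual NGINX Gateway Fabric code: `Store`, `Object`, `Touch`, and `UpdateStatusWithRetry` are hypothetical names, and the in-memory store only imitates the API server's resource-version check. Real controllers would typically talk to the API server and could use something like `retry.RetryOnConflict` from `k8s.io/client-go/util/retry` instead of a hand-written loop.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConflict mimics the "object has been modified" conflict from the log above.
var ErrConflict = errors.New("the object has been modified; please apply your changes to the latest version and try again")

// Object is a hypothetical stand-in for a Gateway resource.
type Object struct {
	ResourceVersion int
	Status          string
}

// Store is a hypothetical in-memory stand-in for the API server.
type Store struct {
	obj Object
}

// Get returns a copy of the latest stored object.
func (s *Store) Get() Object { return s.obj }

// UpdateStatus rejects a write that is based on a stale resource version.
func (s *Store) UpdateStatus(o Object) error {
	if o.ResourceVersion != s.obj.ResourceVersion {
		return ErrConflict
	}
	s.obj.Status = o.Status
	s.obj.ResourceVersion++
	return nil
}

// Touch simulates another writer modifying the object (for example, a spec change).
func (s *Store) Touch() { s.obj.ResourceVersion++ }

// UpdateStatusWithRetry fetches the latest object on every attempt and retries
// only on conflicts. beforeWrite is a test hook (assumption) that lets a caller
// inject a concurrent modification between the fetch and the write.
// It returns the number of attempts made.
func UpdateStatusWithRetry(s *Store, status string, maxAttempts int, beforeWrite func(attempt int)) (int, error) {
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		latest := s.Get() // re-fetch so the write uses the newest resource version
		latest.Status = status
		if beforeWrite != nil {
			beforeWrite(attempt)
		}
		err := s.UpdateStatus(latest)
		if err == nil {
			return attempt, nil
		}
		if !errors.Is(err, ErrConflict) {
			return attempt, err // non-conflict errors are not retried in this sketch
		}
		lastErr = err // a conflict here corresponds to the debug-level log entry
	}
	return maxAttempts, fmt.Errorf("failed to update status after %d attempts: %w", maxAttempts, lastErr)
}
```

In this sketch a single conflict is harmless: the next attempt reads the newer version and the write goes through. Only when every attempt conflicts does the caller see the final "failed to update status" error.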

If this error persists in our NFR tests, we should look into the test code and see if we can prevent this situation from occurring. Perhaps we can wait for an object's status to update before modifying the object's spec.
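One way the test could wait is sketched below, under the assumption that the status exposes an observed generation that can be compared with the object's generation (as Kubernetes status conditions do via `observedGeneration`). `Resource`, `WaitForStatus`, and `ErrStatusTimeout` are hypothetical names, and `get` stands in for whatever the test uses to fetch the Gateway; this is not taken from the existing NFR test code.

```go
package main

import (
	"errors"
	"time"
)

// ErrStatusTimeout is returned when the status never catches up in time.
var ErrStatusTimeout = errors.New("timed out waiting for status to reflect the latest generation")

// Resource is a hypothetical, trimmed-down view of an object under test.
type Resource struct {
	Generation         int64 // bumped by the API server on spec changes
	ObservedGeneration int64 // generation the controller last reported status for
}

// WaitForStatus polls get until the observed generation has caught up with the
// object's generation, or until timeout elapses.
func WaitForStatus(get func() Resource, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		r := get()
		if r.ObservedGeneration >= r.Generation {
			return nil // status is current; safe to modify the spec again
		}
		if time.Now().After(deadline) {
			return ErrStatusTimeout
		}
		time.Sleep(interval)
	}
}
```

A scale test could call `WaitForStatus` after each spec change and only then apply the next one, which should reduce how often the status writer races with the test's own updates.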