nginxinc / docker-nginx-controller

Docker support for NGINX Controller Agent in Containers
Apache License 2.0

NGINX Controller 3.18 + STORE_UUID persistence issue #58

Closed fabriziofiorucci closed 3 years ago

fabriziofiorucci commented 3 years ago

Hi,

I'm playing with NGINX Controller 3.18 and one NGINX Plus r24-p1 instance built with the controller agent based on:

https://github.com/nginxinc/docker-nginx-controller#44-overriding-agent-nginx-controller-configuration

I built the image using:

cd docker-nginx-controller/ubuntu/no-nap/
sudo docker build --build-arg STORE_UUID=True --build-arg CONTROLLER_URL=https://nginx-controller.ff.lan/install/controller-agent --build-arg API_KEY='xxxyyyzzz' -t registry.ff.lan:31005/nginx-plus-with-nginx-agent:1.2 .
docker push registry.ff.lan:31005/nginx-plus-with-nginx-agent:1.2

I have a k8s cluster running v1.21.2 on Ubuntu 20.10

My deployment is:

apiVersion: apps/v1
kind: Deployment
metadata:
 name: nginx
 namespace: test-ns
 labels:
   app: nginx
spec:
 selector:
   matchLabels:
     app: nginx
 replicas: 1
 template:
   metadata:
     labels:
       app: nginx
   spec:
     containers:
     - name: nginx-apigw
       image: registry.ff.lan:31005/nginx-plus-with-nginx-agent:1.2
       ports:
       - containerPort: 80
       - containerPort: 8080
       env:
         - name: ENV_CONTROLLER_API_URL
           value: "https://nginx-controller.ff.lan:8443/1.4/"
         - name: ENV_CONTROLLER_API_KEY
           value: "xxxyyyzzz"
         - name: ENV_CONTROLLER_INSTANCE_NAME
           value: "nginx-plus-on-k8s"

After applying this for the first time, the instance correctly appears in the NGINX Controller, and the NGINX pod logs show:

starting nginx ...
waiting for nginx workers ...
2021/07/21 08:24:23 [notice] 6#6: using the "epoll" event method
2021/07/21 08:24:23 [notice] 6#6: nginx/1.19.10 (nginx-plus-r24-p1)
2021/07/21 08:24:23 [notice] 6#6: built by gcc 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
2021/07/21 08:24:23 [notice] 6#6: OS: Linux 5.8.0-59-generic
2021/07/21 08:24:23 [notice] 6#6: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2021/07/21 08:24:23 [notice] 6#6: start worker processes
2021/07/21 08:24:23 [notice] 6#6: start worker process 9
updating /etc/controller-agent/agent.conf ...
 ---> using api_key = xxxyyyzzz
 ---> using controller api url = https://nginx-controller.ff.lan:8443/1.4/
 ---> using instance_name = nginx-plus-on-k8s
starting controller-agent ...
time="Jul 21 2021 08:24:25.335" level="info" msg="Starting Nginx Controller (Go) Agent. Version: 3.18.1-316464192.release-3-18..." feature="main"
time="Jul 21 2021 08:24:25.338" level="info" msg="Discovered nginxs" count="1" feature="main"
time="Jul 21 2021 08:24:25.368" level="info" msg="Running 10 agent feature(s)"
time="Jul 21 2021 08:24:25.368" level="info" msg="Loading metrics"
time="Jul 21 2021 08:24:25.368" level="info" msg="Loading avrdmgmt"
time="Jul 21 2021 08:24:25.368" level="info" msg="Loading certsmgmt"
time="Jul 21 2021 08:24:25.368" level="info" msg="Loading http-bridge"
time="Jul 21 2021 08:24:25.368" level="info" msg="Loading cloudcfgmgmt"
time="Jul 21 2021 08:24:25.368" level="info" msg="Loading security-events-manager"
time="Jul 21 2021 08:24:25.368" level="info" msg="Loading eventsmgr"
time="Jul 21 2021 08:24:25.368" level="info" msg="Loading configurator"
time="Jul 21 2021 08:24:25.368" level="info" msg="Loading meta"
time="Jul 21 2021 08:24:25.368" level="info" msg="Loading nginxmgmt"
time="Jul 21 2021 08:24:25.369" level="info" msg="Starting AVRD" feature="avrdmgmt"
time="Jul 21 2021 08:24:25.370" level="info" msg="AVRD started (PID: 34)" feature="avrdmgmt"
time="Jul 21 2021 08:24:25.392" level="info" msg="Started all NGINX collectors" feature="metrics"
time="Jul 21 2021 08:24:35.414" level="info" msg="Initialize streaming from endpoint: https://nginx-controller.ff.lan:8443/1.4/xxxyyyzzz/configs/stream/" feature="configurator"

If I try to delete the deployment and re-apply it, even though I built the image using STORE_UUID=True, I get:

starting nginx ...
waiting for nginx workers ...
2021/07/21 08:24:51 [notice] 6#6: using the "epoll" event method
2021/07/21 08:24:51 [notice] 6#6: nginx/1.19.10 (nginx-plus-r24-p1)
2021/07/21 08:24:51 [notice] 6#6: built by gcc 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
2021/07/21 08:24:51 [notice] 6#6: OS: Linux 5.8.0-59-generic
2021/07/21 08:24:51 [notice] 6#6: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2021/07/21 08:24:51 [notice] 6#6: start worker processes
2021/07/21 08:24:51 [notice] 6#6: start worker process 9
updating /etc/controller-agent/agent.conf ...
 ---> using api_key = xxxyyyzzz
 ---> using controller api url = https://nginx-controller.ff.lan:8443/1.4/
 ---> using instance_name = nginx-plus-on-k8s
starting controller-agent ...
time="Jul 21 2021 08:24:53.932" level="info" msg="Starting Nginx Controller (Go) Agent. Version: 3.18.1-316464192.release-3-18..." feature="main"
time="Jul 21 2021 08:24:53.936" level="info" msg="Discovered nginxs" count="1" feature="main"
time="Jul 21 2021 08:24:53.963" level="info" msg="Running 10 agent feature(s)"
time="Jul 21 2021 08:24:53.963" level="info" msg="Loading http-bridge"
time="Jul 21 2021 08:24:53.963" level="info" msg="Loading meta"
time="Jul 21 2021 08:24:53.963" level="info" msg="Loading metrics"
time="Jul 21 2021 08:24:53.963" level="info" msg="Loading certsmgmt"
time="Jul 21 2021 08:24:53.963" level="info" msg="Loading security-events-manager"
time="Jul 21 2021 08:24:53.963" level="info" msg="Loading avrdmgmt"
time="Jul 21 2021 08:24:53.963" level="info" msg="Loading eventsmgr"
time="Jul 21 2021 08:24:53.963" level="info" msg="Loading cloudcfgmgmt"
time="Jul 21 2021 08:24:53.963" level="info" msg="Loading configurator"
time="Jul 21 2021 08:24:53.963" level="info" msg="Loading nginxmgmt"
time="Jul 21 2021 08:24:53.967" level="info" msg="Starting AVRD" feature="avrdmgmt"
time="Jul 21 2021 08:24:53.967" level="info" msg="AVRD started (PID: 35)" feature="avrdmgmt"
time="Jul 21 2021 08:24:53.986" level="info" msg="Started all NGINX collectors" feature="metrics"
time="Jul 21 2021 08:25:04.009" level="info" msg="Initialize streaming from endpoint: https://nginx-controller.ff.lan:8443/1.4/xxxyyyzzz/configs/stream/" feature="configurator"
time="Jul 21 2021 08:25:04.164" level="info" msg="Streaming instance 1 ended" feature="configurator"
time="Jul 21 2021 08:25:04.164" level="info" msg="Streaming terminated remotely. attempting to restart" feature="configurator"

Apparently the only way to fix this is to delete the instance from the NGINX Controller and then delete and re-apply the deployment manifest.
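
For reference, the identity the agent ends up registering with can be checked from inside the running pod. A minimal sketch, assuming the test-ns namespace from the manifest above and the agent.conf path shown in the log output:

kubectl -n test-ns exec deploy/nginx -- cat /etc/controller-agent/agent.conf
kubectl -n test-ns exec deploy/nginx -- hostname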

Am I missing something here?

fabriziofiorucci commented 3 years ago

Additionally, using:

    env:
      - name: ENV_CONTROLLER_API_URL
        value: "https://nginx-controller.ff.lan:8443/1.4/"
      - name: ENV_CONTROLLER_API_KEY
        value: "xxxyyyzzz"
      - name: ENV_CONTROLLER_INSTANCE_NAME
        value: "nginx-plus-on-k8s"
      - name: STORE_UUID
        value: "True"
      - name: ENV_CONTROLLER_INSTANCE_GROUP
        value: "nginx-k8s"

That is, adding ENV_CONTROLLER_INSTANCE_GROUP makes it work after restarting the pod, but only if I manually delete the instance from the Controller first. Apparently there is no way to make it work seamlessly after a pod restart. Any hint?

Thank you

fabriziofiorucci commented 3 years ago

Additionally, is there any support for Replicas > 1?

brianehlert commented 3 years ago

Using an instance_group should be the solution, to decouple the instances from the gateway; the instance_group becomes the new glue. There will still be orphaned instances - it is up to the internals of Controller to handle those orphaned instances properly.

When instance_group is being used, store_uuid is no longer valuable - it is really intended for machines, not containers. store_uuid loses value because each pod has a unique uuid, and when K8s recycles pods, the new pods will have new uuid values. The old pods will become orphans - which Controller in turn needs to handle.

fabriziofiorucci commented 3 years ago

Thank you Brian,

I'm apparently missing something; it's still not working as expected. Here's what I did.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: test
  labels:
    app: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx-apigw
        image: registry.ff.lan:31005/nginx-plus-with-nginx-agent:1.2
        ports:
        - containerPort: 80
        - containerPort: 8080
        env:
          - name: ENV_CONTROLLER_API_URL
            value: "https://nginx-controller.ff.lan:8443/1.4/"
          - name: ENV_CONTROLLER_API_KEY
            value: "xxyyzz"
          - name: ENV_CONTROLLER_INSTANCE_NAME
            value: "nginx-plus-on-k8s"
          - name: STORE_UUID
            value: "True"
          - name: ENV_CONTROLLER_INSTANCE_GROUP
            value: "nginx-k8s"

The NGINX Controller correctly displays the instance as up and running, and the controller-agent logs show:

starting nginx ...
waiting for nginx workers ...
2021/07/22 07:40:45 [notice] 6#6: using the "epoll" event method
2021/07/22 07:40:45 [notice] 6#6: nginx/1.19.10 (nginx-plus-r24-p1)
2021/07/22 07:40:45 [notice] 6#6: built by gcc 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) 
2021/07/22 07:40:45 [notice] 6#6: OS: Linux 5.8.0-59-generic
2021/07/22 07:40:45 [notice] 6#6: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2021/07/22 07:40:45 [notice] 6#6: start worker processes
2021/07/22 07:40:45 [notice] 6#6: start worker process 9
updating /etc/controller-agent/agent.conf ...
 ---> using api_key = xxxyyyzzz
 ---> using controller api url = https://nginx-controller.ff.lan:8443/1.4/
 ---> using instance_name = nginx-plus-on-k8s
 ---> using instance group = nginx-k8s
starting controller-agent ...
time="Jul 22 2021 07:40:47.371" level="info" msg="Starting Nginx Controller (Go) Agent. Version: 3.18.1-316464192.release-3-18..." feature="main"
time="Jul 22 2021 07:40:47.376" level="info" msg="Discovered nginxs" count="1" feature="main"
time="Jul 22 2021 07:40:47.403" level="info" msg="Running 10 agent feature(s)"
time="Jul 22 2021 07:40:47.404" level="info" msg="Loading avrdmgmt"
time="Jul 22 2021 07:40:47.404" level="info" msg="Loading nginxmgmt"
time="Jul 22 2021 07:40:47.404" level="info" msg="Loading meta"
time="Jul 22 2021 07:40:47.404" level="info" msg="Loading configurator"
time="Jul 22 2021 07:40:47.404" level="info" msg="Loading eventsmgr"
time="Jul 22 2021 07:40:47.404" level="info" msg="Loading cloudcfgmgmt"
time="Jul 22 2021 07:40:47.404" level="info" msg="Loading metrics"
time="Jul 22 2021 07:40:47.404" level="info" msg="Loading http-bridge"
time="Jul 22 2021 07:40:47.404" level="info" msg="Loading security-events-manager"
time="Jul 22 2021 07:40:47.404" level="info" msg="Loading certsmgmt"
time="Jul 22 2021 07:40:47.404" level="info" msg="Starting AVRD" feature="avrdmgmt"
time="Jul 22 2021 07:40:47.405" level="info" msg="AVRD started (PID: 36)" feature="avrdmgmt"
time="Jul 22 2021 07:40:48.122" level="info" msg="Started all NGINX collectors" feature="metrics"
time="Jul 22 2021 07:40:57.451" level="info" msg="Initialize streaming from endpoint: https://nginx-controller.ff.lan:8443/1.4/xxxyyyzzz/configs/stream/" feature="configurator"
time="Jul 22 2021 07:41:13.554" level="info" msg="Started streaming\t\t\t: https://nginx-controller.ff.lan:8443/1.4/xxxyyyzzz/configs/stream/" feature="configurator"

Trying to "kubectl delete -f nginx.yaml" and "kubectl apply -f nginx.yaml" for some reason the agent-to-controller communication stops working:

starting nginx ...
waiting for nginx workers ...
2021/07/22 07:41:48 [notice] 7#7: using the "epoll" event method
2021/07/22 07:41:48 [notice] 7#7: nginx/1.19.10 (nginx-plus-r24-p1)
2021/07/22 07:41:48 [notice] 7#7: built by gcc 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) 
2021/07/22 07:41:48 [notice] 7#7: OS: Linux 5.8.0-59-generic
2021/07/22 07:41:48 [notice] 7#7: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2021/07/22 07:41:48 [notice] 7#7: start worker processes
2021/07/22 07:41:48 [notice] 7#7: start worker process 10
updating /etc/controller-agent/agent.conf ...
 ---> using api_key = xxxyyyzzz
 ---> using controller api url = https://nginx-controller.ff.lan:8443/1.4/
 ---> using instance_name = nginx-plus-on-k8s
 ---> using instance group = nginx-k8s
starting controller-agent ...
time="Jul 22 2021 07:41:50.511" level="info" msg="Starting Nginx Controller (Go) Agent. Version: 3.18.1-316464192.release-3-18..." feature="main"
time="Jul 22 2021 07:41:50.515" level="info" msg="Discovered nginxs" count="1" feature="main"
time="Jul 22 2021 07:41:50.535" level="info" msg="Running 10 agent feature(s)"
time="Jul 22 2021 07:41:50.535" level="info" msg="Loading avrdmgmt"
time="Jul 22 2021 07:41:50.535" level="info" msg="Loading cloudcfgmgmt"
time="Jul 22 2021 07:41:50.536" level="info" msg="Loading eventsmgr"
time="Jul 22 2021 07:41:50.536" level="info" msg="Loading http-bridge"
time="Jul 22 2021 07:41:50.536" level="info" msg="Loading metrics"
time="Jul 22 2021 07:41:50.536" level="info" msg="Loading meta"
time="Jul 22 2021 07:41:50.536" level="info" msg="Loading nginxmgmt"
time="Jul 22 2021 07:41:50.536" level="info" msg="Loading certsmgmt"
time="Jul 22 2021 07:41:50.536" level="info" msg="Loading configurator"
time="Jul 22 2021 07:41:50.536" level="info" msg="Loading security-events-manager"
time="Jul 22 2021 07:41:50.536" level="info" msg="Starting AVRD" feature="avrdmgmt"
time="Jul 22 2021 07:41:50.536" level="info" msg="AVRD started (PID: 37)" feature="avrdmgmt"
time="Jul 22 2021 07:41:50.558" level="info" msg="Started all NGINX collectors" feature="metrics"
time="Jul 22 2021 07:42:00.635" level="info" msg="Initialize streaming from endpoint: https://nginx-controller.ff.lan:8443/1.4/xxxyyyzzz/configs/stream/" feature="configurator"
time="Jul 22 2021 07:42:00.654" level="info" msg="Streaming instance 1 ended" feature="configurator"
time="Jul 22 2021 07:42:00.654" level="info" msg="Streaming terminated remotely. attempting to restart" feature="configurator"
time="Jul 22 2021 07:42:12.192" level="info" msg="Initialize streaming from endpoint: https://nginx-controller.ff.lan:8443/1.4/xxxyyyzzz/configs/stream/" feature="configurator"
time="Jul 22 2021 07:42:12.211" level="info" msg="Streaming instance 2 ended" feature="configurator"
time="Jul 22 2021 07:42:12.211" level="info" msg="Streaming terminated remotely. attempting to restart" feature="configurator"
time="Jul 22 2021 07:42:18.779" level="info" msg="Initialize streaming from endpoint: https://nginx-controller.ff.lan:8443/1.4/xxxyyyzzz/configs/stream/" feature="configurator"
time="Jul 22 2021 07:42:18.795" level="info" msg="Streaming instance 3 ended" feature="configurator"
time="Jul 22 2021 07:42:18.796" level="info" msg="Streaming terminated remotely. attempting to restart" feature="configurator"
time="Jul 22 2021 07:42:26.593" level="info" msg="Initialize streaming from endpoint: https://nginx-controller.ff.lan:8443/1.4/xxxyyyzzz/configs/stream/" feature="configurator"
time="Jul 22 2021 07:42:26.610" level="info" msg="Streaming instance 4 ended" feature="configurator"
time="Jul 22 2021 07:42:26.610" level="info" msg="Streaming terminated remotely. attempting to restart" feature="configurator"
time="Jul 22 2021 07:42:35.310" level="info" msg="Initialize streaming from endpoint: https://nginx-controller.ff.lan:8443/1.4/xxxyyyzzz/configs/stream/" feature="configurator"
time="Jul 22 2021 07:42:35.327" level="info" msg="Streaming instance 5 ended" feature="configurator"
time="Jul 22 2021 07:42:35.327" level="info" msg="Streaming terminated remotely. attempting to restart" feature="configurator"
time="Jul 22 2021 07:42:47.114" level="info" msg="Initialize streaming from endpoint: https://nginx-controller.ff.lan:8443/1.4/xxxyyyzzz/configs/stream/" feature="configurator"
time="Jul 22 2021 07:42:47.131" level="info" msg="Streaming instance 6 ended" feature="configurator"
time="Jul 22 2021 07:42:47.131" level="info" msg="Streaming terminated remotely. attempting to restart" feature="configurator"
time="Jul 22 2021 07:42:50.605" level="info" msg="Payload sent." _update="2663" feature="http-bridge"
time="Jul 22 2021 07:43:01.827" level="info" msg="Initialize streaming from endpoint: https://nginx-controller.ff.lan:8443/1.4/xxxyyyzzz/configs/stream/" feature="configurator"
time="Jul 22 2021 07:43:01.852" level="info" msg="Streaming instance 7 ended" feature="configurator"
time="Jul 22 2021 07:43:01.852" level="info" msg="Streaming terminated remotely. attempting to restart" feature="configurator"
time="Jul 22 2021 07:43:09.791" level="info" msg="Initialize streaming from endpoint: https://nginx-controller.ff.lan:8443/1.4/xxxyyyzzz/configs/stream/" feature="configurator"
time="Jul 22 2021 07:43:09.815" level="info" msg="Streaming instance 8 ended" feature="configurator"
time="Jul 22 2021 07:43:09.815" level="info" msg="Streaming terminated remotely. attempting to restart" feature="configurator"
time="Jul 22 2021 07:43:22.088" level="info" msg="Initialize streaming from endpoint: https://nginx-controller.ff.lan:8443/1.4/xxxyyyzzz/configs/stream/" feature="configurator"
time="Jul 22 2021 07:43:22.106" level="info" msg="Streaming instance 9 ended" feature="configurator"
time="Jul 22 2021 07:43:22.106" level="info" msg="Streaming terminated remotely. attempting to restart" feature="configurator"
time="Jul 22 2021 07:43:32.190" level="info" msg="Initialize streaming from endpoint: https://nginx-controller.ff.lan:8443/1.4/xxxyyyzzz/configs/stream/" feature="configurator"

Manually deleting the "offline" nginx instance from the controller and then deleting and re-applying the deployment yaml makes it work again. Just deleting the nginx instance from the controller doesn't do the trick.

Is my yaml correct? Do you see anything wrong? The controller is 3.18.

Thank you, FF

brianehlert commented 3 years ago

Assuming it is registering under the instance_group properly in Controller, I would verify in your log output that the xxxyyyzzz replacement you provided is different for each instance. After you delete, I would expect the new pod to have a new machine name and a new uuid.

I would remove the STORE_UUID ENV setting.
I would remove the CONTROLLER_INSTANCE_NAME ENV setting.

store_uuid and instance_name matter for machines, but should not matter for ephemeral pods. At least this is how it is supposed to work.
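
In practice that means trimming the env section down to something like this (a sketch keeping the placeholders used in the manifests above):

    env:
      - name: ENV_CONTROLLER_API_URL
        value: "https://nginx-controller.ff.lan:8443/1.4/"
      - name: ENV_CONTROLLER_API_KEY
        value: "xxxyyyzzz"
      - name: ENV_CONTROLLER_INSTANCE_GROUP
        value: "nginx-k8s"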

fabriziofiorucci commented 3 years ago

Thank you, removing STORE_UUID and CONTROLLER_INSTANCE_NAME made it work:

[screenshot: Screenshot_20210722_232751]

After restarting the pod it gets listed correctly in the running instances. The previous pod instance switches to offline and stays there: is there any sort of garbage collection to get rid of old instances?

Additionally, testing with replicas > 1, all instances are listed correctly:

[screenshot]

But under infrastructure / instances / analyzer I get:

[screenshot]

Only the first replica instance is listed. Is there official support for replicas > 1 as of today?

Thank you

brianehlert commented 3 years ago

You are using the term 'replica'. Does this mean you are using an auto-scale group / scale set?

If so, each machine needs to have a unique hostname - so your image needs to be prepared to ensure that happens. If all members of the set have the same hostname, that would be something the team has to accommodate.

If they were pods in K8s, I know they all get unique hostnames.
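
A quick way to confirm that (a sketch using plain kubectl, assuming the app=nginx label and the test namespace from your deployment):

for p in $(kubectl -n test get pods -l app=nginx -o name); do kubectl -n test exec "$p" -- hostname; done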

And yes, Controller should be cleaning up the orphans within an instance_group.

fabriziofiorucci commented 3 years ago

Thank you, I tried using:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: test
  labels:
    app: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx-apigw
        image: registry.ff.lan:31005/nginx-plus-with-nginx-agent:1.2
        ports:
        - containerPort: 80
        - containerPort: 8080
        env:
          - name: ENV_CONTROLLER_API_URL
            value: "https://nginx-controller.ff.lan:8443/1.4/"
          - name: ENV_CONTROLLER_API_KEY
            value: "xxyyzz"
          - name: ENV_CONTROLLER_INSTANCE_GROUP
            value: "nginx-k8s"

hence I got 3 k8s pods, each one running with its own name.

brianehlert commented 3 years ago

The hostname and the uuid should be unique for each, and thus they should look like unique instances in Controller.
That is the expectation.

Quick question: when the image is built, is a uuid and hostname being set in the image? I am wondering if that is baked into the image and overriding what should happen automatically.

Checking the logs for each pod individually should show whether they are registering cleanly or reporting a duplicate. This is the thread my thoughts are going down.
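
For example, something along these lines would dump the streaming-related lines per pod (a sketch, assuming the same app=nginx label and test namespace):

for p in $(kubectl -n test get pods -l app=nginx -o name); do echo "== $p"; kubectl -n test logs "$p" | grep -i streaming; done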

fabriziofiorucci commented 3 years ago

The image I'm testing was built using:

sudo docker build --build-arg STORE_UUID=True --build-arg CONTROLLER_URL=https://nginx-controller.ff.lan/install/controller-agent --build-arg API_KEY='xxxyyyzzz' -t registry.ff.lan:31005/nginx-plus-with-nginx-agent:1.2 .

All pod logs show nothing abnormal; they are all pushing data to the controller, so at least from a logging perspective everything seems to run smoothly.

Should I try rebuilding that by hardwiring the UUID and the INSTANCE_NAME?

brianehlert commented 3 years ago

I suggest not building your image with STORE_UUID=True. That might be the problem with only the first instance appearing.

brianehlert commented 3 years ago

Do not hardwire hostname, uuid, or instance_name. Hardwiring those in the image will cause problems in your scenario.

Those should only be set in machine environments. Or when a single instance needs to always reflect as the same machine in Controller, no matter what lifecycle events happen to it.

fabriziofiorucci commented 3 years ago

I tried rebuilding the image using:

sudo docker build --build-arg CONTROLLER_URL=https://nginx-controller.ff.lan/install/controller-agent --build-arg API_KEY='xxxyyyzzz' -t registry.ff.lan:31005/nginx-plus-with-nginx-agent:1.3 .

and spinning it up using:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: controller-test
  labels:
    app: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx-apigw
        image: registry.ff.lan:31005/nginx-plus-with-nginx-agent:1.3
        ports:
        - containerPort: 80
        - containerPort: 8080
        env:
          - name: ENV_CONTROLLER_API_URL
            value: "https://nginx-controller.ff.lan:8443/1.4/"
          - name: ENV_CONTROLLER_API_KEY
            value: "xxxyyyzzz"
          - name: ENV_CONTROLLER_INSTANCE_GROUP
            value: "nginx-k8s"

It works flawlessly now. Even when setting replicas > 1, all instances are accounted for in the analyzer.
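
For the replicas > 1 test, the deployment can also simply be scaled in place (a sketch, using the controller-test namespace from the manifest above):

kubectl -n controller-test scale deployment/nginx --replicas=3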

Thank you!

fabriziofiorucci commented 3 years ago

Hi again,

apparently there's still something missing:

After ~9 hours the two offline instances are still there in the Controller. Is there anything that should be done to have them pruned? Should this happen automatically and, if so, after how long?

Thank you

brianehlert commented 3 years ago

I would not worry about the pruning, not right now. The important thing is that you can continue to push configuration changes even while there are offline instances in your instance_group. The orphan collection might not be in the current release.