Environment variables are not set if the broker pod is provisioned quickly

jiayihu commented 3 years ago

Describe the bug It seems that the environment variables are not properly set in the broker container if the latter is quickly instantiated. On the other hand, if a strong resource limit is put on the container, then they are correctly available in the env variables.

Kubernetes Version: [e.g. Native Kubernetes 1.19, MicroK8s 1.19, Minikube 1.19, K3s] Kind 0.10.0 with K8s v1.20.2

To Reproduce Steps to reproduce the behavior:

Create values.yaml with the following values:

debugEcho:
configuration:
enabled: true
brokerPod:
  image:
    repository: nginx
    tag: stable-alpine
brokerProperties:
  test: okay
  hello: kate
discovery: 
enabled: true

Delete the resource limits from debug-echo-configuration.yml.

resources:
requests:
{{`"{{PLACEHOLDER}}"`}} : "1"
memory: {{ .Values.debugEcho.configuration.brokerPod.resources.memoryRequest }}
cpu: {{ .Values.debugEcho.configuration.brokerPod.resources.cpuRequest }}
limits:
{{`"{{PLACEHOLDER}}"`}} : "1"
memory: {{ .Values.debugEcho.configuration.brokerPod.resources.memoryLimit }}
cpu: {{ .Values.debugEcho.configuration.brokerPod.resources.cpuLimit }}

Run helm package helm and helm install akri ./akri-0.6.8.tgz --values values.yaml
Run printenv within the broker container

Expected behavior Env variables should be always available.

Logs (please share snips of applicable logs) kubectl describe akrii akri-debug-echo-a19705 shows that the properties are properly set to the instance:

Spec:
  Broker Properties:
    DEBUG_ECHO_DESCRIPTION:  foo1
    Hello:                   kate
    Test:                    okay
  Configuration Name:        akri-debug-echo
  Device Usage:
    akri-debug-echo-a19705-0:
    akri-debug-echo-a19705-1:
  Nodes:
    kind-control-plane
  Shared:  true

kubectl exec -ti kind-control-plane-akri-debug-echo-a19705-pod -- printenv doesn't list the akri env variables.

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HOSTNAME=kind-control-plane-akri-debug-echo-a19705-pod NGINX_VERSION=1.20.1 NJS_VERSION=0.5.3 PKG_RELEASE=1 KUBERNETES_PORT_443_TCP_PORT=443 KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1 KUBERNETES_SERVICE_HOST=10.96.0.1 KUBERNETES_SERVICE_PORT=443 KUBERNETES_SERVICE_PORT_HTTPS=443 KUBERNETES_PORT=tcp://10.96.0.1:443 KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443 KUBERNETES_PORT_443_TCP_PROTO=tcp TERM=xterm HOME=/root

Additional notes I can't reproduce the bug even if I use high resource limits:

resources:
  # memoryLimit defines the maximum amount of RAM this Pod can consume.
  memoryLimit: 8Gi
  # cpuLimit defines the maximum amount of CPU this Pod can consume.
  cpuLimit: 8000m

The broker container still takes ~8s to be running. On the other hand, deleting the resource limits altogether result in ~4s for the container to be running. In that case, the bug is always reproducible.

bfjelds commented 3 years ago

interesting bug ... i think that this makes sense from a code perspective (i think we populate the envs during allocate ... without the PLACEHOLDER limit/request, there is no allocate).

kate-goldenring commented 3 years ago

@jiayihu, I reproed your issue deleting the entire resources section of the debugEcho configuration. Like @bfjelds mentioned, the PLACEHOLDER is needed. Without it, the broker pods will not request the debug echo resources. The environment variables are set as a result of a device plugin allocate request (from kubelet to the Agent) which is never called without requesting the debug echo resources.

Can you try you scenario again deleting the CPU/mem resources but leaving the resource requirements key placeholder? That solved the issue for me.

@bfjelds, why do we need to check for the placeholder? was there some restriction on inserting resources into a podspec and the only option was to replace (a placeholder)?

bfjelds commented 3 years ago

having the placeholder was intended to allow people to use whatever value they wanted. we default to 1 here:

{{`"{{PLACEHOLDER}}"`}} : "1"

but someone might want to use different values (maybe their broker makes 2 connections to the device?).

jiayihu commented 3 years ago

Can you try you scenario again deleting the CPU/mem resources but leaving the resource requirements key placeholder? That solved the issue for me.

Yup it works for me too.

Without it, the broker pods will not request the debug echo resources. The environment variables are set as a result of a device plugin allocate request (from kubelet to the Agent) which is never called without requesting the debug echo resources.

That is a very obscure behaviour from an external POV. Did I miss it from the doc again? 😄 I didn't understand what PLACEHOLDER was used for, but since it's under resources I just removed it. Generally speaking, I'm not sure it makes much sense to ask the specify the resource limits. Default values can easily be to restrictive, as in my CoAP broker case which has to handle WebSockets. On the other hand, I don't know which appropriate limits should I put. I think it's not trivial to evaluate, especially the CPU time.

having the placeholder was intended to allow people to use whatever value they wanted. we default to 1 here:

I still don't understand what PLACEHOLDER does. Could you make an example of how one should use it for a protocol?

kate-goldenring commented 3 years ago

I still don't understand what PLACEHOLDER does. Could you make an example of how one should use it for a protocol?

This is essentially the section where the controller inserts the Akri resource request for the broker. For example for an onvif broker PodSpec, the controller would replace the placeholder so the result is something like this:

  resources:
    limits:
      akri.sh/onvif-camera-<id>: "1"
    requests:
      akri.sh/onvif-camera-<id>: "1"

The placeholder is there in case you wanted to reserve 2 onvif cameras for a broker. You would make the value of the place holder "2" and then the following would be set as resource requests in the broker podspec by the contoller:

  resources:
    limits:
      akri.sh/onvif-camera-<id>: "2"
    requests:
      akri.sh/onvif-camera-<id>: "2"

Right now, it is not a scenario we have documented because we do not have use cases for it. It is not a value we intend users or developers to currently be interacting with; however, it does seem that it could be named differently.

Also, maybe the controller should check for the PLACEHOLDER (to see if the user is requesting a quantity other than 1), but if it is not there, still insert the resource request. Also, our configuration validating webhook does not check for the PLACEHOLDER, so if that is our grounds for valid configurations then the PLACEHOLDER should not be required.

Generally speaking, I'm not sure it makes much sense to ask the specify the resource limits. Default values can easily be to restrictive

Would adding values to the helm templates to not include the requests and limits be useful? Something like debugEcho.configuration.brokerPod.resources.memoryRequest.enabled=false.

kate-goldenring commented 3 years ago

@jiayihu is further action needed for this issue or can I close it?

jiayihu commented 3 years ago

I think you can close it if PLACEHOLDER is working as intended and no change is needed. I have no further questions or issues with it

project-akri / akri

Environment variables are not set if the broker pod is provisioned quickly #347