sustainable-computing-io / kepler-operator

Kepler Operator
Apache License 2.0
26 stars 26 forks source link

Make it possible to adjust more fields in exporter spec like image repository or resources #195

Open BGrasnick opened 1 year ago

BGrasnick commented 1 year ago

Hey,

as of right now, we can only specify the port, nodeSelectors and Tolerations for the kepler Pod when using the kepler-operator. I think it would be great to also be able to adjust the image repository for internal registries and the resources of the containers to help with scheduling.

The question is whether it is better to step by step add possibilities to adjust all the fields that users might want to adjust or go for a more general approach by adding support for PodSpec like they did for the GitLab operator for example.

What do you think?

I would be happy to help and contribute but first it would be good to decide on an approach.

sthaha commented 1 year ago

I think it would be great to also be able to adjust the image repository for internal registries and the resources of the containers to help with scheduling.

The reason we pin the version of kepler in operator is to

For testing and development purpose, a new bundle can be generated pointing to any image using --kepler.image flag or KEPLER_IMAGE env var.

With the podSpec, what parts are you specifically targeting? Do you have an idea about how the spec would look?

  spec:
    exporter:
        < changes > 

Please note that although, the operator is only at Level - 1: Basic Install, it would be nice if we didn't have to introduce changes that we may have to revert later on (breaking changes). In my experience it is better to rely on inputs from the users before adding a feature.

BGrasnick commented 1 year ago

Hey @sthaha, thank you for your reply! Of course I understand that you want to pin the version to make sure everything works as intended! I was rather talking about the repository where the image is coming from. We and probably other users of kepler-operator (at least in the future) would love if we could get the image from an internal registry that mirrors quay. PodSpec would make the whole operator very flexible. Things might break if users change the wrong parameters like the image tag but I think that is a risk they know about if they change that. What I would love as of today would be the following:

  spec:
    exporter:
        image:
          repository: < internal-registry >
        resources:
          requests:
            memory: "<XYZ>Mi"
            cpu: "<XYZ>m"
          limits:
            memory: "<XYZ>Mi"
            cpu: "<XYZ>m"

This would then lead to using the image:

< internal-registry >/sustainable_computing_io/kepler:< tag-set-by-kepler-operator >

where < tag-set-by-kepler-operator > would be "release-0.5.4" right now.

Of course we could just add the two feates "image.repository" and "resources" but who might know what users might want to change in the future :) That's why I was suggesting PodSpec. For now, I would be more than happy with only the two additions though!

sthaha commented 1 year ago

My argument against adding the image/ repo still stands :( . Anyone with the right to create kepler (need not be an admin only needs the rbac to create kepler) will be able to run arbitrary image. However, if its of any help ... we did notice that changing the image in the pod created by DaemonSet controller does not get reverted back.

Lets add the resources for sure! I am also thinking all pod related spec should go under a different field itself so that we separate deployment config from kepler config.


spec:
  exporter:
     spec | pod | deployment:  # pick one  
         port: 8080
        resources:

    metrics:
       cgroup:
           enabled: true

Note: port got moved to pod | spec | deployment

cc: @husky-parul @vimalk78 any thoughts?

BGrasnick commented 1 year ago

Wasn't sure whether to create a new issue or just add it here, but think it could also belong here:

We are running OpenShift on AWS and we get only 0 metrics when using the current default image release-0.6.1 (bcc) but when switching to release-0.6.1-libbpf we finally get some metrics. If I understand correctly, libbpf will be the default starting with kepler 0.7.0 but I don't know when this will be released. Also I don't know if bcc will be deprecated in the future or is here to stay and in which scenarios it may be beneficial to use bcc. But if both options continue to exist in the future, what do you think about adding a switch to let users decide which to use? Something like this (for the future starting with kepler 0.7.0):

spec:
  exporter:
     deployment:
       useBcc: false # libbpf is used by default

=> then if this is set we can change the image used by the kepler-exporter DaemonSet to release-x.y.z-bcc

sthaha commented 1 year ago

@BGrasnick I would rather use an enum than a boolean because it keeps the options open. I wouldn't also tie this to a deployment, rather it is part of exporter configuration.

spec
  exporter: 
      method:  bcc | libbpf | something | else | in | future