sustainable-computing-io / kepler-operator

Kepler Operator
Apache License 2.0
25 stars 26 forks source link

Feat: add method field to exporter spec #287

Closed BGrasnick closed 10 months ago

BGrasnick commented 10 months ago

Addresses #195.

Adds a new enum to kepler exporter spec that allows choosing which method/image to use: bcc or libbpf.

Needs to be changed once kepler 0.7.0 is released and default will be libbpf.

Testing was a little challenging and I am not super happy with the solution and open for suggestions. As of now, the RELATED_IMAGE_KEPLER var gets set for Config.Image and determines the image used. This is done in main.go lines 78 & 79. As this was needed for the test, I added lines 111-113 in exporter.go to set a default in case Config.Image is empty. However, because of this and the test, now the image & its version is hardcoded in 3 different locations which is not good. I am open to any help here :)

Was tested manually on an OpenShift 4.11 cluster.

@sthaha: any thoughts?

sthaha commented 10 months ago

@BGrasnick , thank you for the PR. @vimalk78 is already working on enabling something similar but using an (undocumented) annotation since we know this (hack) is going to be short lived. The reason I call it a hack is because after 0.7, we will have to deprecate the method unless bcc is going to be supported for the foreseeable future.

BGrasnick commented 10 months ago

Ah ok nevermind then! I wasn't sure if bcc will be deprecated or still be used in the future and thought it might be helpful also in the future because of your answer in #195. When will this workaround come? We want to roll out kepler but need this or a release of 0.7.0 for kepler to work as we depend on libbpf...

sthaha commented 10 months ago

@BGrasnick , it's good to validate our assumptions :)

@marceloamaral @rootfs, Could you please confirm if bcc will be deprecated? In this PR @BGrasnick adds a field in exporter spec to allow choosing between bcc and libbpf, depending on which, operator will deploy appropriate image.

We (@vimalk78 ) are currently working on a solution to allow this through a hack. We intent to add the ability to honour an annotation instead of a field in the spec, so that we don't have to delete the field later on when bcc won't be used anymore - which is causes API breakage.

But, this is all made under the assumption that there will only be libbpf image in future (from 0.7). Is that the right assumption to make? or would continue to use bcc, or some other mechanism later?

BGrasnick commented 10 months ago

@sthaha yeah definitely sounds good to check again what is planned for the future.

My team and organization want to deploy kepler to our cloud/VM-based OpenShift clusters ASAP because we love the integrated dashboards and how easy use to use the operator is. We want to make this available for all teams on our clusters but cannot do it right now because we need to use libbpf instead of bcc. And there is neither a release date for kepler 0.7.0 when libbpf will become default nor a preview for it as far as I know. Do you know of any other way how we could use the libbpf version of kepler with the operator right now?

sthaha commented 10 months ago

My team and organization want to deploy kepler to our cloud/VM-based OpenShift clusters ASAP because we love the integrated dashboards and how easy use to use the operator is. We want to make this available for all teams on our clusters but cannot do it right now because we need to use libbpf instead of bcc.

@BGrasnick , do note there always exists the possibility of creating your own bundle with any kepler image and use operator-sdk to deploy the bundle.

E.g. here is how I create my bundle that uses kepler libbpf bundle that gets pushed to quay as - quay.io/sthaha/kepler-operator-bundle:0.0.0-dev

make operator-build bundle bundle-build operator-push bundle-push IMG_BASE=quay.io/sthaha VERSION=0.0.0-dev KEPLER_IMG=quay.io/sustainable_computing_io/kepler:release-0.6.1-libbpf

There exists a HACK as well (but use it as your own risk)

  1. Use community catalog to deploy operator
  2. Deploy Kepler
  3. Release all kepler - pod images as follows
    for x in $(kubectl get pods -n openshift-kepler-operator -l app.kubernetes.io/component=exporter -l app.kubernetes.io/managed-by=kepler-operator -o name); do
    oc set image -n openshift-kepler-operator $x kepler-exporter=quay.io/sustainable_computing_io/kepler:release-0.6.1-libbpf ;
    done

    We have observed that daemonset controllers do not revert back the image.

marceloamaral commented 10 months ago

We might have only libbpf in the future, there is no reason, as far as I know, for maintaining duplicated code. However, I am not sure when we will remove the bcc. In the upcoming release, we plan to set libbpf as the default, and perhaps in the subsequent release, we may consider removing bcc.

Are there any use cases that I might not be aware of that require bcc? If not, it's probably unnecessary to enable it in the operator.

sthaha commented 10 months ago

@marceloamaral , thank you for taking time to answer.

Are there any use cases that I might not be aware of that require bcc?

Not that I am aware of but definitely there are more use-cases for libbpf than bcc. This also makes me wonder if it would be possible for us to make a release of kepler 0.7.0 that only switches to libbpf provided libbpf transition is mature enough. This would allow us to make another operator release which deploys libbpf based kepler allows users of openshift 4.11+ to try out kepler and provide feedback.

If not, it's probably unnecessary to enable it in the operator.

@BGrasnick, I agree with Marcelo's comment. Since this issue is short lived, the hack @vimalk78 is working on should suffice.

BGrasnick commented 10 months ago

@marceloamaral thanks for your answer!

@sthaha yeah of course that makes sense! If kepler doesn't need/support bcc in the future then there is no meaning in implementing this functionality of method choosing. However, as you outlined it would be great to fast track a release of kepler and kepler-operator with libbpf. I don't know if other people already use kepler on OpenShift + AWS with bcc but for us it does not work...

sthaha commented 10 months ago

@BGrasnick I have started a discussion about switching to libbpf - https://github.com/sustainable-computing-io/kepler/discussions/1027

BGrasnick commented 10 months ago

@sthaha That's a great idea! Alternatively, if it does not match up with the release plan for kepler, how about releasing a new kepler-operator 0.10.0 or 0.9.1 where we use release-0.6.1-libbpf of kepler?

sthaha commented 10 months ago

how about releasing a new kepler-operator 0.10.0 or 0.9.1 where we use release-0.6.1-libbpf of kepler?

I am happy with that proposal if we can validate in kepler CI that both images produce the same / comparable values for all metrics.

sthaha commented 10 months ago

@BGrasnick , I am closing this in favour of https://github.com/sustainable-computing-io/kepler-operator/pull/293