kubernetes-sigs / jobset

JobSet: a k8s native API for distributed ML training and HPC workloads
https://jobset.sigs.k8s.io/
Apache License 2.0

JobSet release for ARM? #237

Open vsoch opened 1 year ago

vsoch commented 1 year ago

I'm testing JobSet with an operator, and I saw this error in my operator logs:

```
2023-07-24T22:04:51Z    ERROR   Reconciler error    {"controller": "hyperqueue", "controllerGroup": "flux-framework.org", "controllerKind": "Hyperqueue", "Hyperqueue": {"name":"hyperqueue-sample","namespace":"flux-operator"}, "namespace": "flux-operator", "name": "hyperqueue-sample", "reconcileID": "a2514367-0003-4f54-bec9-cde6aed4365e", "error": "Internal error occurred: failed calling webhook \"mjobset.kb.io\": failed to call webhook: Post \"https://jobset-webhook-service.jobset-system.svc:443/mutate-jobset-x-k8s-io-v1alpha2-jobset?timeout=10s\": no endpoints available for service \"jobset-webhook-service\""}
```

Doh! I forgot that JobSet is probably built for x86!

```sh
$ kubectl logs -n jobset-system jobset-controller-manager-547857948-d9m82
exec /manager: exec format error
```

Would it be possible to have an ARM build / set of manifests?
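
For anyone hitting the same thing: a quick sanity check (a sketch, assuming kubectl access) is to list the node architectures, since "exec format error" means the binary in the image was built for a different architecture than the node it landed on:

```sh
# List each node's architecture; if these say arm64 but the image only
# ships amd64 binaries, you get "exec /manager: exec format error".
kubectl get nodes -o custom-columns='NAME:.metadata.name,ARCH:.status.nodeInfo.architecture'
```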

vsoch commented 1 year ago

heyo! I was wondering if this is something we could look into?

kannon92 commented 1 year ago

I think it is doable. I believe that Kueue supports both.

Looks like it's some work in the Makefile to tell it which platforms to build: https://github.com/kubernetes-sigs/kueue/blob/main/Makefile
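
For reference, the Kueue pattern boils down to a Makefile target that wraps docker buildx; the commands it effectively runs look roughly like this (a sketch; PLATFORMS and IMG are illustrative placeholders, not JobSet's actual variables):

```sh
PLATFORMS=linux/amd64,linux/arm64
IMG=registry.example.com/jobset:dev

# buildx needs a builder instance capable of multi-platform output.
docker buildx create --name multiarch-builder --use
# Build both platforms in one invocation and push a single manifest list.
docker buildx build --platform="${PLATFORMS}" --tag "${IMG}" --push .
docker buildx rm multiarch-builder
```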

vsoch commented 1 year ago

Yes! I've done this for two operators now, if you want another pattern to look at:

Arm is much slower to build, so I only do it on merges. https://github.com/flux-framework/flux-operator/blob/76c4754f90043dfc7afecc16ade053549c89ba18/.github/workflows/build-deploy.yaml#L12-L37
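
The gating itself can be as simple as branching on the event that triggered the workflow; a sketch of the idea in shell (GITHUB_EVENT_NAME is set automatically on GitHub Actions runners; IMG is a placeholder):

```sh
# Only run the slow, QEMU-emulated arm build on pushes/merges, not on PRs.
if [ "${GITHUB_EVENT_NAME}" = "push" ]; then
  docker buildx build --platform=linux/arm64 --tag "${IMG}-arm" --push .
else
  echo "Skipping arm build for event: ${GITHUB_EVENT_NAME}"
fi
```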

ahg-g commented 1 year ago

yeah, we can support that, do you want to send a PR that follows the Kueue pattern?

vsoch commented 1 year ago

> yeah, we can support that, do you want to send a PR that follows the Kueue pattern?

The lab asked me to stop contributing until we have a corporate CLA (I was using an individual one before), so I definitely could, but with some delay while that finalizes (we are pushing fairly hard for it to be signed)! If there is someone else who wants to do it, go ahead; if not, I can once we have that.

kannon92 commented 1 year ago

/help
/good-first-issue

k8s-ci-robot commented 1 year ago

@kannon92: This request has been marked as suitable for new contributors.

Guidelines

Please ensure that the issue body includes answers to the following questions:

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-good-first-issue command.

In response to [this](https://github.com/kubernetes-sigs/jobset/issues/237):

> /help
> /good-first-issue

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

a-hilaly commented 1 year ago

@vsoch Thanks for the links, very insightful. QQ: Do you also use/publish manifests to group multiple image tags (*-arm, *-amd) under the same tag?

vsoch commented 1 year ago

I don't use/publish manifests - I have a similar workflow that just builds an explicit arm tag (for a push to main or a release with the version) and then generates the equivalent CRD yaml file targeting arm. I didn't want to tangle the builds together or assume they would stay the same (aside from the architecture).
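
Concretely, the two-tag approach looks something like this (image names illustrative), with each platform getting its own explicit tag and its own generated deploy yaml:

```sh
# Native x86_64 build under the plain version tag.
docker buildx build --platform=linux/amd64 \
  --tag ghcr.io/example/operator:v0.1.0 --push .
# Emulated arm build under an explicit arm-prefixed tag.
docker buildx build --platform=linux/arm64 \
  --tag ghcr.io/example/operator:arm-v0.1.0 --push .
```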

a-hilaly commented 1 year ago

> I have a similar workflow that just builds an explicit arm tag

Thanks for the clarification. I'm bringing up this conversation because I'm currently reading about container registry manifests, and I think it would be useful (in the context of this project) to push image tags for each of the supported platforms plus a manifest mapping the images to their respective platforms. This way users will only have to specify the same tag, regardless of the platform, and leave it up to the container runtime to pull the right image. I want to hear what folks think about this solution/approach. Also happy to contribute and help support this.
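
With the docker manifest CLI this looks roughly like the following (a sketch; tags are illustrative and assume the per-platform images were already pushed):

```sh
# Stitch the per-platform images into one manifest list under a single tag.
docker manifest create ghcr.io/example/jobset:v0.1.0 \
  ghcr.io/example/jobset:v0.1.0-amd64 \
  ghcr.io/example/jobset:v0.1.0-arm64
# Annotate entries so the runtime can match an image to the node's platform.
docker manifest annotate ghcr.io/example/jobset:v0.1.0 \
  ghcr.io/example/jobset:v0.1.0-arm64 --os linux --arch arm64
docker manifest push ghcr.io/example/jobset:v0.1.0
```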

vsoch commented 1 year ago

I've developed container registries and contributed to OCI, so I'm familiar. My reasons are primarily the drastic differences in build times. The typical approach is to use this action to build and push both at once (same manifest), and that's not reasonable to do in the time of a PR: https://drpdishant.medium.com/multi-arch-images-with-docker-buildx-and-qemu-141e0b6161e7. That said, it would work fine to do both like that on merge to main and on release, and just one for a PR. It's up to you how you want to maintain your images and CRDs - I want to keep them separate. I don't think there is a right choice, just different preferences.
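
Under the hood, what makes the arm leg possible (and slow) on an x86 runner is QEMU emulation; the action in that post registers binfmt handlers, roughly equivalent to:

```sh
# Register QEMU binfmt handlers so an x86 runner can execute arm64
# binaries during the build; emulation is why the arm leg takes so long.
docker run --privileged --rm tonistiigi/binfmt --install arm64
```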

a-hilaly commented 1 year ago

Thanks for clarifying again! I have a few questions to complete the missing pieces in my mind:

How do CRDs come into the game here? My understanding is that this is about building controller binaries and shipping container images.

If I understand correctly, building and pushing images separately for each platform is faster than using docker buildx build/push --platform=$(SOME_PLATFORM)?

a-hilaly commented 1 year ago

Maybe I brought some confusion with my initial message, but the manifest I'm talking about was more the "container registry manifest" (https://docs.docker.com/engine/reference/commandline/manifest/), not the released manifest.yaml file (that contains CRDs as well).

vsoch commented 1 year ago

> How do CRDs come into the game here? My understanding is that this is about building controller binaries and shipping container images.

A container image for a controller is directly tied to a CRD. If the metadata there is off or wrong, it won't install properly. E.g., here is the container reference in my CRD yaml that the user applies (that single file) to do the entire deployment: https://github.com/flux-framework/flux-operator/blob/76c4754f90043dfc7afecc16ade053549c89ba18/examples/dist/flux-operator.yaml#L1398. The chart is just a more separated variant of that, where the version (or tag) is filled in with a variable; here is the same reference: https://github.com/flux-framework/flux-operator/blob/76c4754f90043dfc7afecc16ade053549c89ba18/chart/templates/deployment.yaml#L58
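
To see that coupling on a live cluster, the image reference pinned by the applied yaml can be read back off the deployment (a sketch reusing the names from the logs earlier in this thread):

```sh
# The tags (or digests) printed here are exactly what the applied yaml hard-codes.
kubectl get deployment -n jobset-system jobset-controller-manager \
  -o jsonpath='{.spec.template.spec.containers[*].image}'
```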

> If I understand correctly, building and pushing images separately for each platform is faster than using docker buildx build/push --platform=$(SOME_PLATFORM)?

If you use the buildx action and give it two architectures, it's going to take (I think) the sum of both, or at best the time of the slowest one (if it can somehow build in parallel without losing efficiency). In my operator's case, the difference in times is 6 minutes for x86 and 33 for arm. And (in the scope of things I've built with QEMU and arm) that is really speedy; if you have to do more compilation it can take hours (or go outside of action limits). Here are the two paired, which happens on merge to main / releases: https://github.com/flux-framework/flux-operator/actions/runs/6137092811

vsoch commented 1 year ago

> Maybe I brought some confusion with my initial message, but the manifest I'm talking about was more the "container registry manifest" (https://docs.docker.com/engine/reference/commandline/manifest/), not the released manifest.yaml file (that contains CRDs as well).

No, I understand. You want to pull one digest/tag and have the registry choose the platform. That happens by way of packaging them together into the same OCI image manifest. That is separate from the CRD yaml manifests (which I tried to distinguish by name). My other point is that the second set of manifests (for the operator) can have subtle differences that would make sharing one OCI image manifest across platforms potentially problematic. I don't have a good example yet, but a simple one might be that some of the clouds expect a pod selector for arm, which we'd need in the CRD manifest for arm but not the other.
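
For the pod selector case, the standard mechanism is the well-known kubernetes.io/arch node label; an arm-specific deploy yaml would carry something like the following, shown here as a patch for illustration (deployment name taken from the logs above):

```sh
# Pin the controller pods to arm64 nodes via the well-known arch label.
kubectl patch deployment -n jobset-system jobset-controller-manager \
  --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/arch":"arm64"}}}}}'
```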

vsoch commented 1 year ago

To be explicitly clear, you can find a good strategy to put both platforms in the same manifest and allow the OCI registry to choose. The complexity, I think, is ensuring that the builds happen in a reasonable time in testing, and, if you can do that, ensuring that (forever) there are no differences in the operator manifests that would make having just one the wrong approach. As I already said, I don't think there is a right answer; it's a choice depending on the needs / use cases for your operator and your choices for CI.

a-hilaly commented 1 year ago

I understand that CRDs can be tightly coupled with the controller version (e.g. when it comes to controller reconciliation logic). Is it accurate to say that the main issue revolves around ensuring compatibility between CRDs (in terms of versioning) and container images, preventing them from diverging?

Additionally, in the case of this project, could there be scenarios where different platforms require distinct "CRD yaml definitions" to accommodate platform-specific requirements?

vsoch commented 1 year ago

> Additionally, in the case of this project, could there be scenarios where different platforms require distinct "CRD yaml definitions" to accommodate platform-specific requirements?

Yes that's why I have two :) And I mentioned:

> I don't have a good example yet, but a simple one might be that some of the clouds expect a pod selector for arm, which we'd need in the CRD manifest for arm but not the other.

danielvegamyhre commented 4 days ago

> I think it is doable. I believe that Kueue supports both.
>
> Looks like it's some work in the Makefile to tell it which platforms to build: https://github.com/kubernetes-sigs/kueue/blob/main/Makefile

Reviving this issue since I think it would be useful to do this, if anyone wants to give it a shot :)
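
For whoever picks this up, the end state is easy to verify: once a multi-platform image is pushed, inspecting the tag should report an entry per platform (image reference illustrative):

```sh
# A correctly published manifest list shows one manifest per platform,
# e.g. linux/amd64 and linux/arm64.
docker buildx imagetools inspect ghcr.io/example/jobset:v0.1.0
```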