hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Detect supported architectures for docker containers #18082

Open Elara6331 opened 1 year ago

Elara6331 commented 1 year ago

Proposal

I run a Nomad cluster with amd64, arm64, and riscv64 nodes. If I try to use a Docker image that only supports amd64, Nomad will sometimes schedule it on one of the non-amd64 nodes, which causes an exec format error. This can be fixed by manually adding a constraint on the CPU architecture. If possible, I'd like Nomad to automatically detect which platforms an image supports by reading its manifest, and dynamically add constraints to the job based on that. The docker command allows you to inspect manifests using docker manifest inspect --verbose <image>.

Use-cases

This feature would simplify the process of running Docker images on multi-architecture clusters.

jrasell commented 1 year ago

Hi @Elara6331 and thanks for raising this issue.

This would be a great addition, but there are some current architectural patterns within Nomad that would make this difficult or impossible. The biggest one is that the Nomad servers accept job registrations but are not expected to have Docker available, nor do they currently interact with it. I do think this would be a good idea to have in some form though, so I will move this to our backlog. If other community members have ideas, thoughts, or simply want to +1 this issue, please do.

Elara6331 commented 1 year ago

I've done some research on this, and Docker itself just uses an API to get the manifests, so it doesn't need to be installed to get the information. Here's a simple program I made that just prints out all the supported platforms of an image or manifest using github.com/google/go-containerregistry:

package main

import (
    "fmt"
    "os"

    "github.com/google/go-containerregistry/pkg/authn"
    "github.com/google/go-containerregistry/pkg/name"
    "github.com/google/go-containerregistry/pkg/v1/remote"
)

func main() {
    if len(os.Args) < 2 {
        fmt.Printf("Usage: %s <ref>\n", os.Args[0])
        os.Exit(1)
    }

    ref, err := name.ParseReference(os.Args[1])
    if err != nil {
        panic(err)
    }

    desc, err := remote.Head(ref, remote.WithAuthFromKeychain(authn.DefaultKeychain))
    if err != nil {
        panic(err)
    }

    if desc.MediaType.IsIndex() {
        index, err := remote.Index(ref, remote.WithAuthFromKeychain(authn.DefaultKeychain))
        if err != nil {
            panic(err)
        }

        im, err := index.IndexManifest()
        if err != nil {
            panic(err)
        }

        for _, m := range im.Manifests {
            // Platform is optional on an index entry, so skip entries without one.
            if m.Platform == nil {
                continue
            }
            if m.Platform.Variant != "" {
                fmt.Printf("%s/%s/%s\n", m.Platform.OS, m.Platform.Architecture, m.Platform.Variant)
            } else {
                fmt.Printf("%s/%s\n", m.Platform.OS, m.Platform.Architecture)
            }
        }
    } else if desc.MediaType.IsImage() {
        img, err := remote.Image(ref, remote.WithAuthFromKeychain(authn.DefaultKeychain))
        if err != nil {
            panic(err)
        }

        cf, err := img.ConfigFile()
        if err != nil {
            panic(err)
        }

        if cf.Variant != "" {
            fmt.Printf("%s/%s/%s\n", cf.OS, cf.Architecture, cf.Variant)
        } else {
            fmt.Printf("%s/%s\n", cf.OS, cf.Architecture)
        }
    }
}

Here's the output for some of the images I tested it with:

/ # ./reg golang
linux/amd64
linux/arm/v5
linux/arm/v7
linux/arm64/v8
linux/386
linux/mips64le
linux/ppc64le
linux/s390x
windows/amd64
windows/amd64
/ # ./reg gitea.elara.ws/elara6331/webserver
linux/amd64
linux/arm64
linux/riscv64
/ # ./reg gitea.elara.ws/elara6331/webserver:amd64
linux/amd64
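
To connect that output back to the proposal, here is a rough sketch of how a platform list like the ones above could be turned into a scheduling constraint. This is only an illustration under some assumptions: the Constraint struct and the archConstraint function are hypothetical stand-ins, not Nomad's actual types; it only considers linux platforms; it ignores the variant (so arm/v5 and arm/v7 both become arm); and it assumes the node's ${attr.cpu.arch} attribute uses the same architecture names as the image manifests.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// Constraint is a hypothetical stand-in for a job constraint:
// a left-hand attribute, an operator, and a right-hand value.
type Constraint struct {
	LTarget string
	Operand string
	RTarget string
}

// archConstraint builds a regexp constraint that matches any architecture
// found in the given "os/arch[/variant]" platform strings.
// Assumption: only linux platforms are relevant, and variants are ignored.
// It returns false when no linux platform is present.
func archConstraint(platforms []string) (Constraint, bool) {
	seen := map[string]bool{}
	for _, p := range platforms {
		parts := strings.Split(p, "/")
		if len(parts) < 2 || parts[0] != "linux" {
			continue
		}
		seen[parts[1]] = true
	}
	if len(seen) == 0 {
		return Constraint{}, false
	}

	// Sort the architectures so the resulting pattern is deterministic.
	archs := make([]string, 0, len(seen))
	for a := range seen {
		archs = append(archs, regexp.QuoteMeta(a))
	}
	sort.Strings(archs)

	return Constraint{
		LTarget: "${attr.cpu.arch}",
		Operand: "regexp",
		RTarget: "^(" + strings.Join(archs, "|") + ")$",
	}, true
}

func main() {
	c, ok := archConstraint([]string{"linux/amd64", "linux/arm64", "linux/riscv64"})
	fmt.Println(c.LTarget, c.Operand, c.RTarget, ok)
}
```

A single-architecture image such as the :amd64 tag above would yield a pattern that matches only amd64, which is the same thing the manual constraint does today.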