operator-framework / operator-registry

Operator Registry runs in a Kubernetes or OpenShift cluster to provide operator catalog data to Operator Lifecycle Manager.
Apache License 2.0

Categories validation works on a host, not in a container #538

Open J0zi opened 3 years ago

J0zi commented 3 years ago

Bug Report

Categories validation works on a host, not in a container.

What did you do?

Manually built operator-sdk. Version output: operator-sdk version: "v1.2.0-42-g7a741929", commit: "7a741929485a7122d88f1ff22000093ddde13ab0", kubernetes version: "v1.19.4", go version: "go1.15.6", GOOS: "linux", GOARCH: "amd64"

Ran the following command in a container:
OPERATOR_BUNDLE_CATEGORIES=/tmp/community-operators/categories.json /tmp/operator-test/bin/operator-sdk bundle validate --verbose quay.io/openshift-community-operators/eclipse-che:v7.22.2 --select-optional suite=operatorframework -b podman

What did you expect to see?

All validation tests complete successfully.

What did you see instead? Under which circumstances?

podman cp fails, even though running podman cp manually works for me.

OPERATOR_BUNDLE_CATEGORIES=/tmp/community-operators/categories.json /tmp/operator-test/bin/operator-sdk bundle validate --verbose quay.io/openshift-community-operators/eclipse-che:v7.22.2  --select-optional suite=operatorframework -b podman
DEBU[0000] Debug logging is set                         
INFO[0000] Unpacking image layers                       
DEBU[0000] Pulling and unpacking container image         bundle-dir=/tmp/bundle-808480236 container-tool=podman
INFO[0000] running /usr/bin/podman pull quay.io/openshift-community-operators/eclipse-che:v7.22.2 
INFO[0004] running podman create                        
DEBU[0004] [podman create quay.io/openshift-community-operators/eclipse-che:v7.22.2 ] 
INFO[0004] running podman cp                            
DEBU[0004] [podman cp time="2020-12-15T10:44:09Z" level=error msg="unable to write pod event: \"write unixgram @0004c->/run/systemd/journal/socket: sendmsg: no such file or directory\""
6b044ca6255c328c352fc133827d69d88d491bd9ec85afaa7488f4b03544f0e2:/. /tmp/bundle-808480236] 
ERRO[0004] Error: invalid arguments time="2020-12-15T10:44:09Z" level=error msg="unable to write pod event: \"write unixgram @0004c->/run/systemd/journal/socket: sendmsg: no such file or directory\""
6b044ca6255c328c352fc133827d69d88d491bd9ec85afaa7488f4b03544f0e2:/., /tmp/bundle-808480236 you must use just one container 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x13b1166]

goroutine 1 [running]:
github.com/operator-framework/operator-sdk/internal/cmd/operator-sdk/bundle/validate/internal.(*Result).prepare(0x0, 0x15, 0x7ffe598cb85a)
    /tmp/operator-sdk-source/internal/cmd/operator-sdk/bundle/validate/internal/result.go:132 +0x26
github.com/operator-framework/operator-sdk/internal/cmd/operator-sdk/bundle/validate/internal.(*Result).PrintWithFormat(0x0, 0x211d7ab, 0x4, 0x6, 0x211d7ab)
    /tmp/operator-sdk-source/internal/cmd/operator-sdk/bundle/validate/internal/result.go:151 +0x2f
github.com/operator-framework/operator-sdk/internal/cmd/operator-sdk/bundle/validate.NewCmd.func1(0xc0008edb80, 0xc0003c0120, 0x1, 0x6, 0x0, 0x0)
    /tmp/operator-sdk-source/internal/cmd/operator-sdk/bundle/validate/cmd.go:104 +0x3ca
github.com/spf13/cobra.(*Command).execute(0xc0008edb80, 0xc0003c00c0, 0x6, 0x6, 0xc0008edb80, 0xc0003c00c0)
    /root/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:842 +0x47c
github.com/spf13/cobra.(*Command).ExecuteC(0xc0003b4b00, 0x244b1a0, 0xc0001a2978, 0x212b18c)
    /root/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
    /root/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887
sigs.k8s.io/kubebuilder/v2/pkg/cli.cli.Run(...)
    /root/go/pkg/mod/sigs.k8s.io/kubebuilder/v2@v2.3.2-0.20201211222127-503ba3b7e4ad/pkg/cli/cli.go:494
github.com/operator-framework/operator-sdk/internal/cmd/operator-sdk/cli.Run(0xc000114058, 0x0)
    /tmp/operator-sdk-source/internal/cmd/operator-sdk/cli/cli.go:51 +0x38
main.main()
    /tmp/operator-sdk-source/cmd/operator-sdk/main.go:28 +0x25
[root@6b26ac836973 tmp]# /tmp/operator-test/bin/operator-sdk version
operator-sdk version: "v1.2.0-42-g7a741929", commit: "7a741929485a7122d88f1ff22000093ddde13ab0", kubernetes version: "v1.19.4", go version: "go1.15.6", GOOS: "linux", GOARCH: "amd64"
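
In the trace above, the first argument to (*Result).prepare and (*Result).PrintWithFormat is 0x0, which suggests the command printed a nil result after the unpack step failed. The following is a minimal sketch of that failure mode and a guard against it. The Result type, validate, and report are hypothetical stand-ins for illustration, not the SDK's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// Result is a hypothetical stand-in for the SDK's validation result type.
type Result struct {
	Passed bool
}

// Summary dereferences r, so calling it on a nil *Result would panic
// with a nil pointer dereference, as in the trace above.
func (r *Result) Summary() string {
	if r.Passed {
		return "passed"
	}
	return "failed"
}

// validate is hypothetical: it returns a nil result together with an
// error when unpacking the bundle image fails.
func validate(unpackOK bool) (*Result, error) {
	if !unpackOK {
		return nil, errors.New("error copying container directory")
	}
	return &Result{Passed: true}, nil
}

// report checks the error first, so the nil result is never dereferenced.
func report(unpackOK bool) string {
	res, err := validate(unpackOK)
	if err != nil {
		return "validation aborted: " + err.Error()
	}
	return res.Summary()
}

func main() {
	fmt.Println(report(false))
	fmt.Println(report(true))
}
```

With a guard like this, a failed podman cp would surface as an ordinary error message instead of a panic.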

Environment

Operator type:

language go

Kubernetes cluster type:

None; just a Docker container in which operator-sdk executes podman commands.

Container run command: docker run -it --rm -e STORAGE_DRIVER=vfs --privileged quay.io/operator_testing/operator-test-playbooks:jtest bash

categories.json: https://github.com/operator-framework/community-operators/blob/master/categories.json

$ operator-sdk version

operator-sdk version: "v1.2.0-42-g7a741929", commit: "7a741929485a7122d88f1ff22000093ddde13ab0", kubernetes version: "v1.19.4", go version: "go1.15.6", GOOS: "linux", GOARCH: "amd64"

Manually compiled from master.

$ go version (if language is Go)

go1.15.6

$ kubectl version

no

Possible Solution

podman cp fails when it is called with a hash. For me, podman cp works with a specific file rather than a hash. I am not sure whether this information helps.
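
One reading of the debug log above is that the container reference passed to podman cp contains the stderr warning from podman create followed by the container ID on the next line, so podman rejects the arguments. Below is a minimal sketch of extracting only the ID from such mixed output. It assumes the tool captures stdout and stderr together, which the log suggests but does not prove. lastNonEmptyLine is a hypothetical helper, and the warning text in main is abbreviated.

```go
package main

import (
	"fmt"
	"strings"
)

// lastNonEmptyLine returns the last non-blank line of out, trimmed.
// Assumption: `podman create` prints the container ID as its final line
// and any warnings appear before it, as in the manual run in this report.
func lastNonEmptyLine(out string) string {
	lines := strings.Split(out, "\n")
	for i := len(lines) - 1; i >= 0; i-- {
		if s := strings.TrimSpace(lines[i]); s != "" {
			return s
		}
	}
	return ""
}

func main() {
	// Mixed stderr and stdout, shaped like the debug log above
	// (warning message abbreviated for the example).
	out := "time=\"2020-12-15T10:44:09Z\" level=error msg=\"unable to write pod event\"\n" +
		"6b044ca6255c328c352fc133827d69d88d491bd9ec85afaa7488f4b03544f0e2\n"
	fmt.Println(lastNonEmptyLine(out))
}
```

An alternative under the same assumption would be to read only stdout, for example with (*exec.Cmd).Output from os/exec instead of CombinedOutput, so that warnings written to stderr never reach the container ID.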

Additional context

podman cp works when run manually:

[root@6b26ac836973 tmp]# podman create quay.io/openshift-community-operators/eclipse-che:v7.22.2 sleep 10
ERRO[0000] unable to write pod event: "write unixgram @0004e->/run/systemd/journal/socket: sendmsg: no such file or directory" 
85ccf0b1b2f203e013c0e8317b8d10aacf14f3a3698367a21533e460efdba1a9
[root@6b26ac836973 tmp]# podman ps -a
CONTAINER ID  IMAGE                                                      COMMAND   CREATED         STATUS   PORTS   NAMES
85ccf0b1b2f2  quay.io/openshift-community-operators/eclipse-che:v7.22.2  sleep 10  38 seconds ago  Created          jovial_hertz
6b044ca6255c  quay.io/openshift-community-operators/eclipse-che:v7.22.2            2 minutes ago   Created          crazy_kirch
[root@6b26ac836973 tmp]# podman cp jovial_hertz:/metadata/annotations.yaml .
ERRO[0000] unable to write pod event: "write unixgram @00050->/run/systemd/journal/socket: sendmsg: no such file or directory" 
ERRO[0000] unable to write pod event: "write unixgram @00050->/run/systemd/journal/socket: sendmsg: no such file or directory" 
[root@6b26ac836973 tmp]# cat annotations.yaml 
annotations:
  operators.operatorframework.io.bundle.channel.default.v1: stable
  operators.operatorframework.io.bundle.channels.v1: stable
  operators.operatorframework.io.bundle.manifests.v1: manifests/
  operators.operatorframework.io.bundle.mediatype.v1: registry+v1
  operators.operatorframework.io.bundle.metadata.v1: metadata/
  operators.operatorframework.io.bundle.package.v1: eclipse-che
[root@6b26ac836973 tmp]# 

opm produces the same output.

Here is the opm output. Version: version.Version{OpmVersion:"v1.15.3", GitCommit:"9e92474", BuildDate:"2020-12-03T18:34:29Z", GoOs:"linux", GoArch:"amd64"}

OPERATOR_BUNDLE_CATEGORIES=/tmp/community-operators/categories.json ./opm alpha bundle validate --tag quay.io/openshift-community-operators/eclipse-che:v7.22.2 --optional-validators=operatorhub -b podman
INFO[0000] Create a temp directory at /tmp/bundle-907446637  container-tool=podman
DEBU[0000] Pulling and unpacking container image         container-tool=podman
INFO[0000] running /usr/bin/podman pull quay.io/openshift-community-operators/eclipse-che:v7.22.2  container-tool=podman
INFO[0005] running podman create                         container-tool=podman
DEBU[0005] [podman create quay.io/openshift-community-operators/eclipse-che:v7.22.2 ]  container-tool=podman
INFO[0005] running podman cp                             container-tool=podman
DEBU[0005] [podman cp time="2020-12-16T08:46:36Z" level=error msg="unable to write pod event: \"write unixgram @00062->/run/systemd/journal/socket: sendmsg: no such file or directory\""
4c2b1b836f3e53d7aadcd6425e4d1b8c18c4d2b9ef862a984aee04a48d2a39b6:/. /tmp/bundle-907446637]  container-tool=podman
ERRO[0005] Error: invalid arguments time="2020-12-16T08:46:36Z" level=error msg="unable to write pod event: \"write unixgram @00062->/run/systemd/journal/socket: sendmsg: no such file or directory\""
4c2b1b836f3e53d7aadcd6425e4d1b8c18c4d2b9ef862a984aee04a48d2a39b6:/., /tmp/bundle-907446637 you must use just one container  container-tool=podman
Error: error copying container directory Error: invalid arguments time="2020-12-16T08:46:36Z" level=error msg="unable to write pod event: \"write unixgram @00062->/run/systemd/journal/socket: sendmsg: no such file or directory\""
4c2b1b836f3e53d7aadcd6425e4d1b8c18c4d2b9ef862a984aee04a48d2a39b6:/., /tmp/bundle-907446637 you must use just one container
: exit status 125
Usage:
  opm alpha bundle validate [flags]

Examples:
$ opm alpha bundle validate --tag quay.io/test/test-operator:latest --image-builder docker

Flags:
  -h, --help                         help for validate
  -b, --image-builder string         Tool used to pull and unpack bundle images. One of: [none, docker, podman] (default "docker")
  -o, --optional-validators string   Specifies optional validations to be run. One or more of: [operatorhub, bundle-objects]
  -t, --tag string                   The path of a registry to pull from, image name and its tag that present the bundle image (e.g. quay.io/test/test-operator:latest)

Global Flags:
      --skip-tls   skip TLS certificate verification for container image registries while pulling bundles or index

J0zi commented 3 years ago

@gallettilance @dinhxuanvu This not only blocks new categories from being added, it also prevents us from using the latest opm/operator-sdk in our pipelines. We run operator-sdk in a container, which works for 0.18.2 https://github.com/operator-framework/community-operators/pull/2941/checks?check_run_id=1664566934#step:3:495 but fails for 1.3.0 https://github.com/operator-framework/community-operators/pull/2941/checks?check_run_id=1662932005. It seems that the ability to run opm/operator-sdk in a container was accidentally removed or broken. Thank you.

J0zi commented 3 years ago

We found that -b none works for the SDK :tada: We need to test it more.

J0zi commented 3 years ago

Although -b none works for the SDK against the quay registry, the issue still persists when accessing a local kind registry from a container.

J0zi commented 3 years ago

We can confirm that -b podman worked in the very old version 0.18.2 but no longer works at all. We run the SDK command in a container against a local kind registry on the underlying host.

exdx commented 3 years ago

Could you confirm that the issue persists on the latest versions of opm or the SDK, and with an alternate container tool (such as docker)? If so, we definitely want to address it on the opm side.

We want to try to reproduce this by running the docker run -it --rm -e STORAGE_DRIVER=vfs --privileged quay.io/operator_testing/operator-test-playbooks:jtest bash command. It seems like there could be a real issue, but there could be multiple things going on.