knative / build

A Kubernetes-native Build resource.
Apache License 2.0

Issues in building knative on Power PC, ppc64le architecture #514

Open seth-priya opened 5 years ago

seth-priya commented 5 years ago

I am trying to build knative on ppc64le and wanted to understand the general requirements for building it (along with all dependent images) from scratch.

I have been following DEVELOPMENT.md and could build the base image (build-base), ko.local/github.com/knative/build/build-base, for ppc64le. However, it looks like some of the other required images, such as git-init and creds-init, are either not being built locally (i.e. they are pulled from gcr.io) or are being built on incorrect base images, as docker image inspect still reports amd64 as their architecture.

I could not find any Dockerfiles in the repo that could be used directly to build the other images (apart from build-base), so I am trying to use ./hack/release.sh --nopublish --skip-tests --notag-release to get the images.

Is there another way to force all the images under build and serving to be built locally? Is there some setting/configuration that I might be missing?

Any guidance on this would be of great help, thanks in advance!

jonjohnsonjr commented 5 years ago

This is probably my fault because I haven't finished adding proper manifest list support to ko.

The container images are built using ko, which defaults to using gcr.io/distroless/base:latest here.

You can try overriding the base image to use something that works with ppc64le, but I'm not sure where the problem lies.

There may be another issue here where we build the go binaries. It looks like we don't yet hardcode GOARCH=amd64, so it may be possible to export GOARCH=ppc64le to build for the right architecture.

I'd like to eventually fix this by adding proper manifest list support to ko: https://github.com/google/go-containerregistry/issues/333

We might be able to fix this use case with a smaller change in the meantime. Let me know if any of that helps or if I can help with unblocking anything.

seth-priya commented 5 years ago

Thanks @jonjohnsonjr for all your suggestions. This really helps with the understanding. I will try making the changes you suggested, see how it goes, and keep you posted.

seth-priya commented 5 years ago

Update: I was able to build all the docker images for ppc64le locally after overriding the base image in .ko.yaml, in both build and serving, to use the equivalent ppc64le image (again built locally). I additionally had to update the baseImageOverrides section to use the correct ppc64le image for creds-init and git-init (after pushing it to a local registry).
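As a rough illustration of the override described above, the following Python sketch renders a .ko.yaml with a default base image and per-binary overrides. The keys defaultBaseImage and baseImageOverrides are ko configuration keys; the image names passed in are hypothetical placeholders, not the ones actually used in this thread.

```python
def render_ko_config(default_base, overrides):
    """Return the text of a .ko.yaml that pins base images."""
    lines = [f"defaultBaseImage: {default_base}"]
    if overrides:
        lines.append("baseImageOverrides:")
        # One entry per Go import path, sorted for a stable file.
        for import_path in sorted(overrides):
            lines.append(f"  {import_path}: {overrides[import_path]}")
    return "\n".join(lines) + "\n"


# Hypothetical image names: substitute whatever was pushed to the local registry.
PPC64LE_BUILD_BASE = "localhost:5000/ko.local/github.com/knative/build/build-base:latest"
config_text = render_ko_config(
    "localhost:5000/distroless-base-ppc64le:latest",
    {
        "github.com/knative/build/cmd/creds-init": PPC64LE_BUILD_BASE,
        "github.com/knative/build/cmd/git-init": PPC64LE_BUILD_BASE,
    },
)
print(config_text)
```

The base image has to be pullable from a registry for ko to use it, which is presumably why the build-base image had to be pushed to the local registry first.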

I have the following ppc64le images now:

ko.local/github.com/knative/build/cmd/creds-init
ko.local/github.com/knative/serving/cmd/queue
ko.local/github.com/knative/build/cmd/nop
ko.local/github.com/knative/build/cmd/controller
ko.local/github.com/knative/serving/cmd/activator
ko.local/github.com/knative/serving/cmd/controller
ko.local/github.com/knative/build/build-base
localhost:5000/ko.local/github.com/knative/build/build-base
ko.local/github.com/knative/serving/cmd/autoscaler
ko.local/github.com/knative/serving/cmd/webhook
ko.local/github.com/knative/build/cmd/git-init
ko.local/github.com/knative/build/cmd/webhook

Next, I am planning to follow the instructions at https://github.com/knative/docs/blob/master/install/Knative-with-any-k8s.md after updating the release.yaml file locally, in an attempt to install knative. I already have the istio pods deployed and running.

seth-priya commented 5 years ago

I was trying to push the ko.local/github.com/knative/* images to a local registry using docker push, but that is failing with an error like:

open /var/lib/docker/devicemapper/mnt/07eec79cee35a5ee0b0b9509d626e699db8afb26ab7e6ac934fa810825c16193/rootfs/var/run/ko/HEAD: no such file or directory

Checking on this...

Next, I updated the release.yaml file to use the local images and could apply it. However, not all the pods are coming up correctly. I am debugging the issues (there could be some differences in the docker images).

jonjohnsonjr commented 5 years ago

For some context, that file is a symlink that points to the current git commit so that it can be used in logs (I think?).

For example: https://github.com/knative/build/blob/master/cmd/controller/kodata/HEAD

Was added here: https://github.com/knative/pkg/pull/158

If you're building these images outside of a git repo, I could imagine that failing, but it might also be a platform difference? Not sure if that helps... maybe a clue towards fixing it :)

seth-priya commented 5 years ago

I am building the images from the git repo, so that does not seem to be the problem. I tried tag v0.2.0, which does not have the HEAD changes, but got a LICENSE-related error with that.

Finally, I could get this (building and pushing the images to the local registry) to work on another Power system, so it looks like it is not a platform difference either, but rather something else in the environment. At this stage I am not sure what it could be.

The Docker version is also not the issue, as the same version works on Intel as well.

I am setting up a remote registry to host the images on the system where this works, so that I can move ahead with the deployment.

seth-priya commented 5 years ago

@jonjohnsonjr some progress on this today: after setting up a remote registry, pushing the knative build/serving docker images there, updating release.yaml accordingly and applying it, this is what we have:

kubectl get pods --namespace knative-serving
NAME                          READY   STATUS    RESTARTS   AGE
controller-66f94dbf98-rjn2l   1/1     Running   0          39m
webhook-568ff6fb94-lmmzl      1/1     Running   0          39m

kubectl get pods --namespace knative-build
NAME                                READY   STATUS    RESTARTS   AGE
build-controller-6ddc9d64cb-6znfc   1/1     Running   0          40m
build-webhook-859b8599b5-dc275      1/1     Running   0          40m

kubectl get pods --namespace knative-monitoring
NAME                                  READY   STATUS             RESTARTS   AGE
grafana-7549795fd4-4x7lj              1/1     Running            0          2h
kibana-logging-68d7697687-ssbjx       1/1     Running            0          38m
kube-state-metrics-7c7b459dfb-5vpp7   3/4     CrashLoopBackOff   12         38m
prometheus-system-0                   1/1     Running            0          38m
prometheus-system-1                   1/1     Running            0          38m

Checking on kube-state-metrics; this could be due to differences in the addon-resizer images on Intel vs Power:

Error syncing pod d7882c48-14c3-11e9-83d7-525400891221 ("kube-state-metrics-7f7dd967fc-gc7gs_knative-monitoring(d7882c48-14c3-11e9-83d7-525400891221)"), skipping: failed to "StartContainer" for "addon-resizer" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=addon-resizer pod=kube-state-metrics-7f7dd967fc-gc7gs_knative-monitoring(d7882c48-14c3-11e9-83d7-525400891221)"
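When scanning several namespaces, it can help to pick out the unhealthy pods automatically. The sketch below is only an illustration: it assumes the usual `kubectl get pods` column layout (NAME, READY, STATUS first, one pod per line) and uses the knative-monitoring listing from this comment as sample input.

```python
def unready_pods(table_text):
    """Return (name, status) for pods that are not fully ready and Running."""
    results = []
    rows = [line.split() for line in table_text.strip().splitlines()]
    for fields in rows[1:]:  # skip the NAME/READY/STATUS header row
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, total = ready.split("/")
        if ready_count != total or status != "Running":
            results.append((name, status))
    return results


sample = """\
NAME                                  READY   STATUS             RESTARTS   AGE
grafana-7549795fd4-4x7lj              1/1     Running            0          2h
kibana-logging-68d7697687-ssbjx       1/1     Running            0          38m
kube-state-metrics-7c7b459dfb-5vpp7   3/4     CrashLoopBackOff   12         38m
prometheus-system-0                   1/1     Running            0          38m
prometheus-system-1                   1/1     Running            0          38m
"""
print(unready_pods(sample))
```

Note that this simple rule would also flag pods in a Completed state, which may be perfectly healthy for jobs.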

jonjohnsonjr commented 5 years ago

Just to clarify -- is the target cluster you're trying to deploy to also PowerPC?

Looks like we're failing to pull this: https://github.com/knative/serving/blob/114ee46c575df605fd38a94f2fe1c32107f30b2d/third_party/config/monitoring/metrics/prometheus/kubernetes/kube-state-metrics.yaml#L154-L155

Indeed it is amd64:

$ crane config k8s.gcr.io/addon-resizer:1.7 | jq .architecture
"amd64"

Under gcr.io/google-containers/addon-resizer-ppc64le, there's only one image, gcr.io/google-containers/addon-resizer-ppc64le:2.1, which might work if you replace k8s.gcr.io/addon-resizer:1.7 with k8s.gcr.io/addon-resizer-ppc64le:2.1, but it's hard to say :man_shrugging:.

Interestingly, that claims to be amd64 as well:

$ crane config gcr.io/google-containers/addon-resizer-ppc64le:2.1 | jq .architecture
"amd64"

That might not matter, but I believe it's a bug in whatever is producing these images :man_facepalming:.
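The same check can be scripted. This is a minimal Python sketch, assuming the image config JSON has already been fetched (for example with crane config); the sample document is a trimmed, illustrative config rather than real registry output.

```python
import json


def declared_architecture(config_json):
    """Return the 'architecture' field of an image config document, or None."""
    return json.loads(config_json).get("architecture")


# Trimmed, illustrative config document.
sample_config = '{"architecture": "amd64", "os": "linux"}'
arch = declared_architecture(sample_config)
if arch != "ppc64le":
    print(f"image config declares {arch!r}, not 'ppc64le'")
```

As the addon-resizer-ppc64le case shows, the declared field can be wrong, so a mismatch is a hint to investigate rather than proof that the binaries inside are for the wrong platform.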

If you don't care about monitoring I think you can just skip it.

cc @mdemirhan any context for where the monitoring yaml comes from? We probably want to be using a newer tag (2.1) and figure out how to make that tag point to a manifest list to support more platforms.

The author doesn't seem to work at google anymore, so it's going to be a bit of a challenge to figure out what produces these images :/

$ crane config gcr.io/google-containers/addon-resizer:2.1 | jq .author -r
Quintin Lee "qlee@google.com"

seth-priya commented 5 years ago

Thanks for checking and providing your comments and feedback!!

Yes, the target cluster is PowerPC as well. Since most of the images referenced from release.yaml did not have a multi-arch manifest, I had to replace them with (equivalent) multi-arch images that are hosted here: https://cloud.docker.com/u/ibmcom/repository/docker/ibmcom.

For addon-resizer, I was using https://hub.docker.com/r/googlecontainer/addon-resizer-ppc64le/ when we got the above error. This is a ppc64le image.

I debugged this today and found the following in the logs:

kubectl logs kube-state-metrics-595f76d67d-tj6g4 addon-resizer -n knative-monitoring
I0111 10:00:25.584493 1 pod_nanny.go:63] Invoked by [/pod_nanny --container=kube-state-metrics --cpu=100m --extra-cpu=1m --memory=100Mi --extra-memory=2Mi --threshold=5 --deployment=kube-state-metrics]
unknown flag: --threshold

In release.yaml, I commented out line 7470 (--threshold=5) and the deployment succeeded:

kubectl get pods -n knative-monitoring
NAME                                  READY   STATUS    RESTARTS   AGE
grafana-7549795fd4-p2jnj              1/1     Running   0          3m
kibana-logging-68d7697687-gdmmb       1/1     Running   0          3m
kube-state-metrics-5c9d7d6499-ln9mq   4/4     Running   0          1m
prometheus-system-0                   1/1     Running   0          3m
prometheus-system-1                   1/1     Running   0          3m

So everything seems to be up and running ...
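For anyone scripting the same workaround, here is a small Python sketch of the idea: remove a flag the substituted image does not understand from a container's argument list. The argument list is copied from the pod_nanny log line above; the helper name is hypothetical, and it only handles the --flag=value form.

```python
def drop_flags(args, unsupported):
    """Return args without any --flag=value entry whose flag name is in `unsupported`."""
    kept = []
    for arg in args:
        name = arg.split("=", 1)[0]  # "--threshold=5" -> "--threshold"
        if name not in unsupported:
            kept.append(arg)
    return kept


pod_nanny_args = [
    "/pod_nanny",
    "--container=kube-state-metrics",
    "--cpu=100m",
    "--extra-cpu=1m",
    "--memory=100Mi",
    "--extra-memory=2Mi",
    "--threshold=5",
    "--deployment=kube-state-metrics",
]
print(drop_flags(pod_nanny_args, {"--threshold"}))
```

The filtered list would then be written back into the container's args in the manifest before applying it.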

However, I am running into issues testing the sample app and am currently debugging those.

seth-priya commented 5 years ago

I continue to debug the issues in deploying the sample hello world app. It seems to apply correctly, but the service is not coming up.

kubectl logs controller-66f94dbf98-s9jsx controller -n knative-serving

shows errors like these,

{"level":"warn","ts":"2019-01-14T09:29:58.742Z","logger":"controller.service-controller","caller":"service/service.go:148","msg":"Failed to update service status{error 25 0 services.serving.knative.dev \"helloworld-go\" is forbidden: User \"system:serviceaccount:knative-serving:controller\" cannot update services.serving.knative.dev/status in the namespace \"default\"}","knative.dev/controller":"service-controller","knative.dev/key":"default/helloworld-go"}
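The controller writes structured JSON logs, so the denial can be pulled apart mechanically. The following Python sketch is an illustration only: it assumes the message wording shown in the log line above, and the sample line is that same entry with a few fields trimmed.

```python
import json
import re

# Matches the RBAC denial wording seen in the controller log above.
FORBIDDEN = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) (?P<resource>\S+) '
    r'in the namespace "(?P<namespace>[^"]+)"'
)


def forbidden_details(log_line):
    """Return user, verb, resource and namespace of an RBAC denial, or None."""
    entry = json.loads(log_line)
    match = FORBIDDEN.search(entry.get("msg", ""))
    return match.groupdict() if match else None


line = r'{"level":"warn","logger":"controller.service-controller","msg":"Failed to update service status{error 25 0 services.serving.knative.dev \"helloworld-go\" is forbidden: User \"system:serviceaccount:knative-serving:controller\" cannot update services.serving.knative.dev/status in the namespace \"default\"}","knative.dev/key":"default/helloworld-go"}'
print(forbidden_details(line))
```

Here the denied resource is services.serving.knative.dev/status, which suggests checking whether the role bound to the controller service account grants update on that status subresource.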

@jonjohnsonjr any pointers / feedback / inputs on this? looks like I am missing something in the configuration?

jonjohnsonjr commented 5 years ago

cc @tcnghia might have a better answer

I believe that's because the knative serving controller needs cluster-admin permissions to create resources in other namespaces?

It looks like you're missing this. I think this command would fix it:

kubectl create clusterrolebinding knative-serving-controller-admin --clusterrole=cluster-admin --serviceaccount=knative-serving:controller --namespace=knative-serving

seth-priya commented 5 years ago

Thanks @jonjohnsonjr, I tried the above; however, it looks like this was already applied on the system (perhaps while applying release.yaml). I got the error below:

Error from server (AlreadyExists): clusterrolebindings.rbac.authorization.k8s.io "knative-serving-controller-admin" already exists

Still debugging...

tcnghia commented 5 years ago

@seth-priya can you please check if the knative-serving-admin role has similar content to this https://github.com/knative/serving/blob/master/config/200-clusterrole.yaml

Also, the role binding exists, but does it grant knative-serving-controller-admin the knative-serving-admin role?

If you can share kubectl get clusterrolebindings.rbac.authorization.k8s.io knative-serving-controller-admin -n knative-serving -o yaml and kubectl get ClusterRole knative-serving-admin -o yaml that will be really useful.

junawaneshivani commented 5 years ago

Hi @tcnghia , I am working with @seth-priya on building knative on ppc64le.

As per your response, the knative-serving-admin role has similar content to this https://github.com/knative/serving/blob/master/config/200-clusterrole.yaml

Please find the output of kubectl get clusterrolebindings.rbac.authorization.k8s.io knative-serving-controller-admin -n knative-serving -o yaml as knative-serving-controller-admin.txt and kubectl get ClusterRole knative-serving-admin -o yaml as knative-serving-admin.txt.

junawaneshivani commented 5 years ago

@jonjohnsonjr and @tcnghia We moved on to release-0.3 and deployed only the knative-serving component. We could see all 4 pods running.

We tried deploying the sample app but are facing a RevisionFailed issue. Do you have any pointers?

kubectl get all -n default gives output as output.txt

kubectl describe service.serving.knative.dev/helloworld-go gives output as service.txt

kubectl describe revision.serving.knative.dev/helloworld-go-00001 gives output as revision.txt

jonjohnsonjr commented 5 years ago

That could be an issue with how we're resolving tags to digests...

If you add "gcr.io" to the configmap here: https://github.com/knative/serving/blob/06fae8be6da29137fd55b44557572566ef69f975/config/config-controller.yaml#L30

Does that fix things?
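To make the suggestion concrete: assuming the linked line holds a comma-separated list of registries for which tag-to-digest resolution is skipped (to my knowledge the key was named registriesSkippingTagResolving in Knative Serving at the time, but treat both the key name and the starting value below as assumptions), adding a registry could be sketched in Python as:

```python
def add_registry(skip_list, registry):
    """Append `registry` to a comma-separated registry list if it is absent."""
    registries = [r.strip() for r in skip_list.split(",") if r.strip()]
    if registry not in registries:
        registries.append(registry)
    return ",".join(registries)


# Assumed starting value; check the actual configmap before editing it.
print(add_registry("ko.local,dev.local", "gcr.io"))
```

The returned string would replace the existing value in the configmap. A locally hosted registry (for example localhost:5000) could be added the same way.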

seth-priya commented 5 years ago

@jonjohnsonjr issue is that we are using locally built Power images, pushed to a local registry, so not sure if that will help? do we have any workaround for that?

jonjohnsonjr commented 5 years ago

Is your network configured to allow pulling gcr.io/knative-samples/helloworld-go?

junawaneshivani commented 5 years ago

@jonjohnsonjr pulling the image from gcr.io or docker.io was failing, so we built the image locally and tagged it as ko.local/junawaneshivani/helloworld-go. This resolved the image fetch issue, and the helloworld sample app seems to work fine now.

Do we need to add gcr.io and docker.io to the config-controller.yaml file to be able to pull images from them? docker pull works for gcr.io and docker.io; it seems to fail only in the sample app.

Also, we have moved on to building knative eventing and sources, and are facing similar ImagePull issues with the eventing code samples. Debugging further.

junawaneshivani commented 5 years ago

@jonjohnsonjr I was able to run the knative eventing sample by pulling the image by digest (sha) rather than by tag. Now we have pods under all 5 namespaces running using locally built ppc64le images.
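Referencing an image by digest presumably sidesteps the tag-to-digest resolution step discussed earlier in the thread. A minimal Python sketch of rewriting a tagged reference into a digest reference follows; the digest used is a placeholder, not a real image digest.

```python
def pin_to_digest(image_ref, digest):
    """Replace the tag (if any) in `image_ref` with `@digest`."""
    if not digest.startswith("sha256:"):
        raise ValueError("expected a sha256 digest")
    repo = image_ref.split("@", 1)[0]  # drop an existing digest, if present
    last_slash = repo.rfind("/")
    colon = repo.rfind(":")
    if colon > last_slash:  # a colon after the last slash is a tag, not a registry port
        repo = repo[:colon]
    return f"{repo}@{digest}"


# Placeholder digest for illustration only.
placeholder = "sha256:" + "0" * 64
print(pin_to_digest("gcr.io/knative-samples/helloworld-go:latest", placeholder))
```

The real digest can be read from the registry or from the output of docker push or docker inspect for the locally built image.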

seth-priya commented 5 years ago

Hi @jonjohnsonjr, now that we are able to complete the deployment and at least basic validation of all the components on Power, I was wondering whether you or someone from the community would be able to help in pushing multi-arch docker images for knative and its sample apps to gcr.io, so that they would work on Power?

Please let me know your thoughts and suggestions on how best to take this forward.

Thank you for all your support and help thus far!!

seth-priya commented 5 years ago

This works on ICP 3.1.1 on Power as per https://github.com/knative/docs/blob/master/install/Knative-with-ICP.md, only significant changes required were due to use of locally built images for knative components ..

@jonjohnsonjr any thoughts / feedback on the earlier comment?

jonjohnsonjr commented 5 years ago

This is on my plate but I haven't found time to get to it yet. If anyone is interested in helping, I made an issue here: https://github.com/google/go-containerregistry/issues/333

I want this to happen automatically in ko if you set the base image to a manifest list. That requires adding manifest list support to google/go-containerregistry and then consuming that support for ko.

Sorry for not responding sooner; things have been a bit busy 😅

clyang82 commented 5 years ago

This works on ICP 3.1.1 on Power as per https://github.com/knative/docs/blob/master/install/Knative-with-ICP.md, only significant changes required were due to use of locally built images for knative components ..

@seth-priya do the images support multi-arch?

seth-priya commented 5 years ago

@clyang82 no, not at this point. As indicated by @jonjohnsonjr, this would need work on the ko side. We have based the deployment on ppc64le-specific images for now.

clyang82 commented 5 years ago

@seth-priya Thanks for your answer. That means @jonjohnsonjr is working on that, right?

jonjohnsonjr commented 5 years ago

I'm slowly working on this as part of some other work. We need to add support for pulling and pushing manifest lists to go-containerregistry for a variety of reasons, and once that lands it'll be ~easy to update ko to support building and publishing multi-platform images.

junawaneshivani commented 5 years ago

@jonjohnsonjr I see that you are doing some work to support multiarch images here. We have some bandwidth available and can contribute in this area. I would like to know how much more work is pending and when we can expect power/multi-arch support in ko and subsequently in knative. Let me know if I can help with anything.

jonjohnsonjr commented 5 years ago

@junawaneshivani I'd like to refactor that implementation to what I described in the PR. I filed an issue upstream to make that easier: https://github.com/google/go-containerregistry/issues/474 -- I'll need to fix that before landing the change in ko. If somebody could build and test that PR against power, it would validate the approach.

After refactoring and merging https://github.com/google/ko/pull/38, we'll need a multi-platform base image to use for releases. Right now we're using gcr.io/distroless/static as the base, which is amd64/linux specific (at least in the config file[1]). We could turn that into a manifest list with an entry for each platform we care about. We could try to contribute that upstream to distroless, or maintain our own image and push that somewhere.

Once we have an appropriate base image, we would just need to update the release script to point to a separate release config via KO_CONFIG_PATH.

We currently only test on amd64/linux. I'm not sure if/how we'd test other architectures and operating systems.
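To illustrate the difference between a single-image manifest like the one in footnote [1] and a manifest list, here is a rough Python sketch that lists the platforms a manifest covers. The two media types are the Docker manifest list and OCI image index types; the multi-platform sample is hypothetical and its digests are elided.

```python
import json

MANIFEST_LIST_TYPES = {
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.oci.image.index.v1+json",
}


def platforms(manifest_json):
    """Return os/arch pairs of a manifest list, or None for a single-image manifest."""
    manifest = json.loads(manifest_json)
    if manifest.get("mediaType") not in MANIFEST_LIST_TYPES:
        return None
    return [
        f'{m["platform"]["os"]}/{m["platform"]["architecture"]}'
        for m in manifest.get("manifests", [])
    ]


# Media type as reported for gcr.io/distroless/static in footnote [1].
single = '{"schemaVersion": 2, "mediaType": "application/vnd.docker.distribution.manifest.v2+json"}'

# Hypothetical manifest list with one entry per platform (digests omitted).
multi = json.dumps({
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [
        {"platform": {"os": "linux", "architecture": "amd64"}},
        {"platform": {"os": "linux", "architecture": "ppc64le"}},
    ],
})
print(platforms(single), platforms(multi))
```

A base image that returns a list here is what the plan above needs: ko could then build and publish one image per listed platform.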

[1]:

$ crane manifest gcr.io/distroless/static | jq .
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "size": 458,
    "digest": "sha256:a574914f27fd415df3951c7bba405640659ec59bbd1fa56adc08f70dd51c585d"
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 654432,
      "digest": "sha256:1558143043601a425aa864511da238799b57fcf7d062d47044f6ddd0e04fe99a"
    }
  ]
}

$ crane config gcr.io/distroless/static | jq .
{
  "architecture": "amd64",
  "author": "Bazel",
  "config": {
    "Env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt"
    ]
  },
  "created": "1970-01-01T00:00:00Z",
  "history": [
    {
      "author": "Bazel",
      "created": "1970-01-01T00:00:00Z",
      "created_by": "bazel build ..."
    }
  ],
  "os": "linux",
  "rootfs": {
    "diff_ids": [
      "sha256:01092e5921c5543a918d54d9df752ee09a84c912a1d914b7eb37e7152f20b951"
    ],
    "type": "layers"
  }
}

junawaneshivani commented 5 years ago

Hi @jonjohnsonjr , will work on validating the PR against power. Thank you for your efforts in providing multi-arch support. :smile:

junawaneshivani commented 5 years ago

Hi @jonjohnsonjr, one of my colleagues, Lysanne, has successfully validated your PR changes against Power for ko. Are there any other outstanding issues in adding POWER support for knative, other than the PR getting merged?