kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

build and publish `ml_metadata_store_server` container image for ARM64 #10308

Open thesuperzapper opened 9 months ago

thesuperzapper commented 9 months ago

Description

Right now, the gcr.io/tfx-oss-public/ml_metadata_store_server container image is the only image used in Kubeflow which is not published for both amd64 AND arm64. This means that Kubeflow 1.8 still cannot properly run on ARM clusters.

I have made a PR upstream in google/ml-metadata to get the builds working for ARM64:

We need to work with the ml-metadata team to review/merge it and then set up a process to also push the ARM version of that image to GCR.
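For context, publishing for both platforms essentially means pushing a manifest list that covers amd64 and arm64. A minimal sketch of what that could look like with `docker buildx` (assuming QEMU emulation on the build host; the tag is a placeholder, and the real ml-metadata release process may differ):

```shell
# Install QEMU emulators so an amd64 host can build arm64 layers.
docker run --privileged --rm tonistiigi/binfmt --install arm64

# Create and select a buildx builder capable of multi-platform builds.
docker buildx create --name multiarch --use

# Build both architectures and push them as a single manifest list.
# (the tag below is a placeholder; the real release pipeline may differ)
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t gcr.io/tfx-oss-public/ml_metadata_store_server:<version> \
  --push .
```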

EDIT: I was incorrect about this being the "only one", but I think it is the only one that does not work at all under Rosetta emulation (either way, we need to fix this one too, as we should also push native ARM images for the others). I have raised a separate issue to track fixing the other images:


Love this idea? Give it a 👍.

thesuperzapper commented 9 months ago

/cc @chensun @zijianjoy

thesuperzapper commented 9 months ago

For those who want to test, I have made a forked repo in the deployKF org with the ARM versions of the gcr.io/tfx-oss-public/ml_metadata_store_server image. You can test a patched version of ml-metadata version 1.14.0 by using the following container:

Note: building under emulation on GitHub Actions took about 5 hours:
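If you want to try it against an existing Kubeflow Pipelines install, one option is to point the MLMD gRPC Deployment at the patched image. A rough sketch, assuming the usual `kubeflow` namespace and the `metadata-grpc-deployment` name from the upstream manifests, with `<forked-arm64-image>` standing in for the fork's tag:

```shell
# Swap every container in the MLMD gRPC Deployment to the patched image.
# <forked-arm64-image> is a placeholder for the forked build's tag.
kubectl -n kubeflow set image deployment/metadata-grpc-deployment \
  '*=<forked-arm64-image>'

# Wait for the rollout and confirm the pod starts cleanly on an ARM node.
kubectl -n kubeflow rollout status deployment/metadata-grpc-deployment
```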

xixici commented 9 months ago

Great. I pulled this image and it runs correctly. Now I am looking for ARM versions of gcr.io/ml-pipeline/metadata-writer and gcr.io/ml-pipeline/metadata-envoy.

thesuperzapper commented 9 months ago

@xixici can you confirm what you are saying?

Because gcr.io/ml-pipeline/metadata-writer:2.0.5 and gcr.io/ml-pipeline/metadata-envoy:2.0.5 (and all other versions) are only published for AMD64.

I assume you mean that they work via Rosetta Emulation on a MacBook?
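For anyone who wants to verify this themselves, buildx can show exactly which platforms a tag is published for (this just inspects the public registry, no cluster needed):

```shell
# Show the manifest (list) for each tag and the platforms it contains.
docker buildx imagetools inspect gcr.io/ml-pipeline/metadata-envoy:2.0.5
docker buildx imagetools inspect gcr.io/ml-pipeline/metadata-writer:2.0.5
```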

github-actions[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 5 months ago

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

thesuperzapper commented 5 months ago

/reopen

google-oss-prow[bot] commented 5 months ago

@thesuperzapper: Reopened this issue.

In response to [this](https://github.com/kubeflow/pipelines/issues/10308#issuecomment-2040613995):

> /reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

thesuperzapper commented 5 months ago

Clearly, this is going to take some time, so I will prevent the bot from closing it.

/lifecycle frozen

MouseSun846 commented 1 month ago

I have successfully built metadata-envoy 2.0.5 in an ARM environment.

Here are the build steps.

First, pull the base image:

```shell
docker pull --platform linux/arm64 envoyproxy/envoy:v1.16.0
```

Then, in the pipelines/third_party/metadata_envoy directory:

1. Modify the Dockerfile and configure the proxy info:

```dockerfile
FROM envoyproxy/envoy:v1.16.0

RUN apt-get -o Acquire::http::proxy="http://proxy:port" update -y && \
  apt-get -o Acquire::http::proxy="http://proxy:port" install --no-install-recommends -y -q gettext openssl

COPY third_party/metadata_envoy/envoy.yaml /etc/envoy.yaml

# Copy license files.
#RUN mkdir -p /third_party
COPY third_party/metadata_envoy/license.txt /third_party/license.txt

ENTRYPOINT ["/usr/local/bin/envoy", "-c"]
CMD ["/etc/envoy.yaml"]
```

2. Modify envoy.yaml:

```yaml
admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }

static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address: { address: 0.0.0.0, port_value: 9090 }
      filter_chains:
        - filters:
            - name: envoy.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.config.filter.network.http_connection_manager.v2.HttpConnectionManager
                codec_type: auto
                stat_prefix: ingress_http
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: local_service
                      domains: ["*"]
                      routes:
                        - match: { prefix: "/" }
                          route:
                            cluster: metadata-cluster
                            max_grpc_timeout: 0s
                      cors:
                        allow_origin_string_match:
                          - exact: "*"
                        allow_methods: GET, PUT, DELETE, POST, OPTIONS
                        allow_headers: keep-alive,user-agent,cache-control,content-type,content-transfer-encoding,custom-header-1,x-accept-content-transfer-encoding,x-accept-response-streaming,x-user-agent,x-grpc-web,grpc-timeout
                        max_age: "1728000"
                        expose_headers: custom-header-1,grpc-status,grpc-message
                http_filters:
                  - name: envoy.grpc_web
                  - name: envoy.cors
                  - name: envoy.router
  clusters:
    - name: metadata-cluster
      connect_timeout: 30.0s
      type: logical_dns
      http2_protocol_options: {}
      lb_policy: round_robin
      load_assignment:
        cluster_name: metadata-cluster
        endpoints:
        - lb_endpoints:
          - endpoint:
              address:
                socket_address:
                  address: metadata-grpc-service
                  port_value: 8080
```
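Before baking the edited config into an image, you can have Envoy validate it in place. A quick sketch using the same base image (the bind-mount path assumes you run it from the repo root):

```shell
# Parse and validate the config without starting any listeners.
docker run --rm \
  -v "$(pwd)/third_party/metadata_envoy/envoy.yaml:/etc/envoy.yaml:ro" \
  envoyproxy/envoy:v1.16.0 \
  --mode validate -c /etc/envoy.yaml
```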

3. Finally, build the image using the target in pipelines/backend/Makefile:

```makefile
.PHONY: metadata_envoy
metadata_envoy:
	cd $(MOD_ROOT) && docker build -t registry.cnbita.com:5000/kubeflow-pipelines/metadata-envoy:2.0.5-arm -f third_party/metadata_envoy/Dockerfile .
```

Then run `make metadata_envoy`.
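If you do not have ARM hardware handy, the same image can also be cross-built from x86 with buildx, and it is worth double-checking the architecture of the result. A sketch reusing the tag from the Makefile target above:

```shell
# Cross-build the envoy image for arm64 from the repo root.
docker buildx build --platform linux/arm64 --load \
  -t registry.cnbita.com:5000/kubeflow-pipelines/metadata-envoy:2.0.5-arm \
  -f third_party/metadata_envoy/Dockerfile .

# Confirm the built image really is arm64.
docker image inspect --format '{{.Architecture}}' \
  registry.cnbita.com:5000/kubeflow-pipelines/metadata-envoy:2.0.5-arm
```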

MouseSun846 commented 1 month ago

> Clearly, this is going to take some time, so I will prevent the bot from closing it.
>
> /lifecycle frozen

https://github.com/kubeflow/pipelines/issues/10308#issuecomment-2235890991

MouseSun846 commented 1 month ago

It seems that Kubeflow Pipelines v2 does not use the metadata-writer component. My cluster does not have it installed, but I can still use pipelines normally.
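You can check whether it is installed at all (assuming the usual `kubeflow` namespace and the `metadata-writer` Deployment name from the upstream manifests):

```shell
# Check for the metadata-writer Deployment; "NotFound" means it is not installed.
kubectl -n kubeflow get deployment metadata-writer

# List all metadata-related Deployments for comparison.
kubectl -n kubeflow get deployments | grep metadata
```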
