thesuperzapper opened this issue 9 months ago
/cc @chensun @zijianjoy
For those who want to test, I have made a forked repo in the deployKF org with ARM versions of the gcr.io/tfx-oss-public/ml_metadata_store_server image. You can test a patched version of ml-metadata 1.14.0 by using the following containers:
ghcr.io/deploykf/ci/ml_metadata_store_server:sha-cad0c56
ghcr.io/deploykf/ml_metadata_store_server:1.14.0-deploykf.0
Note: building under emulation on GitHub Actions took about 5 hours.
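For reference, a multi-architecture build under QEMU emulation (the general approach behind a CI run like that) usually looks like the sketch below. The builder name, image tag, and Dockerfile path are placeholders, not the actual CI configuration:

```shell
# Sketch of a multi-arch build under QEMU emulation (names/tags are placeholders).
# Register QEMU binfmt handlers so an amd64 host can execute arm64 build steps:
docker run --privileged --rm tonistiigi/binfmt --install arm64

# Create a buildx builder and build both architectures in one pass:
docker buildx create --name multiarch --use
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t ghcr.io/example/ml_metadata_store_server:test \
  --push .
```

Emulated arm64 build steps run far slower than native ones, which is why a build like this can take hours on standard GitHub Actions runners.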
Great. I pulled this image and it runs correctly. Now I am looking for ARM versions of gcr.io/ml-pipeline/metadata-writer and gcr.io/ml-pipeline/metadata-envoy.
@xixici can you confirm what you are saying?
Because gcr.io/ml-pipeline/metadata-writer:2.0.5 and gcr.io/ml-pipeline/metadata-envoy:2.0.5 (and all other versions) are only published for AMD64. I assume you mean that they work via Rosetta emulation on a MacBook?
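To check for yourself, you can inspect which platforms an image manifest advertises and, on an ARM host, force an amd64-only image to run under emulation (Rosetta on Apple Silicon, QEMU elsewhere). A sketch:

```shell
# Determine this host's Docker platform string from uname
arch="$(uname -m)"
case "$arch" in
  x86_64)        platform="linux/amd64" ;;
  aarch64|arm64) platform="linux/arm64" ;;
  *)             platform="linux/$arch" ;;
esac
echo "host platform: $platform"

# List the architectures the image manifest is published for
# (an amd64-only image shows a single linux/amd64 entry):
# docker manifest inspect gcr.io/ml-pipeline/metadata-envoy:2.0.5 \
#   | grep -E '"architecture"|"os"'

# Force-run the amd64 image on an ARM host via emulation:
# docker run --rm --platform linux/amd64 gcr.io/ml-pipeline/metadata-envoy:2.0.5
```

The docker commands are left commented since they need registry access; the arch mapping above is what `--platform` strings are matched against.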
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
/reopen
@thesuperzapper: Reopened this issue.
Clearly, this is going to take some time, so I will prevent the bot from closing it.
/lifecycle frozen
```shell
docker pull --platform linux/arm64 envoyproxy/envoy:v1.16.0
```

Then, in the `pipelines/third_party/metadata_envoy` directory:
1. Modify the Dockerfile and configure the proxy info:

```Dockerfile
FROM envoyproxy/envoy:v1.16.0

RUN apt-get -o Acquire::http::proxy="http://proxy:port" update -y && \
    apt-get -o Acquire::http::proxy="http://proxy:port" install --no-install-recommends -y -q gettext openssl

COPY third_party/metadata_envoy/envoy.yaml /etc/envoy.yaml

# Copy license files.
#RUN mkdir -p /third_party
COPY third_party/metadata_envoy/license.txt /third_party/license.txt

ENTRYPOINT ["/usr/local/bin/envoy", "-c"]
CMD ["/etc/envoy.yaml"]
```
2. Modify `envoy.yaml`:

```yaml
admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address: { address: 0.0.0.0, port_value: 9090 }
      filter_chains:
        - filters:
            - name: envoy.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.config.filter.network.http_connection_manager.v2.HttpConnectionManager
                codec_type: auto
                stat_prefix: ingress_http
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: local_service
                      domains: ["*"]
                      routes:
                        - match: { prefix: "/" }
                          route:
                            cluster: metadata-cluster
                            max_grpc_timeout: 0s
                      cors:
                        allow_origin_string_match:
                          - exact: "*"
                        allow_methods: GET, PUT, DELETE, POST, OPTIONS
                        allow_headers: keep-alive,user-agent,cache-control,content-type,content-transfer-encoding,custom-header-1,x-accept-content-transfer-encoding,x-accept-response-streaming,x-user-agent,x-grpc-web,grpc-timeout
                        max_age: "1728000"
                        expose_headers: custom-header-1,grpc-status,grpc-message
                http_filters:
                  - name: envoy.grpc_web
                  - name: envoy.cors
                  - name: envoy.router
  clusters:
    - name: metadata-cluster
      connect_timeout: 30.0s
      type: logical_dns
      http2_protocol_options: {}
      lb_policy: round_robin
      load_assignment:
        cluster_name: metadata-cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: metadata-grpc-service
                      port_value: 8080
```
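Before baking the config into an image, the YAML can be sanity-checked with Envoy's validate mode. A sketch, assuming `envoy.yaml` is in the current directory and using the same image version as the Dockerfile above:

```shell
# Validate the Envoy config without actually starting the proxy
docker run --rm \
  -v "$(pwd)/envoy.yaml:/etc/envoy.yaml:ro" \
  envoyproxy/envoy:v1.16.0 \
  --mode validate -c /etc/envoy.yaml
```

Validate mode parses and checks the configuration, then exits, so malformed indentation or unknown fields fail fast instead of at container startup.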
3. Finally, build the image via `pipelines/backend/Makefile`:

```Makefile
.PHONY: metadata_envoy
metadata_envoy:
	cd $(MOD_ROOT) && docker build -t registry.cnbita.com:5000/kubeflow-pipelines/metadata-envoy:2.0.5-arm -f third_party/metadata_envoy/Dockerfile .
```

Then run `make metadata_envoy`.
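After the build, it is worth confirming the resulting image really is arm64. A sketch, using the same tag as the Makefile target above:

```shell
# Check the OS/architecture recorded in the built image's metadata;
# this should print linux/arm64 for an ARM build
docker image inspect \
  --format '{{.Os}}/{{.Architecture}}' \
  registry.cnbita.com:5000/kubeflow-pipelines/metadata-envoy:2.0.5-arm
```

Note that `docker image inspect` reports what the image metadata claims; the build host (or `--platform` flag) determines what actually got built.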
https://github.com/kubeflow/pipelines/issues/10308#issuecomment-2235890991
It seems that Kubeflow Pipelines v2 does not use the metadata-writer component. My cluster does not have it installed, but I can still use pipelines normally.
Description
Right now, the gcr.io/tfx-oss-public/ml_metadata_store_server container image is the only image used in Kubeflow which is not published for both amd64 AND arm64. This means that Kubeflow 1.8 still can not properly run on ARM clusters.

I have made a PR upstream in google/ml-metadata to get the builds working for ARM64. We need to work with the ml-metadata team to review/merge it, and then set up a process to also push the ARM version of that image to GCR.

EDIT: I was incorrect about this being the "only one", but I think this must be the only one that does not work at all under Rosetta emulation (either way, we need to fix this one too, as we also push native ARM images for the others). I have raised a separate issue to track fixing the other images:
Love this idea? Give it a 👍.