larrycai opened this issue 2 years ago
I'm on a MacBook Air M1 and use minikube
to get a Docker environment (inside QEMU), but I hit a linux/amd64 platform issue with the PyTorch images.
pytorch/pytorch
$ brew install minikube
$ minikube start --driver qemu
$ minikube ssh
# enter into qemu to have docker env
$ ./build-images-locally.sh
Step 1/10 : FROM pytorch/pytorch:1.9.0-cuda10.2-cudnn7-runtime
---> 3850639cdf7a
Step 2/10 : RUN pip3 install minio protobuf~=3.20.0 grpcio torch-model-archiver
---> [Warning] The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
---> Running in e639c32ba232
exec /bin/sh: exec format error
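The exec format error happens because the amd64 image's binaries cannot run on the arm64 VM without user-mode emulation. As an illustrative sketch (not part of the lab scripts), detecting the native platform before deciding whether to pass an explicit --platform flag could look like this; forcing `--platform linux/amd64` only helps if the daemon has qemu/binfmt emulation configured:

```shell
# Detect the host's native docker platform string (illustrative only).
# Passing `docker build --platform linux/amd64 ...` on an arm64 host works
# only when qemu/binfmt user-mode emulation is configured in the daemon.
ARCH=$(uname -m)
case "$ARCH" in
  arm64|aarch64) NATIVE=linux/arm64 ;;
  x86_64)        NATIVE=linux/amd64 ;;
  *)             NATIVE=linux/$ARCH ;;
esac
echo "native platform: $NATIVE"
```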
pytorch/torchserve
Same as above: pytorch/torchserve doesn't have an arm64 container image either.
config/torch_server_config.properties
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/home/docker/MiniAutoML/scripts/config/torch_server_config.properties" to rootfs at "/home/model-server/config.properties": mount /home/docker/MiniAutoML/scripts/config/torch_server_config.properties:/home/model-server/config.properties (via /proc/self/fd/6), flags: 0x5000: not a directory: unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type.
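This mount error usually means Docker created a directory at the host path because the file did not exist there (or a relative path resolved somewhere unexpected) before the `-v` bind mount. A hedged sketch of guarding the mount; the paths are illustrative:

```shell
# Sketch: make sure the host-side config file exists before bind-mounting it.
# If the host path is missing, docker creates a *directory* there, and mounting
# a directory onto a container file fails with the error above.
CONFIG="$PWD/config/torch_server_config.properties"
mkdir -p "$(dirname "$CONFIG")"
[ -f "$CONFIG" ] || touch "$CONFIG"
echo "ready to mount: $CONFIG"
# docker run -v "$CONFIG":/home/model-server/config.properties ...  # hypothetical
```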
Now I've made ./lab-001-start-all.sh
work with several patches (not yet covering all the exercises).
Patch 1: kumatea/pytorch:1.9.0
Use an arm64-based container image instead (not official):
FROM pytorch/pytorch:1.9.0-cuda10.2-cudnn7-runtime
=>
FROM kumatea/pytorch:1.9.0
Patch 2: pytorch/torchserve:latest-cpu
Download https://github.com/pytorch/serve and change docker/Dockerfile:
torch==$TORCH_VER+cpu
=>
torch==$TORCH_VER
After the build, it generates pytorch/torchserve:latest-cpu,
which can replace pytorch/torchserve:0.5.3-cpu.
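Patch 2's one-line Dockerfile change could also be scripted. The sketch below demonstrates the edit with sed on a stand-in fragment (the real file is docker/Dockerfile in the pytorch/serve checkout): dropping the +cpu suffix lets pip resolve the wheel for the build platform instead of the x86-only CPU build.

```shell
# Demonstrate Patch 2's edit with sed on a stand-in Dockerfile fragment.
frag=$(mktemp)
printf 'RUN pip install torch==$TORCH_VER+cpu torchvision\n' > "$frag"
# Strip the "+cpu" suffix from the pinned torch requirement.
sed 's/+cpu//' "$frag" > "$frag.patched"
cat "$frag.patched"
```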
Patch 3: $(pwd)/config/torch_server_config.properties
Mount the config file with its absolute path. Not sure whether this is related to the other patches, but without it the mount target is created as a folder.
It may be worth generating dedicated arm64-based container images for this lab from these patches (the Patch 1 base image is not maintained by the original author).
Now let's see how to make the lab scripts runnable in a container env.
Install the needed packages inside a container; see the Dockerfile:
FROM continuumio/miniconda3:4.12.0
RUN apt-get update && apt-get install -y curl jq && apt-get clean
# conda & python3 & pip are ok
# minio mc
RUN curl -O https://dl.min.io/client/mc/release/linux-arm64/mc --output-dir /usr/local/bin && \
chmod +x /usr/local/bin/mc
# grpcurl
RUN curl -L -O https://github.com/fullstorydev/grpcurl/releases/download/v1.8.7/grpcurl_1.8.7_linux_arm64.tar.gz && \
tar -xvzf grpcurl_1.8.7_linux_arm64.tar.gz && chmod +x grpcurl && \
mv grpcurl /usr/local/bin/grpcurl
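The two download URLs above hardcode linux-arm64, so this Dockerfile only works on arm64 hosts. A sketch of deriving the arch at build time instead; it assumes the amd64 artifacts follow the same URL layout as the arm64 ones shown above:

```shell
# Map uname output to the arch token used in the download URLs (layout assumed
# to mirror the linux-arm64 links above).
case "$(uname -m)" in
  aarch64|arm64) DL_ARCH=arm64 ;;
  x86_64)        DL_ARCH=amd64 ;;
  *) echo "unsupported arch: $(uname -m)" >&2; exit 1 ;;
esac
echo "https://dl.min.io/client/mc/release/linux-${DL_ARCH}/mc"
```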
Then simply build the image: docker build -t orca3-lab .
Now we can run it sharing the host network:
$ docker run -it -v $(pwd):/lab -w /lab --network=host orca3-lab
(base) root@minikube:/lab# ./lab-002-upload-data.sh
..
Creating intent dataset
{
"dataset_id": "1",
"name": "tweet_emotion",
"dataset_type": "TEXT_INTENT",
"last_updated_at": "2022-10-02T06:04:48.411844Z",
"commits": [
{
"dataset_id": "1",
"commit_id": "1",
"created_at": "2022-10-02T06:04:49.262442Z",
"commit_message": "Initial commit",
"path": "dataset/1/commit/1",
"statistics": {
"numExamples": "2963",
"numLabels": "3"
}
}
]
}
In the same env:
(base) root@minikube:/lab# ./lab-003-first-training.sh 1
dataset_id is 1
version_hash is "hashAg=="
job_id is 4
job 4 is currently in unknown status, check back in 5 seconds
job 4 is currently in "launch" status, check back in 5 seconds
ERROR:
Code: NotFound
Message: Run 4 doesn't exist
ERROR:
Code: NotFound
Message: Cannot locate model artifact for runId 4.
Check the logs:
$ docker logs training-service
..
06:25:00.416 [grpc-default-executor-8] INFO org.orca3.miniAutoML.ServiceBase - Method: training.TrainingService/GetTrainingStatus, Response: status: failure
job_id: 5
message: "Exit code 1"
metadata {
algorithm: "intent-classification"
dataset_id: "1"
name: "test1"
train_data_version_hash: "hashAg=="
parameters {
key: "BATCH_SIZE"
value: "64"
}
parameters {
key: "EPOCHS"
value: "15"
}
parameters {
key: "FC_SIZE"
value: "128"
}
parameters {
key: "LR"
value: "4"
}
output_model_name: "twitter-model"
}
And another log (the runId may differ, since these were copied at different times):
$ docker logs prediction-service
07:39:49.225 [grpc-default-executor-14] INFO org.orca3.miniAutoML.ServiceBase - Method: prediction.PredictionService/Predict, Message: runId: "11"
document: "You can have a certain #arrogance, and I think that\'s fine, but what you should never lose is the #respect for the others."
07:39:49.230 [grpc-default-executor-14] ERROR org.orca3.miniAutoML.prediction.PredictionService - Cannot locate model artifact for runId 11.
io.grpc.StatusRuntimeException: NOT_FOUND: Artifact with runId 11 doesn't exist
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
at org.orca3.miniAutoML.metadataStore.MetadataStoreServiceGrpc$MetadataStoreServiceBlockingStub.getArtifact(MetadataStoreServiceGrpc.java:456)
at org.orca3.miniAutoML.prediction.PredictionService.predict(PredictionService.java:62)
at org.orca3.miniAutoML.prediction.PredictionServiceGrpc$MethodHandlers.invoke(PredictionServiceGrpc.java:204)
at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)
at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:797)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
I checked MinIO:
(base) root@minikube:/lab# source env-vars.sh
(base) root@minikube:/lab# mc alias -q set myminio http://127.0.0.1:"${MINIO_PORT}" "${MINIO_ROOT_USER}" "${MINIO_ROOT_PASSWORD}"
(base) root@minikube:/lab# mc find myminio/mini-automl-dm
myminio/mini-automl-dm/dataset/1/commit/1/examples.csv
myminio/mini-automl-dm/dataset/1/commit/1/labels.csv
myminio/mini-automl-dm/upload/tweet_emotion_part1.csv
myminio/mini-automl-dm/upload/tweet_emotion_part2.csv
myminio/mini-automl-dm/versionedDatasets/1/hashAg==/examples.csv
myminio/mini-automl-dm/versionedDatasets/1/hashAg==/labels.csv
Thank you for your suggestions! Were you able to run the lab using images from Docker Hub?
I didn't know you provide Docker images directly; it wasn't stated in the book when I reviewed it.
Judging from the tags, no arm64 images exist, so it most likely won't work on Mac M1/M2.
Did you run into any text (either in the book or in the README) that mentions running scripts/build-images-locally.sh?
I want to make sure we are pointing our readers to try the stock images before building locally.
I tried the full lab on my Apple M1; all containers worked fine under QEMU. It would be great to hear whether you see the same success.
Hopefully all dependent container images will soon have their respective official arm64 versions.
Even using QEMU, the environment is still arm64-based, so if the base images have no arm64 version I can't build them correctly.
If I remember correctly, I followed the guidelines (either the book or the README here) and recorded all my findings.
If you have made recent changes to the scripts, I can check again.
// I use minikube for the Docker env; normally I use Podman for containers, but both use QEMU.
Agree that scripts/build-images-locally.sh
will not finish properly right now on Apple M1 hardware. We are hoping that arm64 base images become available soon so we don't need to maintain two separate ways to build images.
It looks like both the book and the README
start with scripts/lab-001-start-all.sh.
That's why I am wondering whether anywhere in the material suggested that building images locally was necessary.
It would be great if you could try the lab again, starting with scripts/lab-001-start-all.sh
from a clean environment (clear any locally built Docker images from the lab so that it pulls stock images clean from Docker Hub), and see if that works.
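Clearing the locally built lab images before retrying could look like the sketch below. The image list is inferred from the logs in this thread, and the docker calls are guarded so the script is a harmless no-op where Docker isn't installed.

```shell
# Sketch: remove locally built lab images so the scripts pull stock ones from
# Docker Hub. Image names are inferred from the logs in this thread.
IMAGES="orca3/services:latest orca3/intent-classification-predictor:latest \
orca3/intent-classification:latest pytorch/torchserve:0.5.2-cpu"
for img in $IMAGES; do
  if command -v docker >/dev/null 2>&1; then
    docker image rm -f "$img" >/dev/null 2>&1 || true
  fi
  echo "cleared: $img"
done
```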
Greatly appreciate your effort in trying to build these containers locally on Apple M1! Let me know if you are interested in helping to keep track of the release of arm64 base images.
No, it doesn't work:
$ brew install minikube
$ minikube start --driver qemu # I delete/start with clean env
$ minikube ssh
# enter into qemu to have docker env
(minikube) $ git clone https://github.com/orca3/MiniAutoML.git
(minikube) $ cd MiniAutoML
(minikube) $ scripts/lab-001-start-all.sh
Created docker network orca3
Unable to find image 'minio/minio:latest' locally
latest: Pulling from minio/minio
..
Status: Downloaded newer image for minio/minio:latest
b0bb0b8bf7577b958f69d2ee8eda8445418ff55fa9fe133cfd04f74f606e6f0d
Started minio docker container and listen on port 9000
..
Unable to find image 'orca3/services:latest' locally
latest: Pulling from orca3/services
...
Digest: sha256:4a70f0992171b55278ea58254df4c67a8674e27f04fae8b6a6fc6b5b45936659
Status: Downloaded newer image for orca3/services:latest
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
f190824fc5cd89e9790a8001636467305d98c668bae13573bfe6930de16fa359
Started data-management docker container and listen on port 6000
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
fb89a4d8cbd52cd3857037ae5b2838930a68086d23ebe6c2d9f120799b657502
Started metadata-store docker container and listen on port 6002
rm: cannot remove 'model_cache': No such file or directory
Unable to find image 'orca3/intent-classification-predictor:latest' locally
latest: Pulling from orca3/intent-classification-predictor
...
Digest: sha256:af2df5fd32e9488888c9e6d9a16f8a3e0f510436b94b41989f1d186e493929ad
Status: Downloaded newer image for orca3/intent-classification-predictor:latest
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
4414363ba68cf3109a20a7c671151594303e537663da3c78c54ae8be2d59ebb2
Started intent-classification-predictor docker container and listen on port 6101
Unable to find image 'pytorch/torchserve:0.5.2-cpu' locally
0.5.2-cpu: Pulling from pytorch/torchserve
284055322776: Pull complete
bf7640766b3b: Pull complete
d05665a60e73: Pull complete
85824628b9b8: Pull complete
d93240f6b9fe: Pull complete
4f4fb700ef54: Pull complete
Digest: sha256:52ce3f86274bc92aec7a73702358323724097a75cea6d60ac39cd5f445bf727e
Status: Downloaded newer image for pytorch/torchserve:0.5.2-cpu
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
3acb0394a243b672bd83e5fef63c6ce6dc8e736e3bd958e62248cc5df8ca03de
Started intent-classification-torch-predictor docker container and listen on port 6102 & 6103
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
ce2aeced028e784985e7a4df89298d858e611e9c4f8187e3a6455ce1f6667a4d
Started prediction-service docker container and listen on port 6001
latest: Pulling from orca3/intent-classification
...
Digest: sha256:386920bcf1bb81b82c37fe223547a46346fa890893c060f99c2091b971e9d1a3
Status: Downloaded newer image for orca3/intent-classification:latest
docker.io/orca3/intent-classification:latest
pull intent-classification training image
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
c38b84a7784ea0efdfb8e7cec05d520637db4b27c3a628809ca748e9e3e2fae1
Started training-service docker container and listen on port 6003
...
So:
minio/minio is OK.
orca3/services, orca3/intent-classification-predictor and orca3/intent-classification are not correct: https://hub.docker.com/u/orca3 has no arm64 images (so they surely won't work).
pytorch/torchserve:0.5.2-cpu is not correct either, since there is no arm64 image.
You need to find a way to build arm64 Docker images.
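One common route for that, sketched here with illustrative names, is a multi-arch build via docker buildx; it requires a buildx builder with qemu binfmt support and, crucially, base images that themselves exist for arm64:

```shell
# Sketch: multi-arch build with docker buildx (illustrative image name and
# context path; needs a buildx builder and arm64-capable base images).
PLATFORMS="linux/amd64,linux/arm64"
if command -v docker >/dev/null 2>&1; then
  docker buildx build --platform "$PLATFORMS" -t orca3/services:latest . || true
fi
echo "requested platforms: $PLATFORMS"
```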
My thinking
When you have new updates, I can always help verify them. I like your book and the lab materials!
Thanks for running through step 1. Can you try the subsequent steps as well? I also ran on Apple M1 and was able to finish all steps in the lab using just Docker Desktop with linux/amd64
base images. It would be great if you could run the rest of the scripts, just to make sure the lab works on Apple M1 with emulation.
We can tackle building a set of images for linux/arm64
as a separate effort.
Thanks for liking the book and the lab! :) That is the best thing an author can hear. Feel free to provide any other feedback or suggestions.
Sorry, I don't have a Docker Desktop environment. I guess it should work in an amd64 emulation environment; I will wait and test arm64 when it is available.
What version of Docker are you using? It probably doesn't require Docker Desktop. Here's the one I'm using.
❯ docker version
Client:
Cloud integration: v1.0.29
Version: 20.10.20
API version: 1.41
Go version: go1.18.7
Git commit: 9fdeb9c
Built: Tue Oct 18 18:20:35 2022
OS/Arch: darwin/arm64
Context: desktop-linux
Experimental: true
Server: Docker Desktop 4.13.0 (89412)
Engine:
Version: 20.10.20
API version: 1.41 (minimum version 1.12)
Go version: go1.18.7
Git commit: 03df974
Built: Tue Oct 18 18:18:16 2022
OS/Arch: linux/arm64
Experimental: false
containerd:
Version: 1.6.8
GitCommit: 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
runc:
Version: 1.1.4
GitCommit: v1.1.4-0-g5fd4c4d
docker-init:
Version: 0.19.0
GitCommit: de40ad0
Lab env
Most MacBooks are M1-based now, so some container images can be tricky (x86-based), like
pytorch/pytorch:1.9.0-cuda10.2-cudnn7-runtime.
Also worth testing with podman instead of docker.
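Before a lab run, it may also help to check whether an image actually publishes an arm64 variant. A guarded sketch (the docker call needs network access; the image name is taken from this thread):

```shell
# Count arm64 entries in the image's manifest list, if docker is available.
IMG=pytorch/pytorch:1.9.0-cuda10.2-cudnn7-runtime
if command -v docker >/dev/null 2>&1; then
  docker manifest inspect "$IMG" 2>/dev/null | grep -c '"architecture": "arm64"' || true
fi
echo "checked: $IMG"
```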