Closed leoyifeng closed 1 year ago
Thanks for submitting your first pull request! You are awesome! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:
@leoyifeng assuming today we usually do `make docker-images publish-docker-images`, how does this PR change the user experience? With this, we would need to do `MULTIARCH=true make docker-images`, assuming the multi-arch build pushes directly to the Docker repository.
Yes, with my current implementation the user has to specify the MULTIARCH parameter to trigger the buildx build. Do we want to build multi-arch images by default going forward? And possibly add a new command that enables users to build an image for local use?
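For discussion, here is a minimal sketch of how that toggle could branch. The `MULTIARCH` variable and platform list come from this thread (the flag also appears later as `MULTIARCH_BUILD`); the image name is a hypothetical stand-in, and the commands are echoed dry-run style so the logic can be inspected without Docker installed:

```shell
#!/bin/sh
# Sketch: choose between a regular local build and a multi-arch buildx build,
# based on the MULTIARCH flag discussed above. Commands are echoed, not run.
PLATFORMS="linux/amd64,linux/arm64"
IMAGE="elyra/kernel-py:dev"   # hypothetical image name

if [ "${MULTIARCH:-}" = "true" ]; then
  # buildx with --push publishes the multi-arch manifest directly to the registry
  CMD="docker buildx build --platform $PLATFORMS --push -t $IMAGE ."
else
  # plain build produces a single-arch image usable locally
  CMD="docker build -t $IMAGE ."
fi
echo "$CMD"
```

With `MULTIARCH` unset, this prints the plain local build command; with `MULTIARCH=true`, it prints the buildx invocation.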
I like this - thank you @leoyifeng!
A couple of quick questions:
- I'm not familiar with multi-arch builds, but in the examples I see there is a comma-separated list of platforms specified via `--platform`. Are all platforms built if that option is not specified?
- I'd rather not push the images until they are validated, and this code would (unconditionally) attempt to push `:dev`-tagged images. Is `--push` required for this?
- Do we want to build multi-arch images by default going forward?
If they are indeed this painless, then I'm thinking yes. My understanding is that these are all in the same image and the Docker Engine determines which arch to use via the built-in manifest in conjunction with the runtime architecture. Do these images tend to be significantly larger than "single-arch" images? If not, I'd be cool with enabling this by default.
I guess based on what @kevin-bates mentioned about the ability to test, this way looks good then.
@kevin-bates as for your question, as far as I know, the only way to publish multi-arch is to push it directly to the docker repository.
So, just validating the user experience (in this case someone that is building the images or a release):
- Build regularly so they can have the images locally and test
- When validated, they have to build it again with `MULTIARCH=true`?
Are you saying that I cannot locally validate a multi-arch image w/o pushing to some repository first?
I thought the act of pushing the image was the same as "publishing".
That said, if I remove the `--push` from the `buildx` statement, the image build completes, but `docker images` does not display the image. However, if I leave things as-is (with `--push` present) my build times out at that step due to authorization issues. Even though I can push to Docker Hub from the CLI, what else is necessary to have the `--push` step work?
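A hedged note on the "image doesn't show up" behavior above (my understanding of buildx, not verified against this repo's Makefile): with the docker-container driver, the build result stays inside the builder unless it is exported. `--load` imports a single-platform result into `docker images`; `--push` exports the (possibly multi-platform) manifest to a registry. The two modes, echoed rather than executed since Docker isn't assumed here:

```shell
#!/bin/sh
# Assumption/illustration only: the two export modes of `docker buildx build`.
# --load imports a SINGLE-platform result into the local `docker images` list;
# --push exports a MULTI-platform manifest to a registry.
LOCAL_CMD="docker buildx build --load -t elyra/kernel-py:dev ."
PUSH_CMD="docker buildx build --platform linux/amd64,linux/arm64 --push -t elyra/kernel-py:dev ."
echo "$LOCAL_CMD"
echo "$PUSH_CMD"
```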
@lresende - have you tried running a build of applicable images with this PR?
For `--platform`, good point; I think I missed that option in my PR. You need to specify which platforms to build, otherwise my understanding is that it's going to build only for the build machine's platform. The other requirement is that the base image needs to support the platform. I'm testing with `linux/amd64,linux/arm64`.
> Do these images tend to be significantly larger than "single-arch" images?
@kevin-bates yes, it will be larger since it actually contains multiple single images for different platforms.
Yeah, I think I understand this a bit better now. It seems that `buildx` is more of a context. Since I don't see the images, I ran `docker buildx ls` to list the current builders and got this...
```
$ docker buildx ls
NAME/NODE               DRIVER/ENDPOINT    STATUS    PLATFORMS
beautiful_khayyam       docker-container
  beautiful_khayyam0    rancher-desktop    running   linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/386
distracted_haibt        docker-container
  distracted_haibt0     rancher-desktop    running   linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/386
epic_antonelli          docker-container
  epic_antonelli0       rancher-desktop    running   linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/386
hungry_poincare         docker-container
  hungry_poincare0      rancher-desktop    running   linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/386
interesting_feynman *   docker-container
  interesting_feynman0  rancher-desktop    running   linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/386
```
As a result, I'm not sure `--platform` is necessary unless these do add significant bloat to the image. Some of these were previous builds terminated with ctrl-c (like when the auth stuff was timing out). This seems like it could be problematic. Is it necessary to `stop` or `rm` the builder instance after each image is built? Is it the act of "pushing" the image that closes the buildx instance? (I'm guessing this is what @lresende may have been getting at.)
I think we are all saying the same thing, with one caveat. My understanding is that you can only run a docker image locally that matches your platform, so in this case, if you have a Mac with an M1 processor, you can't actually build and test an x64 image. But I will let @leoyifeng confirm.
> My understanding is that you can only run the docker image locally that matches your platform, so in this case, if you have a Mac with an M1 processor, you can't actually build and test an x64 image.
The point of the multi-arch builds is to combine the layers of multiple architectures in the same image. So building and testing are two different things. We should be able to build x64 images on an M1 platform, but, yes, you can't run an x64 image on an M1 platform.
I (now) have multiple concerns:
- Using the previous platform example, how would one test the M1 portion of a multi-arch image on an M1 platform WITHOUT having to first PUSH that image to some repository?
- I have accumulated 9 buildx contexts in the course of trying things out. The builds now fail due to resource (disk) issues, so there's nothing in place to recover from failures and remove the context (via `docker buildx rm <context>`). I think we will need that as part of this PR.
- Since it appears that `--push` IS necessary (I suppose so that the context goes away), I think this kind of target needs to be reserved for actual release builds. What is concerning about this is whether or not we can trust that a single-arch build (that is validated) behaves identically to that same platform in a multi-arch build.

I'm not sure I'm fully understanding how this all works, but given that I've already blown out my docker build environment due to resource exhaustion, I'm hoping there are steps that can be taken to make this more natural for developers.
@kevin-bates
> Using the previous platform example, how would one test the M1 portion of a multi-arch image on an M1 platform WITHOUT having to first PUSH that image to some repository?

If you don't set MULTIARCH_BUILD, `make docker-images` will build the image for the platform that the machine supports, and it can be verified locally.
> I have accumulated 9 buildx contexts in the course of trying things out. The builds now fail due to resource (disk) issues, so there's nothing in place to recover from failures and remove the context (via `docker buildx rm <context>`). I think we will need that as part of this PR.
Yes, this is a valid concern. I think we should remove the buildx instance once the build finishes. I'll update the PR to include that.
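A sketch of what that cleanup could look like, using an EXIT trap so the builder is removed even when the build fails or is interrupted. The builder name is hypothetical, and the real `docker buildx` calls are left as comments so the trap logic itself runs anywhere:

```shell
#!/bin/sh
# Sketch: run the build inside a subshell with an EXIT trap, addressing the
# accumulating-contexts concern above. Echoes stand in for the docker commands.
BUILDER="elyra-builder-$$"   # hypothetical per-run builder name

run_build() (
  # real: trap 'docker buildx rm "$BUILDER" >/dev/null 2>&1 || true' EXIT
  trap 'echo "removed builder $BUILDER"' EXIT
  # real: docker buildx create --name "$BUILDER" --use
  echo "created builder $BUILDER"
  # real: docker buildx build --platform ... --push ...
  echo "building images"
)

RESULT=$(run_build)
echo "$RESULT"
```

Because the trap fires when the subshell exits for any reason, a ctrl-c or a failed build step would still trigger the `docker buildx rm` in the real version.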
> I'm not sure `--platform` is necessary unless these do add significant bloat to the image.
I think `--platform` is necessary; otherwise, buildx will just build the image that works for the build machine.
Found this interesting regarding local images: https://github.com/docker/buildx/issues/166
I think having the manifest that is produced when multiple platforms are specified is a good thing. I am wondering, however, if the "default build" should take a single platform name, defaulting to the more common arch (no idea what that is) but allowing others to build a different arch as their "default build". This should produce a local image of that architecture. I don't think it's necessary to do this in this PR, but perhaps something to keep in mind.
FWIW - I've pushed multi-arch images for `:dev` to the elyra org in Docker Hub. If anyone uses `arm64`, it would be great if you could check them out. The helm charts in the `main` branch should reference `:dev` (as should those in this PR's branch). I'll check out `amd64` tomorrow, then will build "default local" images (single-arch) and check those out.
Thx @kevin-bates, PR updated. But I'm just trying to understand: why would removing `RUN mkdir -p /usr/lib/jvm/java` fix the issue?
> why would removing `RUN mkdir -p /usr/lib/jvm/java` fix the issue?
Per my previous image captures, when `/usr/lib/jvm/java` already exists it messes up the linking that happens in the next step, resulting in the link being created inside `/usr/lib/jvm/java` rather than `java` itself being the link....
```
jovyan@c7778786e3ee:/usr/lib/jvm$ ls java
java-8-openjdk-amd64
jovyan@c7778786e3ee:/usr/lib/jvm$ ls java/java-8-openjdk-amd64
ASSEMBLY_EXCEPTION  bin  docs  include  jre  lib  man  src.zip  THIRD_PARTY_README
```
You can check this yourself by looking at the multiarch image:
```
$ docker run elyra/kernel-spark-py:dev ls -l /usr/lib/jvm/java
...
+ exec /usr/bin/tini -g -- ls -l /usr/lib/jvm/java
Non-spark-on-k8s command provided, proceeding in pass-through mode...
lrwxrwxrwx 1 root root 33 Nov 11 18:38 /usr/lib/jvm/java -> /usr/lib/jvm/java-8-openjdk-amd64
```
Then, looking inside the directory (i.e., JAVA_HOME), you actually see the expected contents, not the other directory as before...
```
$ docker run elyra/kernel-spark-py:dev ls -l /usr/lib/jvm/java/
...
+ exec /usr/bin/tini -g -- ls -l /usr/lib/jvm/java/
total 20
lrwxrwxrwx 1 root root   22 Oct 25 17:26 ASSEMBLY_EXCEPTION -> jre/ASSEMBLY_EXCEPTION
drwxr-xr-x 2 root root 4096 Nov 11 18:38 bin
lrwxrwxrwx 1 root root   41 Oct 25 17:26 docs -> ../../../share/doc/openjdk-8-jre-headless
drwxr-xr-x 3 root root 4096 Nov 11 18:38 include
drwxr-xr-x 5 root root 4096 Nov 11 18:38 jre
drwxr-xr-x 3 root root 4096 Nov 11 18:38 lib
drwxr-xr-x 4 root root 4096 Nov 11 18:38 man
lrwxrwxrwx 1 root root   20 Oct 25 17:26 src.zip -> ../openjdk-8/src.zip
lrwxrwxrwx 1 root root   22 Oct 25 17:26 THIRD_PARTY_README -> jre/THIRD_PARTY_README
```
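The linking behavior described above can be reproduced outside Docker with plain `ln -s`: when the link name already exists as a directory, the symlink is created inside it instead of replacing it. A minimal, runnable sketch (paths are illustrative, using a temp dir):

```shell
#!/bin/sh
# Reproduces the Dockerfile pitfall: pre-creating the `java` directory means a
# later `ln -s` drops the link INSIDE that directory instead of making `java`
# itself the link. Safe to run anywhere; everything happens under mktemp -d.
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/java-8-openjdk-amd64"

# Case 1: `java` pre-created (the removed `RUN mkdir -p .../java` behavior)
mkdir -p "$tmp/java"
ln -s "$tmp/java-8-openjdk-amd64" "$tmp/java"
CASE1=$(ls "$tmp/java")          # the link ended up inside the directory

# Case 2: no pre-created directory: `java` itself becomes the symlink
rm -rf "$tmp/java"
ln -s "$tmp/java-8-openjdk-amd64" "$tmp/java"
if [ -L "$tmp/java" ]; then CASE2="symlink"; else CASE2="directory"; fi

echo "case1: $CASE1"
echo "case2: java is a $CASE2"
rm -rf "$tmp"
```

Case 1 prints `java-8-openjdk-amd64` (the link landed inside the pre-created directory, matching the broken image), while case 2 makes `java` itself the symlink, matching the fixed image.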
Thanks @leoyifeng.
We should talk about one final item I can see...
Since we build the images during `release.sh` with the `--dryRun` option (which is fine IMO), should we update the `release.sh` script to add the `MULTIARCH_BUILD=y` flag in both cases? (I guess we really don't need the `push-images` target since `--dryRun` will push by virtue of `MULTIARCH_BUILD`, but I'm okay leaving it for now.)
@lresende - what do you think? I think this needs to happen either now or before we build 3.0.1 so might as well add it now.
I went ahead and pushed a commit to update the release script.
@lresende - could you please take a look?
Congrats on your first merged pull request in this project! :tada:
Thank you for contributing, we are very proud of you! :heart:
Just FYI: for the multi-arch builds, it seems that the resources on your Docker Desktop have a direct influence on the success and speed of the build. Regular resources like 2 or 3 cores with 4 GB of RAM make the build take forever. Increasing that to 7 cores, 20 GB of RAM, and 4 GB of swap allowed `demo` + `demo-base` and `eg` + the kernels to be built in parallel and published much quicker...
BTW, I built + published 3.1.0 with this setup.
@kevin-bates @leoyifeng