intro-stat-learning / ISLP_labs

Up-to-date version of labs for ISLP
BSD 2-Clause "Simplified" License

Construction and Deployment of a Docker image #5

Closed tschm closed 1 year ago

tschm commented 1 year ago

You need to control who can introduce tags. You need your own Dockerhub account...

jonathan-taylor commented 1 year ago

This launches a huge download (at least the first time), several GB. I guess this is related to a one-time download of jupyter/scipy-notebook. I presume this download wouldn't have to happen for further updates, but I'm not sure.

I also get an error with docker run -p 8888:8888 tschm/islp_labs:v0.0.1 because I happen to be using ports 8888 and 8889 with jupyter lab. So, this is another detail a user would need to check.

When I choose a free port, the log gives me a link that should point me to a jupyter server, but these links don't work in Chrome. Perhaps the 8888 port is hard-coded into the docker image, so I'm out of luck if my port 8888 is in use?

log.txt

jonathan-taylor commented 1 year ago

Also, the log indicates that this image is for a different architecture than mine. My Mac is an M1, so probably this was built for Intel? It still runs, but I'm not sure if this is an issue -- do docker images depend on an architecture?

jonathan-taylor commented 1 year ago

Overall, I think this can wait until we actually have several people who want an "official" docker image.

By simply running pip install -r requirements.txt, this really provides no more isolation of code than if I were to do:

conda create -n my_islp_env python=3.11 -y
conda activate my_islp_env
pip install -r https://raw.githubusercontent.com/intro-stat-learning/ISLP_labs/v2.1/requirements.txt

tschm commented 1 year ago

Also, the log indicates that this image is for a different architecture than mine. My Mac is an M1, so probably this was built for Intel? It still runs, but I'm not sure if this is an issue -- do docker images depend on an architecture?

Yes, docker images are very much an Ubuntu thing. That's a huge advantage, as you can use them on Windows, Mac or Ubuntu. I am using a Mac with an M1, too.
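
On the architecture question: as far as I know, images are built for a specific OS/CPU platform, and Docker Desktop on Apple silicon can usually run an amd64 image under emulation, which is probably why the log shows a warning while the container still runs. A minimal sketch for comparing the two, assuming the image name from this thread; the inspect step simply reports a message if the image has not been pulled:

```shell
#!/bin/sh
# Compare the host CPU architecture with the platform an image was built for.
HOST_ARCH="$(uname -m)"   # typically arm64 on an M1 Mac, x86_64 on Intel
echo "host architecture: ${HOST_ARCH}"

# Image name taken from this thread; replace it with your own tag.
IMAGE="tschm/islp_labs:v0.0.1"

# Only ask Docker if the CLI is installed; the image may not be pulled yet.
if command -v docker >/dev/null 2>&1; then
  docker image inspect --format '{{.Os}}/{{.Architecture}}' "${IMAGE}" 2>/dev/null \
    || echo "image ${IMAGE} not available locally"
fi
```

If the two differ, docker pull and docker run accept a --platform flag (for example --platform linux/amd64) to state explicitly which variant is wanted.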

tschm commented 1 year ago

This launches a huge download (at least the first time), several GB. I guess this is related to a one-time download of jupyter/scipy-notebook. I presume this download wouldn't have to happen for further updates, but I'm not sure.

I also get an error with docker run -p 8888:8888 tschm/islp_labs:v0.0.1 because I happen to be using ports 8888 and 8889 with jupyter lab. So, this is another detail a user would need to check.

When I choose a free port, the log gives me a link that should point me to a jupyter server, but these links don't work in Chrome. Perhaps the 8888 port is hard-coded into the docker image, so I'm out of luck if my port 8888 is in use?

log.txt

Yes, loading the image the first time is a huge operation if you don't have the scipy-notebook layers in cache... I think the scipy-notebook image is very helpful though... It has conda, pip, a non-root user, ...

jonathan-taylor commented 1 year ago

Got it, you made a tag v0.0.1... https://github.com/tschm/ISLP_labs/releases/tag/v0.0.1

Was just not sure where v0.0.1 came from since the intro-stat-learning repo doesn't have that tag.

Could also change it to a manual dispatch or something else I suppose.


From: Thomas Schmelzer @.> Sent: Sunday, August 20, 2023 11:25 PM To: intro-stat-learning/ISLP_labs @.> Cc: Jonathan Taylor @.>; Comment @.> Subject: Re: [intro-stat-learning/ISLP_labs] Construction and Deployment of a Docker image (PR #5)

@tschm commented on this pull request.


On .github/workflows/docker.yml (https://github.com/intro-stat-learning/ISLP_labs/pull/5#discussion_r1299659911):

The docker image that is constructed is tagged. The workflow is only executed when

on:
  release:
    types: [published]

Hence the release tag is picked up and used to tag the image. If you just do a simple commit, no new docker image is constructed.

At the same time the image :latest is updated.
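
The tagging scheme described here can be sketched as a dry run. This is only an illustration under assumptions: the variable names are hypothetical, the release tag and repository are the ones mentioned in this thread, and the commands are printed rather than executed. The actual steps in docker.yml may differ.

```shell
#!/bin/sh
# Dry run: print the pushes that a published release would lead to.
RELEASE_TAG="v0.0.1"            # tag of the published GitHub release
IMAGE_REPO="tschm/islp_labs"    # Docker Hub repository from this thread

# One reference for the release tag, one for the moving :latest tag.
IMAGE_REFS="${IMAGE_REPO}:${RELEASE_TAG} ${IMAGE_REPO}:latest"

for ref in ${IMAGE_REFS}; do
  echo "docker push ${ref}"
done
```

A plain commit never reaches this step, because nothing publishes a release and so no tag is available to name the image.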

tschm commented 1 year ago

As for the port: inside the container, Jupyter always runs on 8888. You can forward this port to a different port on the host, though. The choice is yours, e.g. something like 3000:8888 is possible. The Jupyter server would then be reachable on port 3000 on the host.
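
A minimal sketch of that mapping, assuming the image name used earlier in this thread. The command is only printed here; run the printed line to actually start the container.

```shell
#!/bin/sh
# Dry run: build the docker command that maps a host port to the container.
HOST_PORT=3000        # any free port on the host
CONTAINER_PORT=8888   # port Jupyter listens on inside the container
IMAGE="tschm/islp_labs:v0.0.1"

RUN_CMD="docker run -p ${HOST_PORT}:${CONTAINER_PORT} ${IMAGE}"
echo "${RUN_CMD}"

# The link printed in the container log is written from the container's
# point of view, so it likely still mentions port 8888. On the host,
# swap in the mapped port and keep the token from the log.
echo "then open http://127.0.0.1:${HOST_PORT}/ with the ?token=... part from the log"
```

This would also explain why the links from the log did not open in the browser when a port other than 8888 was chosen on the host.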

tschm commented 1 year ago

Overall, I think this can wait until we actually have several people who want an "official" docker image.

By simply running pip install -r requirements.txt, this really provides no more isolation of code than if I were to do:

conda create -n my_islp_env python=3.11 -y
conda activate my_islp_env
pip install -r https://raw.githubusercontent.com/intro-stat-learning/ISLP_labs/v2.1/requirements.txt

I would not use conda or recommend it :-) Where do you get jupyterlab from?

jonathan-taylor commented 1 year ago

Overall, I think this can wait until we actually have several people who want an "official" docker image. By simply running pip install -r requirements.txt, this really provides no more isolation of code than if I were to do:

conda create -n my_islp_env python=3.11 -y
conda activate my_islp_env
pip install -r https://raw.githubusercontent.com/intro-stat-learning/ISLP_labs/v2.1/requirements.txt

I would not use conda or recommend it :-) Where do you get jupyterlab from?

Well, conda is a community standard (even if it has flaws). I typically just use it to create a minimal environment, then pip for everything else. Could use mamba instead. Both are much lighter weight than docker.

Fair enough about jupyterlab. This is generally enough:

pip install jupyterlab

jonathan-taylor commented 1 year ago

As for the port: inside the container, Jupyter always runs on 8888. You can forward this port to a different port on the host, though. The choice is yours, e.g. something like 3000:8888 is possible. The Jupyter server would then be reachable on port 3000 on the host.

Yep, docker --help pointed that out...

tschm commented 1 year ago

Overall, I think this can wait until we actually have several people who want an "official" docker image.

By simply running pip install -r requirements.txt, this really provides no more isolation of code than if I were to do:

conda create -n my_islp_env python=3.11 -y
conda activate my_islp_env
pip install -r https://raw.githubusercontent.com/intro-stat-learning/ISLP_labs/v2.1/requirements.txt

I would recommend keeping and documenting both options. The results of your pip install will not be reproducible, as you can't control the dependencies of your dependencies. Also, some versions you point to may disappear. Once you bake them into an image, they are there for eternity. You may not need this level of robustness, though.
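
To make the reproducibility point concrete, here is a small sketch that flags requirement lines without an exact == pin. The sample file is hypothetical and is not the real ISLP_labs requirements.txt. Even a fully pinned file leaves transitive dependencies open unless they are listed too, for example by recording the output of pip freeze from a working environment.

```shell
#!/bin/sh
# Flag requirement lines that are not pinned to an exact version.
REQ_FILE="$(mktemp)"

# Hypothetical sample, NOT the real ISLP_labs requirements.txt.
cat > "${REQ_FILE}" <<'EOF'
# sample requirements
numpy==1.24.2
pandas>=1.5
jupyterlab
EOF

# Drop comments and blank lines, then keep only lines without '=='.
UNPINNED="$(grep -v -E '^[[:space:]]*(#|$)' "${REQ_FILE}" | grep -v '==' || true)"

echo "unpinned requirements:"
echo "${UNPINNED}"

rm -f "${REQ_FILE}"
```

Each line it prints can resolve to a different version on a later install, which is the drift a pre-built image avoids.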

tschm commented 1 year ago

Overall, I think this can wait until we actually have several people who want an "official" docker image. By simply running pip install -r requirements.txt, this really provides no more isolation of code than if I were to do:

conda create -n my_islp_env python=3.11 -y
conda activate my_islp_env
pip install -r https://raw.githubusercontent.com/intro-stat-learning/ISLP_labs/v2.1/requirements.txt

I would not use conda or recommend it :-) Where do you get jupyterlab from?

Well, conda is a community standard (even if it has flaws). I typically just use it to create a minimal environment, then pip for everything else. Could use mamba instead. Both are much lighter weight than docker.

Fair enough about jupyterlab. This is generally enough:

pip install jupyterlab

I wish the community standard were to set up a virtual environment in the first place, as you do. To me it seems people just pip install into their central Python env.

jonathan-taylor commented 1 year ago

OK, choosing -p 10000:8888 works for me. So, this just opens essentially the same thing as this: https://mybinder.org/v2/gh/intro-stat-learning/ISLP_labs/v2.1

So, on the whole, this is "effectively" capturing the docker image that binder builds.

It has more packages due to the FROM docker.io/jupyter/scipy-notebook line. This could lead to conflicts if requirements.txt is not current with that image... Using binder doesn't make that assumption.

tschm commented 1 year ago

You can also create an even bigger image that has both R and Python installed. See jupyter-stack documentation

jonathan-taylor commented 1 year ago

But it seems a little heavy-handed to say the solution is to use docker instead of teaching them to manage a virtual environment....

jonathan-taylor commented 1 year ago

I believe it. It's basically an ubuntu server. You can do tons.

Of course, linking to R introduces more dependency.

tschm commented 1 year ago

OK, choosing -p 10000:8888 works for me. So, this just opens essentially the same thing as this: https://mybinder.org/v2/gh/intro-stat-learning/ISLP_labs/v2.1 So, on the whole, this is "effectively" capturing the docker image that binder builds. It has more packages due to the FROM docker.io/jupyter/scipy-notebook line. This could lead to conflicts if requirements.txt is not current with that image... Using binder doesn't make that assumption.

You need to pin the version of the scipy-notebook image. I think I am using something like 4.0.4. For binder, there are ways to build the image directly on binder infrastructure and keep it in their cache. I'm not an expert though... I think your image might be a bit too big for binder. It takes ages to construct it from your requirements.

tschm commented 1 year ago

But it seems a little heavy-handed to say the solution is to use docker instead of teaching them to manage a virtual environment....

The virtual environment thing is not that easy. It exposes you to all sorts of OS dependency problems.

jonathan-taylor commented 1 year ago

Sigh. Binder is not something we "support". It's a service that people can try. It has limited resources, and has its way of managing them. And yes, a fresh build takes some time.

Docker images are cached on binder, and if you read the documentation, it indicates that repos that get a lot of traffic eventually have quicker startup times.

My comment was that making this docker image available is going to give users the same experience as launching binder, but it can be faster.

tschm commented 1 year ago

OK, choosing -p 10000:8888 works for me. So, this just opens essentially the same thing as this: https://mybinder.org/v2/gh/intro-stat-learning/ISLP_labs/v2.1 So, on the whole, this is "effectively" capturing the docker image that binder builds. It has more packages due to the FROM docker.io/jupyter/scipy-notebook line. This could lead to conflicts if requirements.txt is not current with that image... Using binder doesn't make that assumption.

I think the order is wrong :-) You should build the image and binder should capture it :-) Binder is somewhat tricky about being pointed to docker images.

tschm commented 1 year ago

I have updated the underlying image, see https://hub.docker.com/r/tschm/islp_labs/tags. The resulting image is now smaller but still close to 3 GB... let's check the files copied into the image

tschm commented 1 year ago

I have tried to address the somewhat large size of the resulting images. However, it seems to be a direct consequence of installing the NVIDIA packages. I did an analysis with SLIM.ai, and the constructed Python environment takes several GB. I kept the Dockerfile somewhat standard and readable. When I build the image locally, it reports about 2.1 GB. After the round trip via Dockerhub, the same image is now 6 GB after a pull? Weird...

tschm commented 1 year ago

You have the merge power. I am not sure you do yourself a favor with the manual release of the docker image. The pushed image will have no strong link to a tag then (if I understand the manual workflow correctly)...

jonathan-taylor commented 1 year ago

Manual dispatch works fine: jetaylor74/islp_labs should have v2.1.1 and latest

Tried to get it to work on push to stable but not getting triggered. Will eventually sort it out.