Open rabernat opened 4 years ago
@rabernat ~I think only the base-image image supports an environment.yaml (@scottyhq can confirm?)~ (I think I was incorrect)
Also, dunno if it matters, but it may need to be environment.yml rather than environment.yaml.
So if you want stuff, I think you need pangeo-notebook in your environment.yaml.
Trying this out at https://binder.pangeo.io/v2/gh/TomAugspurger/poseidon-bot/binder / https://github.com/TomAugspurger/poseidon-bot/tree/binder
Also, dunno if it matters, but it may need to be environment.yml rather than environment.yaml.
If this is the reason, 🤦
Thanks for looking into it!
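A quick way to rule out the file-name problem is to scan the repository before launching a build. The sketch below is only an illustration: the function name `check_environment_file` is hypothetical, and it assumes the build looks for `environment.yml` in `binder/`, `.binder/`, or the repository root and does not pick up `environment.yaml`, which this thread suggests but does not confirm.

```python
from pathlib import Path

# Assumption: configuration files are looked up in these folders, in this order.
CONFIG_DIRS = ("binder", ".binder", "")


def check_environment_file(repo_dir):
    """Return (path_to_environment_yml_or_None, list_of_warnings) for a repo checkout."""
    repo = Path(repo_dir)
    found = None
    warnings = []
    for sub in CONFIG_DIRS:
        folder = repo / sub
        # Remember the first environment.yml we come across.
        if found is None and (folder / "environment.yml").is_file():
            found = folder / "environment.yml"
        # Flag the .yaml spelling, which may be silently ignored.
        if (folder / "environment.yaml").is_file():
            warnings.append(
                f"{folder / 'environment.yaml'} may be ignored; "
                "consider renaming it to environment.yml"
            )
    return found, warnings
```

Calling `check_environment_file(".")` from the repository root returns the detected file (or `None`) plus any warnings about the `.yaml` spelling.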
@TomAugspurger @rabernat, sorry I didn't see this issue until now b/c I wasn't 'watching' the repository! I thought that would happen by default.
Is this "onbuild" capability no longer supported?
Correct, no longer supported.
If not, how do we recommend extending the images?
The nearest thing to onbuild is using the base image rather than one of the notebook images so just change your Dockerfile to:
FROM pangeo/base-image:9d0723d
This puts the responsibility on the binder creator to add all the necessary sidecar files. You don't need a lock file, and can just modify environment.yml from here: https://github.com/pangeo-data/pangeo-docker-images/tree/master/pangeo-notebook
@scottyhq with that I see
Checking for 'postBuild'...
/srv/conda/envs/notebook/lib/python3.8/site-packages/traitlets/config/loader.py:795: SyntaxWarning: "is" with a literal. Did you mean "=="?
if len(key) is 1:
/srv/conda/envs/notebook/lib/python3.8/site-packages/traitlets/config/loader.py:804: SyntaxWarning: "is" with a literal. Did you mean "=="?
if len(key) is 1:
Enabling: nbgitpuller
- Writing config: /srv/conda/envs/notebook/etc/jupyter
- Validating...
nbgitpuller 0.8.0 OK
rm: cannot remove '/tmp/*': No such file or directory
Removing intermediate container c86e1dad71a0
I think if the postBuild file doesn't create any files then https://github.com/pangeo-data/pangeo-docker-images/blob/9d0723dbb375fe728a44985e4ae4ae961677890b/base-image/Dockerfile#L115 will fail. Will push a fix shortly.
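The `rm: cannot remove '/tmp/*'` message in the log above is what a POSIX shell produces when a glob matches nothing: the pattern is passed to `rm` literally and `rm` exits with an error. The sketch below reproduces that behaviour in an empty temporary directory and compares it with `rm -f`, which tolerates missing operands. It illustrates the failure mode only; it is not necessarily the fix that was applied to the base image.

```python
import subprocess
import tempfile


def run_in_shell(command):
    """Run a command through sh and return its exit code."""
    return subprocess.run(["sh", "-c", command], capture_output=True).returncode


with tempfile.TemporaryDirectory() as empty_dir:
    # The glob matches nothing, so rm receives the literal pattern and fails.
    strict = run_in_shell(f"rm {empty_dir}/*")
    # -f makes rm ignore nonexistent operands, so the step succeeds.
    tolerant = run_in_shell(f"rm -f {empty_dir}/*")

print("plain rm exit code:", strict)
print("rm -f exit code:", tolerant)
```

In a Dockerfile `RUN` step, a non-zero exit code like the first one aborts the build, which would explain why an empty postBuild trips it.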
@rabernat seems to work as of https://github.com/TomAugspurger/poseidon-bot/tree/binder.
https://binder.pangeo.io/v2/gh/TomAugspurger/poseidon-bot/binder.
That at least builds and I can import xgcm.
https://github.com/TomAugspurger/poseidon-bot/blob/fc78c14e2fb0c87f9aa4cc411a494bd8d0f8d323/postBuild#L6 will be unneeded when https://github.com/pangeo-data/pangeo-docker-images/pull/61 is merged.
Also, just to clarify, if you want all the pangeo-notebook packages + others like xgcm in https://github.com/TomAugspurger/poseidon-bot/tree/binder, you'll need to
1) append your additional packages to a copy of the pangeo-notebook environment.yml in your binder environment.yml (https://github.com/TomAugspurger/poseidon-bot/blob/binder/environment.yml)
2) Add the standard jupyterlab extensions from pangeo-notebook postBuild.yml to your binder postBuild (https://github.com/TomAugspurger/poseidon-bot/blob/binder/postBuild).
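Step 1 above amounts to a small list merge. The following sketch shows one way to do it without hand-editing a long file; the helper names `extend_environment` and `render_environment_yml` are hypothetical, and the base package list in the usage note is a short stand-in, not the real pangeo-notebook environment.

```python
def extend_environment(base_dependencies, extra_packages):
    """Append extra packages to a copy of the base list, skipping names already present."""

    def name_of(spec):
        # "xarray=0.15" -> "xarray": cut at the first version operator.
        for sep in ("=", "<", ">", "!"):
            spec = spec.split(sep, 1)[0]
        return spec.strip().lower()

    merged = list(base_dependencies)
    present = {name_of(spec) for spec in merged}
    for spec in extra_packages:
        if name_of(spec) not in present:
            merged.append(spec)
            present.add(name_of(spec))
    return merged


def render_environment_yml(name, channels, dependencies):
    """Render a minimal environment.yml as text."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in dependencies]
    return "\n".join(lines) + "\n"
```

For example, `extend_environment(["dask", "xarray", "jupyterlab"], ["xgcm", "xarray"])` keeps the three base entries and appends only `xgcm`; passing the result to `render_environment_yml("pangeo", ["conda-forge"], ...)` gives text that can be written to the binder `environment.yml`.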
I appreciate the quick responses and clarifications. Sorry for being slow to understand how this all fits together.
I miss the ability to extend pangeo-notebook. I thought that was a very useful and convenient way to work. I don't miss keeping track of long environment.yaml files. I hope we can find a way to bring this back somehow.
It is also not clear to me whether I gain anything by including a Dockerfile with FROM pangeo/base-image:9d0723d. Since I have to enumerate all the packages and add a postBuild anyway, isn't it just simpler to use a normal binder?
It is also not clear to me whether I gain anything by including a Dockerfile with FROM pangeo/base-image:9d0723d. Since I have to enumerate all the packages and add a postBuild anyway, isn't it just simpler to use a normal binder?
You gain a faster build and resulting image that is about 1/2 the size with 1/2 the layers. The key difference is a single conda solve+install instead of installing additional packages into an existing environment. But sure, you can drop the Dockerfile and stick with repo2docker condabuildpack if you prefer.
I miss the ability to extend pangeo-notebook. I thought that was a very useful and convenient way to work. I don't miss keeping track of long environment.yaml files. I hope we can find a way to bring this back somehow.
This was discussed at various points over the last couple months in https://github.com/pangeo-data/pangeo-docker-images/issues/2. The original design had that ability. But essentially there is a tradeoff between the convenience of onbuild layering versus the transparency of explicitly listing what goes into an environment under one folder.
Can we get the best of both worlds by making a pangeo-kitchen-sink (name TBD) metapackage with the contents of https://github.com/pangeo-data/pangeo-docker-images/blob/master/pangeo-notebook/environment.yml, and then someone wishing to customize that with a few packages has
# Dockerfile
FROM pangeo-base-image:tag
# environment.yml
name: pangeo
channels:
- conda-forge
dependencies:
- pangeo-kitchen-sink=2020.04.14
- my-custom-package
@TomAugspurger - I think it could be worthwhile to do that, but it's hard to decide what goes into the kitchen sink... hopefully folks will chime in about that here https://github.com/pangeo-data/pangeo-docker-images/issues/28.
This issue makes me think we should change the label of base-image to base-image-onbuild, given the best practices on naming (https://docs.docker.com/develop/develop-images/dockerfile_best-practices). Hopefully that would also help clarify that the notebook images do not have onbuild commands baked into them.
I tried following this advice and creating a new binder based on pangeo-notebook.
It's here:
The environment differs from pangeo-notebook only by about 5 extra packages at the end: https://github.com/pangeo-gallery/cmip6/blob/abd71d5e62c7d8bde5ec22f896846277463009ad/binder/environment.yml#L55-L59
This binder will not build. The conda environment can't be solved. The build ends with about 10000 messages like this:
Package zstd conflicts for:
tiledb-py -> tiledb[version='>=1.7.7,<1.8.0a0'] -> zstd[version='1.3.2|1.3.3|>=1.3.3,<1.3.4.0a0|>=1.3.7,<1.3.8.0a0|>=1.4.0,<1.4.1.0a0|>=1.4.3,<1.4.4.0a0|>=1.4.4,<1.4.5.0a0']
rasterio -> libgdal[version='>=3.0.4,<3.1.0a0'] -> zstd[version='>=1.3.7,<1.3.8.0a0|>=1.4.4,<1.4.5.0a0']
geopandas -> pysal -> zstd
python-blosc -> blosc[version='>=1.16.3,<2.0a0'] -> zstd[version='>=1.3.7,<1.3.8.0a0']
Package wheel conflicts for:
python=3.7 -> pip -> wheel
pip=20 -> wheel
Package backports conflicts for:
numcodecs -> backports.lzma -> backports
matplotlib-base -> backports.functools_lru_cache -> backports
Package markupsafe conflicts for:
pydap -> jinja2 -> markupsafe[version='>=0.23']
intake -> jinja2 -> markupsafe[version='>=0.23']
Note that strict channel priority may have removed packages required for satisfiability.
Not sure how to move forward. Any advice from the repo2docker gurus would be appreciated.
@rabernat I think that's strictly a conda issue. If I had to guess it's the mix of
- pangeo-notebook=2020.04.25
- distributed=2.15.1
Since pangeo-notebook pins distributed exactly (via pangeo-dask), you end up with two different exact pins.
Conda really should be able to provide a better error message, but it's apparently somewhat hard to do generally.
Bumping to pangeo-notebook=2020.04.28 and removing distributed should do the trick.
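Since conda's own conflict report is hard to read, a small pre-check for double pins can save a failed build. The sketch below compares the pins in a user's `environment.yml` with the pins a metapackage is known to carry. The function names are hypothetical, and the metapackage pin in the usage note is a made-up placeholder: the thread does not say which distributed version pangeo-notebook=2020.04.25 actually pins.

```python
import re

# Matches "name=version" or "name==version"; other spec forms are ignored.
PIN = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*==?\s*([0-9][^\s=<>!,*]*)\s*$")


def exact_pins(specs):
    """Map package name -> pinned version for every pinned spec in the list."""
    pins = {}
    for spec in specs:
        match = PIN.match(spec)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def conflicting_pins(user_specs, metapackage_pins):
    """Return {name: (user_version, metapackage_version)} where the two pins differ."""
    user = exact_pins(user_specs)
    meta = exact_pins(metapackage_pins)
    return {
        name: (user[name], meta[name])
        for name in user.keys() & meta.keys()
        if user[name] != meta[name]
    }
```

With `user_specs = ["pangeo-notebook=2020.04.25", "distributed=2.15.1"]` and a hypothetical `metapackage_pins = ["distributed=2.14.0"]`, the result flags `distributed` as pinned twice with different versions, which is the situation described above.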
Yes, that fixed it! Thanks @TomAugspurger! :+1:
However, the dask dashboard is still inaccessible. I get a 404 error.
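One way to narrow down a dashboard 404 is to check, from a notebook cell or terminal inside the running session, whether anything is listening on the dashboard port at all. The sketch below assumes the dashboard uses Dask's default port 8787 and that the proxy reaches it on the local loopback address; `port_is_open` is a hypothetical helper name.

```python
import socket


def port_is_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and address errors.
        return False


if __name__ == "__main__":
    # 8787 is Dask's default dashboard port (an assumption for this deployment).
    print("dashboard port reachable:", port_is_open(8787))
```

If this prints `False` while a cluster is running, the problem is more likely on the scheduler side than in the proxy route.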
Just tried this with pangeo/pangeo-notebook:2020.04.28 on Rich's example notebook on AWS and I'm also seeing 404 trying to connect to the dashboard. Jupyter pod logs show:
Traceback (most recent call last):
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/tornado/web.py", line 1703, in _execute
result = await result
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/jupyter_server_proxy/websocket.py", line 97, in get
return await self.http_get(*args, **kwargs)
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/jupyter_server_proxy/handlers.py", line 359, in http_get
return await self.proxy(port, proxied_path)
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/jupyter_server_proxy/handlers.py", line 225, in proxy
response = await client.fetch(req, raise_error=False)
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/tornado/iostream.py", line 1226, in connect
self.socket.connect(address)
OSError: [Errno 99] Cannot assign requested address
LabApp - ERROR - Uncaught exception GET /user/scottyhq-pangeodev-binder-9xnng5u1/proxy/8787/individual-plots.json?1588175640199 (192.168.59.217)
HTTPServerRequest(protocol='https', host='hub.aws-uswest2-binder.pangeo.io', method='GET', uri='/user/scottyhq-pangeodev-binder-9xnng5u1/proxy/8787/individual-plots.json?1588175640199', version='HTTP/1.1', remote_ip='192.168.59.217')
Traceback (most recent call last):
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/tornado/tcpclient.py", line 143, in on_connect_done
stream = future.result()
tornado.iostream.StreamClosedError: Stream is closed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/tornado/web.py", line 1703, in _execute
result = await result
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/jupyter_server_proxy/websocket.py", line 97, in get
return await self.http_get(*args, **kwargs)
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/jupyter_server_proxy/handlers.py", line 359, in http_get
return await self.proxy(port, proxied_path)
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/jupyter_server_proxy/handlers.py", line 225, in proxy
response = await client.fetch(req, raise_error=False)
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/tornado/iostream.py", line 1226, in connect
self.socket.connect(address)
OSError: [Errno 99] Cannot assign requested address
This is a helpful command/URL to see the packages changed between tags: https://github.com/pangeo-data/pangeo-docker-images/compare/2020.04.22..2020.04.28#diff-fc143938f9485967d3be2239526ec787
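The compare link follows GitHub's `compare/<old>..<new>` URL pattern, so it can be generated for any pair of image tags. A minimal sketch, with `compare_url` as a hypothetical helper name and the `#diff-...` fragment from the link above left out:

```python
def compare_url(old_tag, new_tag, repo="pangeo-data/pangeo-docker-images"):
    """Build the GitHub compare URL between two tags of a repository."""
    return f"https://github.com/{repo}/compare/{old_tag}..{new_tag}"


print(compare_url("2020.04.22", "2020.04.28"))
```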
@rabernat I'm guessing there is still an issue with distributed 2.15.1 since tornado and jupyter_server_proxy haven't changed. As a short-term solution can you just drop back to pangeo-notebook=2020.04.22?
Just cross-referencing https://github.com/conda-forge/pangeo-notebook-feedstock/pull/15, which may be important here soon.
I would like to extend the pangeo-notebook image, as we used to do in the old system. I made the following repo: https://github.com/rabernat/poseidon-bot/tree/binder with the following Dockerfile
plus an environment.yaml file. But it just ignores the environment.yaml file.
Is this "onbuild" capability no longer supported? If not, how do we recommend extending the images?
Binder: https://binder.pangeo.io/v2/gh/rabernat/poseidon-bot/binder