pangeo-data / helm-chart

Pangeo helm charts
https://pangeo-data.github.io/helm-chart/

Update versions of dask projects #29

Closed mrocklin closed 6 years ago

mrocklin commented 6 years ago

This points to a consistent set of commits for dask/dask, dask/distributed, and dask/dask-kubernetes.

In going through this I ran into a number of problems with Jupyter not starting up well. I tried to diagnose them by following GitHub issues and pinned a few extra versions of libraries like pyzmq, but in the end this was not effective.

I'm not sure how best to address the failure here. @rabernat @tjcrone was there a general recommendation on how to upgrade the cluster?

rabernat commented 6 years ago

I have basically never had helm upgrade work reliably. In the past few updates, I always just ended up helm deleting and helm installing the whole cluster.

On May 19, 2018, at 7:00 PM, Matthew Rocklin notifications@github.com wrote:

You can view, comment on, or merge this pull request online at:

https://github.com/pangeo-data/helm-chart/pull/29

Commit Summary

Update versions of dask projects

File Changes

M docker-images/notebook/Dockerfile (43)
M docker-images/notebook/worker-template.yaml (2)
M docker-images/worker/Dockerfile (12)
M pangeo/values.yaml (2)

Patch Links:

https://github.com/pangeo-data/helm-chart/pull/29.patch
https://github.com/pangeo-data/helm-chart/pull/29.diff

mrocklin commented 6 years ago

Should I purge, or just helm delete followed by helm install?

rabernat commented 6 years ago

This is what I did for the most recent update.

    helm delete jupyter --purge
    # wait for the pods to shut down
    helm install pangeo/pangeo --version=0.1.1-a14d55b --name=jupyter --namespace=pangeo -f actual-secret-config.yaml -f jupyter-config.yaml
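
If you'd rather not eyeball the "wait for the pods to shut down" step, it can be scripted. A minimal sketch in Python, assuming `kubectl` is installed and pointed at the cluster and that the release lives in the `pangeo` namespace as above; `list_pods` and `wait_for_no_pods` are hypothetical helpers, not part of helm or kubectl. It waits until the namespace reports no pods at all, which is stricter than waiting only for the release's own pods.

```python
import subprocess
import time
from typing import Callable, List


def list_pods(namespace: str = "pangeo") -> List[str]:
    """Return pod names in `namespace` via `kubectl get pods -o name`."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "--namespace", namespace, "-o", "name"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    # One "pod/<name>" per line; ignore blank lines.
    return [line for line in result.stdout.splitlines() if line.strip()]


def wait_for_no_pods(
    lister: Callable[[], List[str]],
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `lister` until it reports no pods; return False if `timeout` elapses first."""
    waited = 0.0
    while True:
        if not lister():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

Usage, between the two helm commands: `wait_for_no_pods(lambda: list_pods("pangeo"))`. Passing the lister and `sleep` in as arguments keeps the polling logic testable without a cluster.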

rabernat commented 6 years ago

Matt, if you merge the latest master, we should be able to try out using chartpress to build the notebook Docker image. (But not yet for the worker image.)

mrocklin commented 6 years ago

Rebased on master.

we should be able to try out using chartpress to build the notebook docker image

What should I be doing to test this? I don't have direct experience with chartpress.

rabernat commented 6 years ago

I would just watch the Travis log to see what happens. Hopefully it will build a Docker image.

rabernat commented 6 years ago

Looks like the docker build failed at the conda install stage 😑

rabernat commented 6 years ago

This conda issue seems relevant to the failing travis docker build: https://github.com/conda/conda/issues/6811

rabernat commented 6 years ago

Hi Matt...what's the status of this? I have to admit that I'm pretty eager to try out intake. How can we help resolve whatever is blocking this from moving forward?

If the Travis CI stuff has gummed up the works here, then just disable it for now. The important thing is to get the packages updated appropriately.

mrocklin commented 6 years ago

I've just been busy with other things.

I no longer know the right way to handle image tags and the helm chart. I'd be more than happy to deploy this manually as I've done before, but I'm not sure whether that mucks up whatever the current system is. I don't think I have enough information about the current chartpress procedure to move forward. I'm happy to regress to my old manual way of doing things, though.

rabernat commented 6 years ago

Matt, there is no "current procedure." I, by myself, am experimentally trying to automate the building of docker images using chartpress. Everything I know about chartpress I learned from its very short readme. This PR is the first one to modify the docker image since these four lines were added to chartpress.yaml:

    images:
      notebook:
        contextPath: docker-images/notebook
        valuesPath: jupyterhub.singleuser.image
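
Roughly, `contextPath` says which directory to build the image from and `valuesPath` says where in the chart's `values.yaml` the resulting image reference should be written. The sketch below is not chartpress's own code; it's a hypothetical illustration, working on an already-parsed values dict (no YAML library), of writing a `name`/`tag` pair at a dotted path like `jupyterhub.singleuser.image`.

```python
from typing import Any, Dict


def set_image_at_path(values: Dict[str, Any], dotted_path: str, name: str, tag: str) -> None:
    """Write {'name': name, 'tag': tag} at `dotted_path`, creating dicts as needed."""
    *parents, leaf = dotted_path.split(".")
    node = values
    for key in parents:
        # Descend one level, creating intermediate mappings that don't exist yet.
        node = node.setdefault(key, {})
    image = node.setdefault(leaf, {})
    # Overwrite only name and tag; any other keys under the image entry are kept.
    image["name"] = name
    image["tag"] = tag
```

Updating `name` and `tag` in place, rather than replacing the whole mapping, leaves any other settings under that key (a pull policy, say) untouched.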

What chartpress should do is:

At this point, if all goes well, you should be able to do helm repo update and install the new chart (including the new docker image)

However, this did not work because conda encountered an error on travis during the docker build step. So, at this point, you have at least two options:

Regarding the naming of docker tags, @yuvipanda has strongly advocated for using commit hashes, and since my recent struggles with updating the cluster, I understand why. This is also what chartpress should do automatically. So whatever you do going forward, best to use commit hashes for your tags.
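
A minimal sketch of commit-hash tagging, assuming the tag is the abbreviated hash of the most recent commit that touched the image's build directory (my understanding of what chartpress does; treat it as an assumption). `short_hash` and `image_tag_for` are hypothetical helpers; the latter shells out to `git log`.

```python
import re
import subprocess


def short_hash(full_hash: str, length: int = 7) -> str:
    """Validate a lowercase hex git commit hash and return its abbreviated form."""
    if not re.fullmatch(r"[0-9a-f]{7,40}", full_hash):
        raise ValueError("not a git commit hash: %r" % full_hash)
    return full_hash[:length]


def image_tag_for(context_path: str) -> str:
    """Tag = short hash of the last commit that touched `context_path`."""
    full = subprocess.run(
        ["git", "log", "-n", "1", "--pretty=format:%H", "--", context_path],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    ).stdout.strip()
    return short_hash(full)
```

The point of a hash over something like `latest`: every change to the image gets a new, unique tag, so the chart pins an exact image and nodes can't keep running a stale cached image under an unchanged tag.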

Also note that there is no auto-building of the worker image yet. So, whatever you decide about chartpress vs. manual build, you will still have to manually build, tag, and push the worker image.
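
For the worker image, the manual build, tag, and push can at least be written down once. A dry-run sketch: `worker_image_commands` (hypothetical) only assembles `docker` argument lists; `example/worker` in the usage line is a placeholder rather than the real image name, and the default context `docker-images/worker` is where this PR's worker Dockerfile lives.

```python
from typing import List


def worker_image_commands(
    repo: str, tag: str, context: str = "docker-images/worker", no_cache: bool = True
) -> List[List[str]]:
    """Return the docker build and push commands for one worker image."""
    ref = "%s:%s" % (repo, tag)
    build = ["docker", "build"]
    if no_cache:
        # Rebuild every layer so stale cached package installs can't sneak in.
        build.append("--no-cache")
    # `-t` applies the tag at build time, covering the "tag" step.
    build += ["-t", ref, context]
    return [build, ["docker", "push", ref]]
```

Usage: `worker_image_commands("example/worker", "0123abc")`, then run each list with `subprocess.run(cmd, check=True)` once it looks right; the tag would be a commit hash, as suggested above.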

Sorry to have made you the guinea pig for the chartpress stuff. If you don't have bandwidth to deal with this, just disable it. But keep in mind that, if we can get this working, it means less work overall down the line.

mrocklin commented 6 years ago

Yeah, I'm totally behind the effort to automate the deployment process. I agree that this will reduce the friction to making changes. However, this definitely isn't the week that I would personally choose to dive into this topic.

However, I will have the pleasure of sitting down with @yuvipanda starting Friday, and I know that automation is something that he plans to focus on the following week within the Jupyter tooling, so maybe this would be a good case study to consider.

In the meantime, would you like me to handle things manually, or do you want to use this as a case study for chartpress?

rabernat commented 6 years ago

Understand completely. 👍 Fine to do manual build of docker images for now. You will have to comment out those lines of chartpress.yaml.

mrocklin commented 6 years ago

Which lines?

mrocklin commented 6 years ago

I'm also planning to reinstall the chart manually. Any objection?

rabernat commented 6 years ago

If you are still having upgrade problems, you could try @tjcrone's method:

    helm upgrade --force --recreate-pods jupyter pangeo/pangeo --version=0.1.1-a14d55b \
        -f secret-config.yaml -f jupyter-config.yaml

rabernat commented 6 years ago

These lines of chartpress.yaml:

    images:
      notebook:
        contextPath: docker-images/notebook
        valuesPath: jupyterhub.singleuser.image

rabernat commented 6 years ago

I'm also planning to reinstall the chart manually. Any objection?

Can you be more verbose about what you mean by this?

mrocklin commented 6 years ago

I'll try the helm upgrade you mention above

mrocklin commented 6 years ago

I don't know what to do about the docker failures. docker build --no-cache passes well on my machine. Does chartpress introduce anything custom here?

mrocklin commented 6 years ago

I tried @tjcrone's solution with no luck. I'll try @rabernat's delete-then-install solution in a while. Currently there are some users doing actual work.

rabernat commented 6 years ago

Can you document the problem you are having with helm upgrade?

mrocklin commented 6 years ago

The upgrade seems to go alright. When I log in I get a 404 error. I can navigate instead to /tree to get the classic notebook and I'm able to log in fine, but kernels don't start.

rabernat commented 6 years ago

Ok, that's the exact same problem we had in https://github.com/pangeo-data/pangeo/pull/261 when trying to do a helm upgrade.

mrocklin commented 6 years ago

Correct

rabernat commented 6 years ago

@mrocklin would you be able to give us an update on where this PR stands? We have the NYC Pangeo sprint happening tomorrow, and it would be good to have this resolved so we can tackle new issues.

mrocklin commented 6 years ago

I think I tried upgrading things but didn't have any luck. I haven't had a sufficiently long stretch of time to try installing it. I feel the need to have more than a couple hours blocked off in case things fail and I need to resolve tricky bugs. Unfortunately I am unlikely to have that much time free in a single block before this Thursday.

mrocklin commented 6 years ago

My apologies for my absence lately

mrocklin commented 6 years ago

@martindurant, if you have any interest, this issue might be a good one to work on. Deploying new versions of software on pangeo.pydata.org has become slow recently, both because helm upgrade doesn't work reliably for us and because of possibly mismatched versions of JupyterLab.

Probably the right approach here is to deploy onto a test setup for a while to make sure that we get the versions right, and then upgrade the chart running on the main deployment. If past patterns hold, you'll run into a few problems when trying to start Jupyter kernels with the Docker image in this PR (though more recent work may have superseded these problems), and also when trying to upgrade the helm chart.
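
One way to catch mismatched versions on the test deployment before touching the main one: compare what's installed in the notebook image against the versions you meant to pin. A sketch assuming Python 3.8+ for `importlib.metadata` (the images in this thread run Python 3.6, where `pkg_resources` would be the fallback); `version_mismatches` is a hypothetical helper, and the pins you pass it are yours to choose.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it's absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def version_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs from its pin to what was found."""
    mismatches = {}
    for dist, want in expected.items():
        found = lookup(dist)
        if found != want:
            mismatches[dist] = found  # None means "not installed at all"
    return mismatches
```

Comparison is exact string equality, so `1.0` and `1.0.post1` count as a mismatch; for a pinned image that strictness is the point.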

rabernat commented 6 years ago

We have a reasonably up-to-date deployment guide on the new site: http://pangeo-data.org/setup_guides/cloud.html

rabernat commented 6 years ago

Has anyone tried just deleting the cluster and installing it fresh? That has worked in the past when helm upgrade failed.

mrocklin commented 6 years ago

No. I suspect that that is the right thing to try next.

mrocklin commented 6 years ago

Just to be clear, this stalled PR shouldn't stop others from updating the cluster. I more or less abandoned this over the last few weeks. That shouldn't stop others from progressing on this issue separately.

mrocklin commented 6 years ago

I'm working on this now

mrocklin commented 6 years ago

There is a test deployment up here with these changes: http://35.232.238.22

I would appreciate it if people could give it a spin. It doesn't have all of the configuration present in the pangeo.pydata.org config file, only what's present in this helm chart

rabernat commented 6 years ago

Thanks a lot for picking this up again Matt!

I just tried it out. I am unable to load zarr datasets via gcsfs

import gcsfs
import xarray as xr

gcsmap = gcsfs.mapping.GCSMap('pangeo-data/dataset-duacs-rep-global-merged-allsat-phy-l4-v3-alt')
ds = xr.open_zarr(gcsmap)
_call exception: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f322cb2db38>: Failed to establish a new connection: [Errno 110] Connection timed out',))
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/urllib3/connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/opt/conda/lib/python3.6/site-packages/urllib3/util/connection.py", line 83, in create_connection
    raise err
  File "/opt/conda/lib/python3.6/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/opt/conda/lib/python3.6/site-packages/urllib3/connectionpool.py", line 357, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/opt/conda/lib/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/conda/lib/python3.6/http/client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/conda/lib/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/conda/lib/python3.6/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/opt/conda/lib/python3.6/http/client.py", line 964, in send
    self.connect()
  File "/opt/conda/lib/python3.6/site-packages/urllib3/connection.py", line 166, in connect
    conn = self._new_conn()
  File "/opt/conda/lib/python3.6/site-packages/urllib3/connection.py", line 150, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f322cb2db38>: Failed to establish a new connection: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/opt/conda/lib/python3.6/site-packages/urllib3/connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/opt/conda/lib/python3.6/site-packages/urllib3/util/retry.py", line 388, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f322cb2db38>: Failed to establish a new connection: [Errno 110] Connection timed out',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/google/auth/transport/requests.py", line 120, in __call__
    **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/conda/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/requests/adapters.py", line 508, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f322cb2db38>: Failed to establish a new connection: [Errno 110] Connection timed out',))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/google/auth/compute_engine/credentials.py", line 96, in refresh
    self._retrieve_info(request)
  File "/opt/conda/lib/python3.6/site-packages/google/auth/compute_engine/credentials.py", line 78, in _retrieve_info
    service_account=self._service_account_email)
  File "/opt/conda/lib/python3.6/site-packages/google/auth/compute_engine/_metadata.py", line 179, in get_service_account_info
    recursive=True)
  File "/opt/conda/lib/python3.6/site-packages/google/auth/compute_engine/_metadata.py", line 115, in get
    response = request(url=url, method='GET', headers=_METADATA_HEADERS)
  File "/opt/conda/lib/python3.6/site-packages/google/auth/transport/requests.py", line 124, in __call__
    six.raise_from(new_exc, caught_exc)
  File "<string>", line 3, in raise_from
google.auth.exceptions.TransportError: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f322cb2db38>: Failed to establish a new connection: [Errno 110] Connection timed out',))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 458, in _call
    r = meth(self.base + path, params=kwargs, json=json)
  File "/opt/conda/lib/python3.6/site-packages/requests/sessions.py", line 521, in get
    return self.request('GET', url, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/google/auth/transport/requests.py", line 198, in request
    self._auth_request, method, url, request_headers)
  File "/opt/conda/lib/python3.6/site-packages/google/auth/credentials.py", line 122, in before_request
    self.refresh(request)
  File "/opt/conda/lib/python3.6/site-packages/google/auth/compute_engine/credentials.py", line 102, in refresh
    six.raise_from(new_exc, caught_exc)
  File "<string>", line 3, in raise_from
google.auth.exceptions.RefreshError: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f322cb2db38>: Failed to establish a new connection: [Errno 110] Connection timed out',))

Perhaps related to the lack of custom config?

mrocklin commented 6 years ago

Perhaps related to the lack of custom config?

Looking at the jupyter-config.yaml file I'm not seeing anything very obvious here. You may have more experience here than I do. Do you have any ideas on what this might be?

mrocklin commented 6 years ago

Also, this cell worked for me

import xarray as xr

### Load with FUSE File system
# ds = xr.open_mfdataset('/gcs/newmann-met-ensemble-netcdf/conus_ens_00*.nc',
#                        engine='netcdf4', concat_dim='ensemble', chunks={'time': 50})

### Load with Cloud object storage
import gcsfs

fs = gcsfs.GCSFileSystem(project='pangeo-181919', token='anon', access='read_only')
gcsmap = gcsfs.mapping.GCSMap('pangeo-data/newman-met-ensemble',
                              gcs=fs, check=False, create=False)
ds = xr.open_zarr(gcsmap)

rabernat commented 6 years ago

I should add that, after I submitted my comments, the cell did actually execute and return the correct dataset.

This issue is ringing a bell in my memory. Something similar came up at some point, but I can't track down the issue. Maybe @martindurant will recognize it.

rabernat commented 6 years ago

Oh, now I remember. It had something to do with an upstream change in zero-to-jupyterhub that blocked access to the metadata server.

This one! https://github.com/pangeo-data/pangeo/issues/148

mrocklin commented 6 years ago

One of the standard xarray GCS notebooks always runs through for me. I didn't try running it exhaustively, though.

mrocklin commented 6 years ago

OK, so we can probably safely ignore it for now

mrocklin commented 6 years ago

I'm going to leave this cluster up until the end of the workday, East Coast time; then I'll probably install it on the normal cluster.

mrocklin commented 6 years ago

Assuming nothing else comes up

rabernat commented 6 years ago

So you did not get this error? That's weird. What if you run the sea-surface-height notebook?

mrocklin commented 6 years ago

So you did not get this error? That's weird.

Yes, things worked for me.

What if you run the sea-surface-height notebook?

I've moved on for the moment. Given that you've found the source of the problem I'm inclined to just let this lie.

rabernat commented 6 years ago

If I use

import gcsfs
import xarray as xr

gcs = gcsfs.GCSFileSystem(token='anon')
gcsmap = gcsfs.mapping.GCSMap('pangeo-data/dataset-duacs-rep-global-merged-allsat-phy-l4-v3-alt', gcs=gcs)
ds = xr.open_zarr(gcsmap)

I don't get the error. The key is token='anon'. I'm not sure why this is different now. But it's not a dealbreaker.
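
A plausible reason `token='anon'` makes the difference: without a token, the auth library ends up trying Compute Engine credentials from the metadata server (the `metadata.google.internal` timeouts in the traceback earlier in this thread), and the zero-to-jupyterhub change mentioned earlier blocks that server from user pods. A sketch that probes the metadata server with a short timeout and only then picks credentials; `metadata_server_reachable` and `pick_gcs_token` are hypothetical helpers, and using `'cloud'` (gcsfs's metadata-service token mode) is an assumption about the installed gcsfs version.

```python
import urllib.request
from typing import Callable

METADATA_URL = "http://metadata.google.internal/computeMetadata/v1/"


def metadata_server_reachable(timeout: float = 2.0) -> bool:
    """Return True if the GCE metadata server answers successfully within `timeout` seconds."""
    req = urllib.request.Request(METADATA_URL, headers={"Metadata-Flavor": "Google"})
    try:
        with urllib.request.urlopen(req, timeout=timeout):
            return True
    except OSError:
        # URLError (including HTTPError), socket timeouts, and DNS failures
        # are all OSError subclasses, so any of them counts as "unreachable".
        return False


def pick_gcs_token(probe: Callable[[], bool] = metadata_server_reachable) -> str:
    """Use metadata-server credentials when reachable, else anonymous access."""
    return "cloud" if probe() else "anon"
```

Usage: `gcs = gcsfs.GCSFileSystem(token=pick_gcs_token())`. Anonymous access only works for public buckets like the `pangeo-data` ones used here; the probe costs at most `timeout` seconds instead of the long connection timeouts seen above.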

mrocklin commented 6 years ago

This was fixed if we use the extra config value though?

mrocklin commented 6 years ago

To be clear, I haven't included the pangeo.pydata.org config file in this, so what you're playing with is not the full deployment.

rabernat commented 6 years ago

This was fixed if we use the extra config value though?

Correct.

I have tested out other stuff (holoviews, datashader), and it seems to be working.

So 👍 from me to deploy this to pangeo.pydata.org!