Closed: mrocklin closed this pull request 6 years ago.
I have basically never had helm upgrade work reliably. In the past few updates, I always just ended up helm deleting and helm installing the whole cluster.
On May 19, 2018, at 7:00 PM, Matthew Rocklin notifications@github.com wrote:
This points to a consistent set of commits for dask/dask, dask/distributed, and dask/dask-kubernetes.
In going through this I ran into a number of problems with Jupyter not starting up well. I tried to diagnose them by following github issues and pinned a few extra versions of libraries like pyzmq, but in the end this was not effective.
I'm not sure how best to address the failure here. @rabernat @tjcrone was there a general recommendation on how to upgrade the cluster?
Should I purge, or just helm delete followed by helm install?
This is what I did for the most recent update:

```shell
helm delete jupyter --purge
# wait for the pods to shut down
helm install pangeo/pangeo --version=0.1.1-a14d55b --name=jupyter --namespace=pangeo \
    -f actual-secret-config.yaml -f jupyter-config.yaml
```
Matt, if you merge the latest master, we should be able to try out using chartpress to build the notebook docker image. (But not yet for the worker image.)
Rebased on master.
> we should be able to try out using chartpress to build the notebook docker image
What should I be doing to test this? I don't have direct experience with chartpress.
I would just watch the travis log to see what happens. Hopefully it will build a docker image.
Looks like the docker build failed at the conda install stage 😑
This conda issue seems relevant to the failing travis docker build: https://github.com/conda/conda/issues/6811
Hi Matt...what's the status of this? I have to admit that I'm pretty eager to try out intake. How can we help resolve whatever is blocking this from moving forward?
If the travis CI stuff has gummed up the works here, then just disable it for now. The important thing for now is to get the packages updated appropriately.
I've just been busy with other things.
I no longer know the right way to handle image tags and the helm chart. I'd be more than happy to deploy this manually as I've done before, but I'm not sure whether that mucks up whatever the current system is, and I don't have enough information about the current chartpress procedure to move forward. I'm happy to regress to my old manual way of doing things, though.
Matt, there is no "current procedure." I, by myself, am experimentally trying to automate the building of docker images using chartpress. Everything I know about chartpress I learned from its very short readme. This PR is the first one to modify the docker image since these four lines were added to chartpress.yaml:

```yaml
images:
  notebook:
    contextPath: docker-images/notebook
    valuesPath: jupyterhub.singleuser.image
```
What chartpress should do is: build a new docker image from docker-images/notebook, tag it with the commit hash, push it, and update the valuesPath section of the chart (here jupyterhub.singleuser.image) in values.yaml with the commit hash. At this point, if all goes well, you should be able to do helm repo update and install the new chart (including the new docker image).
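For concreteness, the end state that chartpress writes into values.yaml would look something like this; the image repository name and tag below are illustrative placeholders, not values taken from the actual chart:

```yaml
# Hypothetical values.yaml fragment after chartpress runs.
# The repository name and tag are placeholders for illustration only.
jupyterhub:
  singleuser:
    image:
      name: pangeo/notebook
      tag: "a14d55b"   # short commit hash of the build
```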
However, this did not work because conda encountered an error on travis during the docker build step. So, at this point, you have at least two options: fix the conda failure, or disable the relevant lines in chartpress.yaml.
Regarding the naming of docker tags, @yuvipanda has strongly advocated for using commit hashes, and since my recent struggles with updating the cluster, I understand why. This is also what chartpress should do automatically. So whatever you do going forward, best to use commit hashes for your tags.
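The manual equivalent of that convention is straightforward; here is a minimal sketch, assuming a pangeo/notebook repository on Docker Hub (the name is illustrative, not the actual image):

```shell
# Derive the image tag from the current commit so every build is traceable.
# SHA is hard-coded here for illustration; normally: SHA=$(git rev-parse --short HEAD)
SHA=abc1234
IMAGE="pangeo/notebook:${SHA}"
echo "$IMAGE"
# docker build -t "$IMAGE" docker-images/notebook
# docker push "$IMAGE"
```

Using the commit hash instead of `latest` means each deployed chart pins an exact, reproducible image.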
Also note that there is no auto-building of the worker image yet. So, whatever you decide about chartpress vs. manual build, you will still have to manually build, tag, and push the worker image.
Sorry to have made you the guinea pig for the chartpress stuff. If you don't have bandwidth to deal with this, just disable it. But keep in mind that, if we can get this working, it means less work overall down the line.
Yeah, I'm totally behind the effort to automate the deployment process. I agree that this will reduce the friction to making changes. However, this definitely isn't the week that I would personally choose to dive into this topic.
However, I will have the pleasure of sitting down with @yuvipanda starting Friday, and I know that automation is something that he plans to focus on the following week within the Jupyter tooling, so maybe this would be a good case study to consider.
In the meantime, would you like me to handle things manually, or do you want to use this as a case study for chartpress?
Understand completely. 👍 Fine to do manual build of docker images for now. You will have to comment out those lines of chartpress.yaml.
Which lines?
I'm also planning to reinstall the chart manually. Any objection?
If you are still having upgrade problems, you could try @tjcrone's method:

```shell
helm upgrade --force --recreate-pods jupyter pangeo/pangeo --version=0.1.1-a14d55b \
    -f secret-config.yaml -f jupyter-config.yaml
```
These lines of chartpress.yaml:

```yaml
images:
  notebook:
    contextPath: docker-images/notebook
    valuesPath: jupyterhub.singleuser.image
```
> I'm also planning to reinstall the chart manually. Any objection?
Can you be more verbose about what you mean by this?
I'll try the helm upgrade you mention above
I don't know what to do about the docker failures. docker build --no-cache works fine on my machine. Does chartpress introduce anything custom here?
I tried @tjcrone's solution with no luck. I'll try @rabernat's delete-then-install solution in a while. Currently there are some users doing actual work.
Can you document the problem you are having with helm upgrade?
The upgrade seems to go alright. When I log in I get a 404 error. I can navigate instead to /tree to get the classic notebook and I'm able to log in fine, but kernels don't start.
Ok, that's the exact same problem we had in https://github.com/pangeo-data/pangeo/pull/261 when trying to do a helm upgrade.
Correct
@mrocklin would you be able to give us an update on where this PR stands? We have the NYC Pangeo sprint happening tomorrow, and it would be good to have this resolved so we can tackle new issues.
I think I tried upgrading things but didn't have any luck. I haven't had a sufficiently long stretch of time to try installing it. I feel the need to have more than a couple hours blocked off in case things fail and I need to resolve tricky bugs. Unfortunately I am unlikely to have that much time free in a single block before this Thursday.
My apologies for my absence lately
@martindurant if you have any interest, this issue might be good to work on. Deploying new versions of software on pangeo.pydata.org has become slow recently due to an inability to use helm upgrade reliably, and also possibly mismatched versions of JupyterLab.
Probably the right approach here is to deploy onto a test setup for a while to make sure that we get the versions right, and then upgrade the chart running on the main deployment. If patterns hold true, you'll run into a few problems when trying to start jupyter kernels with the docker image in this PR (though more recent work may have superseded these problems), and also run into problems when trying to upgrade the helm chart.
We have a reasonably up-to-date deployment guide on the new site: http://pangeo-data.org/setup_guides/cloud.html
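That test-then-promote flow might look something like this; the release and namespace names are hypothetical, and the helm commands are echoed rather than executed because this is only a sketch:

```shell
# Sketch of a throwaway test deployment to try before touching the main cluster.
# Names are illustrative; helm commands are echoed, not run.
RELEASE=pangeo-test
NAMESPACE=pangeo-test
echo "helm install pangeo/pangeo --name=$RELEASE --namespace=$NAMESPACE -f jupyter-config.yaml"
# ...log in, start a kernel, spin up dask workers, run a notebook or two...
echo "helm delete $RELEASE --purge"
```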
Has anyone tried just deleting the cluster and installing it fresh? That has worked in the past when helm upgrade failed.
No. I suspect that that is the right thing to try next.
Just to be clear, this stalled PR shouldn't stop others from updating the cluster. I have more or less abandoned it over the last few weeks, so others should feel free to progress on this issue separately.
I'm working on this now
There is a test deployment up here with these changes: http://35.232.238.22
I would appreciate it if people could give it a spin. It doesn't have all of the configuration present in the pangeo.pydata.org config file, only what's present in this helm chart
Thanks a lot for picking this up again Matt!
I just tried it out. I am unable to load zarr datasets via gcsfs:

```python
gcsmap = gcsfs.mapping.GCSMap('pangeo-data/dataset-duacs-rep-global-merged-allsat-phy-l4-v3-alt')
ds = xr.open_zarr(gcsmap)
```
```
_call exception: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f322cb2db38>: Failed to establish a new connection: [Errno 110] Connection timed out',))

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f322cb2db38>: Failed to establish a new connection: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/?recursive=true

During handling of the above exception, another exception occurred:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/?recursive=true

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/google/auth/compute_engine/_metadata.py", line 115, in get
    response = request(url=url, method='GET', headers=_METADATA_HEADERS)
google.auth.exceptions.TransportError: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/?recursive=true

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 458, in _call
    r = meth(self.base + path, params=kwargs, json=json)
  File "/opt/conda/lib/python3.6/site-packages/google/auth/compute_engine/credentials.py", line 102, in refresh
    six.raise_from(new_exc, caught_exc)
google.auth.exceptions.RefreshError: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f322cb2db38>: Failed to establish a new connection: [Errno 110] Connection timed out',))
```
Perhaps related to the lack of custom config?
> Perhaps related to the lack of custom config?

Looking at the jupyter-config.yaml file, I'm not seeing anything very obvious here. You may have more experience here than I do. Do you have any ideas about what this might be?
Also, this cell worked for me:

```python
import xarray as xr

### Load with FUSE File system
# ds = xr.open_mfdataset('/gcs/newmann-met-ensemble-netcdf/conus_ens_00*.nc',
#                        engine='netcdf4', concat_dim='ensemble', chunks={'time': 50})

### Load with Cloud object storage
import gcsfs
fs = gcsfs.GCSFileSystem(project='pangeo-181919', token='anon', access='read_only')
gcsmap = gcsfs.mapping.GCSMap('pangeo-data/newman-met-ensemble',
                              gcs=fs, check=False, create=False)
ds = xr.open_zarr(gcsmap)
```
I should add that, after I submitted my comments, the cell did actually execute and return the correct dataset.
This issue is ringing a bell in my memory. Something similar came up at some point, but I can't track down the issue. Maybe @martindurant will recognize it.
Oh, now I remember. It had something to do with an upstream change in zero-to-jupyterhub which blocked access to the metadata server.
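If that zero-to-jupyterhub metadata blocking is indeed the culprit, the chart exposed a toggle for it; the key below is my recollection of the z2jh option and should be verified against the chart version actually deployed:

```yaml
# Assumed zero-to-jupyterhub override re-allowing pods to reach the cloud
# metadata server; check the exact key against the z2jh version in use.
jupyterhub:
  singleuser:
    cloudMetadata:
      enabled: true
```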
One of the standard xarray gcs notebooks always runs through for me. I didn't try running it exhaustively, though.
OK, so we can probably safely ignore it for now
I'm going to leave this cluster up until the end of the workday east coast time, then I'll probably install it on the normal cluster.
Assuming nothing else comes up
So you did not get this error? That's weird. What if you run the sea-surface-height notebook?
> So you did not get this error? That's weird.
Yes, things worked for me.
> What if you run the sea-surface-height notebook?
I've moved on for the moment. Given that you've found the source of the problem I'm inclined to just let this lie.
If I use

```python
gcs = gcsfs.GCSFileSystem(token='anon')
gcsmap = gcsfs.mapping.GCSMap('pangeo-data/dataset-duacs-rep-global-merged-allsat-phy-l4-v3-alt', gcs=gcs)
ds = xr.open_zarr(gcsmap)
```

I don't get the error. The key is token='anon'. I'm not sure why this is different now. But it's not a dealbreaker.
This was fixed if we use the extra config value though?
To be clear, I haven't included the pangeo.pydata.org config file in this, so what you're playing with is not the full deployment.
> This was fixed if we use the extra config value though?
Correct.
I have tested out other stuff (holoviews, datashader), and it seems to be working.
so 👍 from me to deploy this to pangeo.pydata.org!