pangeo-data / foss4g-2022

Pangeo tutorial at FOSS4G 2022
https://pangeo-data.github.io/foss4g-2022

Dask introduction #47

Closed pl-marasco closed 2 years ago

pl-marasco commented 2 years ago

I've lost track of the infrastructure a little, but I started running the notebook to reproduce the error @j34ni is facing.

When I try to create the cluster I get: ClientResponseError: 401, message='Unauthorized', url=URL('http://api-daskhub-dask-gateway.daskhub:8000/api/v1/clusters/'). Does anyone know the reason? Running the code without the cluster doesn't reproduce the error he is facing.

j34ni commented 2 years ago

@pl-marasco Which infrastructure was that on?

pl-marasco commented 2 years ago

> @pl-marasco Which infrastructure was that on?

I followed the link in the setup page.

That points to: https://pangeo-foss4g.vm.fedcloud.eu/jupyterhub/hub/user-redirect/git-pull?repo=https%3A//github.com/pangeo-data/foss4g-2022&urlpath=lab/tree/foss4g-2022/tutorial/pangeo101/&branch=main

j34ni commented 2 years ago

@pl-marasco There is currently one daskhub running on pangeo-foss and another one pending.

Screenshot from 2022-08-12 12-21-43

I guess that all the resources available are being used.

pl-marasco commented 2 years ago

I'm not too familiar with Dask Gateway, but there should be a way to use the cluster that is already running. Am I right? In any case, I would add a note in the notebook to let attendees know what to do if this happens. Right now I can't even access the infrastructure for lack of resources; maybe there is an ongoing refactoring.
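
For reference, Dask Gateway does let a user reconnect to a cluster they already started instead of launching a new one. A minimal sketch, assuming the `dask_gateway` client library is available in the hub environment; `get_or_create_cluster` is a hypothetical helper name, and as far as I know `list_clusters()` only returns clusters owned by the authenticated user, so this would not attach to someone else's cluster:

```python
def get_or_create_cluster(gateway):
    """Reconnect to an already running cluster if there is one, else start a new one.

    `gateway` is expected to be a dask_gateway.Gateway instance.
    """
    running = gateway.list_clusters()  # clusters visible to the current user
    if running:
        # Reuse the first running cluster rather than requesting new resources.
        return gateway.connect(running[0].name)
    return gateway.new_cluster()


# Usage on the hub (not run here, it needs a reachable gateway):
# from dask_gateway import Gateway
# cluster = get_or_create_cluster(Gateway())
# client = cluster.get_client()
```

A note like this in the notebook could help attendees avoid piling up pending clusters when resources are tight.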


@j34ni is the issue you mention in your merge request related to the .plot() method or to .hvplot()?

j34ni commented 2 years ago

@pl-marasco It is probably better if each user uses their own cluster. I will try the new VM flavor with 16 vCPUs and 64 GB RAM.

pl-marasco commented 2 years ago

@j34ni Meanwhile, until I can reconnect to the cluster, can you give me a bit more info about the problem you are facing? Maybe I have a solution, as I struggled for a few days with hvplot at exactly the same point.

j34ni commented 2 years ago

@pl-marasco The issue was that the VCI plot did not render straight away in the same notebook; however, I was able to save the data as a netCDF file and then plot it from a different notebook. Also, VCI values were not in the range of 0 to 100 (or thereabouts) but something like -4000 to +4000. So I started by reducing the time range to the months for which NDVI existed (instead of the whole year). Would it then make sense to update the NDVI min and max to take the 2022 values into account? What did you get with hvplot?
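
For context, the Vegetation Condition Index is usually defined as VCI = 100 * (NDVI - NDVI_min) / (NDVI_max - NDVI_min), with min and max taken from the historical record. The sketch below uses made-up numbers (the function name `vci` and all values are illustrative, not taken from the notebook) to show how results leave the 0 to 100 range when the current NDVI falls outside the historical bounds or when min and max are nearly equal. That may be one explanation for the range seen here, though it has not been confirmed:

```python
import numpy as np


def vci(ndvi, ndvi_min, ndvi_max):
    """Vegetation Condition Index in percent; inf or NaN where ndvi_max == ndvi_min."""
    ndvi = np.asarray(ndvi, dtype=float)
    ndvi_min = np.asarray(ndvi_min, dtype=float)
    ndvi_max = np.asarray(ndvi_max, dtype=float)
    # Silence the divide-by-zero warning; the inf/NaN results are kept on purpose.
    with np.errstate(divide="ignore", invalid="ignore"):
        return 100.0 * (ndvi - ndvi_min) / (ndvi_max - ndvi_min)


# Illustrative pixels: in range, above historical max, tiny min/max spread, zero spread.
ndvi = np.array([0.5, 0.9, 0.5, 0.6])
ndvi_min = np.array([0.2, 0.2, 0.40, 0.3])
ndvi_max = np.array([0.8, 0.8, 0.41, 0.3])

result = vci(ndvi, ndvi_min, ndvi_max)
print(result)
```

Only the first pixel lands inside 0 to 100; the last one divides by zero and becomes infinite.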

pl-marasco commented 2 years ago

@j34ni In my case, hvplot fails to render once a different date is selected.
To solve the issue, a filter is needed to remove the +/- np.inf values (limiting values to an acceptable range has the same effect), and the x and y axes must be defined explicitly:

VCS.hvplot(x='lon', y='lat', groupby='time' ...
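
A minimal sketch of such a filter, assuming the index is held in an xarray DataArray with `time`, `lat` and `lon` dimensions; `mask_non_finite` and the sample data are hypothetical, and the hvplot call is left as a comment because it needs a notebook to render:

```python
import numpy as np
import xarray as xr


def mask_non_finite(da, lower=None, upper=None):
    """Replace +/-inf (and, optionally, out-of-range values) with NaN."""
    da = da.where(np.isfinite(da))
    if lower is not None:
        da = da.where(da >= lower)
    if upper is not None:
        da = da.where(da <= upper)
    return da


# Tiny made-up array: one valid value, two infinities, one out-of-range value.
sample = xr.DataArray(
    np.array([[[10.0, np.inf], [-np.inf, 4000.0]]]),
    dims=("time", "lat", "lon"),
    coords={"time": [0], "lat": [45.0, 46.0], "lon": [8.0, 9.0]},
)
cleaned = mask_non_finite(sample, lower=0, upper=100)

# In the notebook, after `import hvplot.xarray`:
# cleaned.hvplot(x='lon', y='lat', groupby='time')
```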

Regarding time slicing, you have a point (especially for reducing memory needs), but the intention was to talk about and demonstrate the automatic alignment of datasets.

IMHO we should avoid the two-step approach (save, then re-read in another notebook) unless there is a note explaining that it is a workaround.
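
To illustrate what automatic alignment means here: when two xarray objects with different time coverage are combined arithmetically, xarray matches them on coordinate labels and, by default, keeps only the labels present in both. A toy sketch with invented values (not the tutorial's data):

```python
import numpy as np
import xarray as xr

# Two series with only partly overlapping "time" labels.
long_series = xr.DataArray(np.arange(4.0), dims="time", coords={"time": [1, 2, 3, 4]})
short_series = xr.DataArray(np.ones(3), dims="time", coords={"time": [3, 4, 5]})

# Arithmetic aligns on the shared "time" labels (3 and 4); no manual slicing needed.
diff = long_series - short_series
print(diff.time.values)
```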

j34ni commented 2 years ago

@pl-marasco Obviously the save/re-read approach was never intended to remain; it was only to check what was going on. I will try what you suggested.

annefou commented 2 years ago

Do we still need this issue? Can we close it?

pl-marasco commented 2 years ago

Infrastructure is now working and the visualization component will be updated soon to allow hvplot to work properly.