pangeo-data / pangeo-eosc

Pangeo for the European Open Science cloud
https://pangeo-data.github.io/pangeo-eosc/
MIT License

update docker images #79

Open annefou opened 1 month ago

annefou commented 1 month ago

In preparation for the upcoming courses Geo-open-hack and IGARSS 2024.

sebastian-luna-valero commented 1 month ago

Thanks @annefou

Please test it out via this ephemeral deployment: https://repro-challenge.vm.fedcloud.eu/

Could you please confirm it works as expected?

annefou commented 1 month ago

Thanks @sebastian-luna-valero. I am trying the images and I just noticed that https://github.com/pangeo-data/geo-open-hack-2024/blob/main/docs/chunking_introduction.ipynb no longer works.

Do we still have access to the corresponding object storage? We have been using it since FOSS4G 2022:

import s3fs
import xarray as xr

# Anonymous (unauthenticated) access to the S3-compatible object store
fs = s3fs.S3FileSystem(
    anon=True,
    client_kwargs={"endpoint_url": "https://object-store.cloud.muni.cz"},
)

s3path = 's3://foss4g-data/CGLS_LTS_1999_2019/c_gls_NDVI-LTS_1999-2019-1221_GLOBE_VGT-PROBAV_V3.0.1.nc'

%%time
LTS = xr.open_mfdataset([fs.open(s3path)])
LTS

We can no longer access it.
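One way to tell a storage outage from a notebook problem is to probe the object over plain HTTPS, without s3fs or xarray. The sketch below is only an illustration under stated assumptions: it assumes the store serves path-style URLs (endpoint/bucket/key) and that the bucket allows anonymous reads. `s3_to_https` and `head_status` are hypothetical helper names, not part of any library.

```python
import urllib.error
import urllib.request
from urllib.parse import urlparse

# Endpoint quoted in the comment above
ENDPOINT = "https://object-store.cloud.muni.cz"


def s3_to_https(s3path: str, endpoint: str = ENDPOINT) -> str:
    """Map s3://bucket/key to a path-style HTTPS URL (assumed addressing style)."""
    parsed = urlparse(s3path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// path: {s3path!r}")
    # netloc is the bucket name, path is "/" followed by the object key
    return f"{endpoint.rstrip('/')}/{parsed.netloc}{parsed.path}"


def head_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request, or -1 if the endpoint is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (e.g. 403 or 404)
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout, and similar
        return -1
```

Calling `head_status(s3_to_https(s3path))` would be expected to return 200 when the object is readable; an error status or -1 would point towards the storage side rather than the image.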

annefou commented 1 month ago

  1. I had to switch to quay.io for the data science image too. I pushed the changes in this branch, but they do not seem to take effect immediately. Do you need to do anything on your side?

  2. For both ml-notebook and pytorch-notebook I had to retry a few times because of the timeout. They both eventually started. Should we increase the timeout?

2024-06-05T12:28:36Z [Normal] Pulling image "quay.io/pangeo/pytorch-notebook:2024.06.02"
Spawn failed: pod c-scale-repro-challenge/jupyter-afouilloux did not start in 600 seconds!
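When collecting such failures for the operators, it can help to extract the pod, image and timeout from the event text. Below is a minimal sketch, assuming messages shaped like the one above; `parse_spawn_failure` is a hypothetical helper, not a JupyterHub function. The 600-second limit itself is set on the hub side (in JupyterHub this is the spawner's `start_timeout` setting), so this code only reports it.

```python
import re
from typing import Optional

# Patterns modelled on the single event message quoted above (an assumption,
# not a documented log format).
PULL_RE = re.compile(r'Pulling image "(?P<image>[^"]+)"')
TIMEOUT_RE = re.compile(r"pod (?P<pod>\S+) did not start in (?P<seconds>\d+) seconds")


def parse_spawn_failure(message: str) -> Optional[dict]:
    """Return pod, timeout and image for a spawn-timeout message, else None."""
    timeout = TIMEOUT_RE.search(message)
    if timeout is None:
        return None
    pull = PULL_RE.search(message)
    return {
        "pod": timeout.group("pod"),
        "timeout_s": int(timeout.group("seconds")),
        # The image is only known if the pull event is part of the message
        "image": pull.group("image") if pull else None,
    }
```

Tallying the parsed records per image would show whether the timeouts cluster on the larger images, which is the case where raising the hub-side timeout is most likely to help.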

sebastian-luna-valero commented 1 month ago

Yes, I need to apply the changes on my end. I just did, please try again and let me know.

CESNET confirmed an issue with the storage back-end, and I am waiting for a reply.

sebastian-luna-valero commented 1 month ago

The issue with object store at CESNET seems resolved.

Could you please test and confirm?

annefou commented 1 month ago

The issue with object store at CESNET seems resolved.

Could you please test and confirm?

I confirm that the issue with object store is resolved but we are having strange issues on the test deployment and we are not sure if it is related to the new images or the test deployment.

The jupyterhub is very unstable: sometimes when we try to add a cell, it does not add it, or when we try to change a cell type to raw, or markdown, the cell disappears and comes back in a few seconds. This is very weird. @tinaok any other issues on your side?

tinaok commented 1 month ago

The issue with object store at CESNET seems resolved. Could you please test and confirm?

I confirm that the issue with object store is resolved but we are having strange issues on the test deployment and we are not sure if it is related to the new images or the test deployment.

The jupyterhub is very unstable: sometimes when we try to add a cell, it does not add it, or when we try to change a cell type to raw, or markdown, the cell disappears and comes back in a few seconds. This is very weird. @tinaok any other issues on your side?

Yes, I have the same problem. When I add a cell with the former version (pangeo-eosc), the cell appears instantly, but with the repro-challenge version it takes a few seconds.

sebastian-luna-valero commented 1 month ago

Does this happen with all images (pangeo-notebook, ml-notebook, pytorch-notebook, datascience-notebook)?

annefou commented 1 month ago

Does this happen with all images (pangeo-notebook, ml-notebook, pytorch-notebook, datascience-notebook)?

I see similar behaviour with the other images. It is a bit better today, but I still see strange behaviour, for instance when adding a cell: the cell does not appear, and a new one is only created when I click in the notebook again.

sebastian-luna-valero commented 1 month ago

Not sure what's the issue. Services seem stable on the back-end, and I am not able to reproduce the issue.

Does this happen with a particular notebook? If so, I would like to copy it and give it a try on my end.

annefou commented 1 month ago

Not sure what's the issue. Services seem stable on the back-end, and I am not able to reproduce the issue.

Does this happen with a particular notebook? If so, I would like to copy it and give it a try on my end.

I don't think it is related to the notebook itself. At the moment, we are trying the notebooks from https://github.com/pangeo-data/geo-open-hack-2024

sebastian-luna-valero commented 1 month ago

I also tried today running/modifying notebooks from https://github.com/pangeo-data/geo-open-hack-2024 but I couldn't reproduce the issue.

I am currently asking CESNET for further checks and will get back to you later today

sebastian-luna-valero commented 1 month ago

The next time this issue happens, please let us know immediately and leave your notebook open, so we can look at the running session and check whether anything on the back-end explains this behaviour.

sebastian-luna-valero commented 1 month ago

I see no usage currently for https://repro-challenge.vm.fedcloud.eu/

I would like to test an upgrade to DaskHub. Could you please take a copy of what you have in this test deployment, and let me know when it is a good time to proceed?

annefou commented 1 month ago

For me it is fine and you can proceed.

Thanks,

Anne

On Thu, 13 Jun 2024 at 15:22, Sebastian Luna-Valero wrote:

I see no usage currently for https://repro-challenge.vm.fedcloud.eu/

I would like to test an upgrade to DaskHub. Could you please take a copy of what you have in this test deployment, and let me know when it is a good time to proceed?


annefou commented 1 month ago

@sebastian-luna-valero what is the status? Should we keep pangeo@EOSC as is for next week (Geo open hack)?

sebastian-luna-valero commented 1 month ago

Hi @annefou

We just tried again the test redeployment: https://repro-challenge.vm.fedcloud.eu/

Please give it a try and let us know how it goes

annefou commented 1 month ago

Hi @annefou

We just tried again the test redeployment: https://repro-challenge.vm.fedcloud.eu/

Please give it a try and let us know how it goes

I tried and I have the same issue: for example, when I clone https://github.com/pangeo-data/geo-open-hack-2024.git and then try to add cells in a notebook, I get the same weird behaviour. Executing a notebook without changes seems to work well. As the training is next week and we are running out of time, I suggest we keep pangeo@EOSC as is. What do you think?

tinaok commented 1 month ago

We still have another training in July. How about we keep the current version for the one in June and, if there are no more issues, update the version in July?

sebastian-luna-valero commented 1 month ago

I suggest we keep pangeo@EOSC as is. What do you think?

Sure, fine by me.

I tried and I have the same issue

The next time this issue happens, please let us know immediately and leave your notebook open, so we can look at the running session and check whether anything on the back-end explains this behaviour.

Another idea, could you retry the same steps from a different computer? Also, are you sure you have enough RAM available on your current computer?

annefou commented 1 month ago

I suggest we keep pangeo@EOSC as is. What do you think?

Sure, fine by me.

I tried and I have the same issue

The next time this issue happens, please let us know immediately and leave your notebook open, so we can look at the running session and check whether anything on the back-end explains this behaviour.

Another idea, could you retry the same steps from a different computer? Also, are you sure you have enough RAM available on your current computer?

OK. This morning I deleted the repo I had cloned previously and used Google Chrome instead of Safari, and it seems to work, i.e. I no longer see this weird behaviour!

sebastian-luna-valero commented 1 month ago

OK. This morning I deleted the repo I had cloned previously and used Google Chrome instead of Safari, and it seems to work, i.e. I no longer see this weird behaviour!

Good to know, thanks for checking.

Should we continue with the plan? Stay with the current deployment at https://pangeo-eosc.vm.fedcloud.eu/ for https://github.com/pangeo-data/geo-open-hack-2024 and afterwards, we upgrade the deployment and keep testing for https://www.2024.ieeeigarss.org/ ?

Or do you prefer the upgrade before https://github.com/pangeo-data/geo-open-hack-2024 ?

annefou commented 1 month ago

OK. This morning I deleted the repo I had cloned previously and used Google Chrome instead of Safari, and it seems to work, i.e. I no longer see this weird behaviour!

Good to know, thanks for checking.

Should we continue with the plan? Stay with the current deployment at https://pangeo-eosc.vm.fedcloud.eu/ for https://github.com/pangeo-data/geo-open-hack-2024 and afterwards, we upgrade the deployment and keep testing for https://www.2024.ieeeigarss.org/ ?

Or do you prefer the upgrade before https://github.com/pangeo-data/geo-open-hack-2024 ?

Let's plan for upgrading it after geo open hack (and before IGARSS). Is it OK with you?

sebastian-luna-valero commented 1 month ago

Sure, no problem.

While you prepare/deliver https://github.com/pangeo-data/geo-open-hack-2024 with https://pangeo-eosc.vm.fedcloud.eu/ I would like to play around with https://repro-challenge.vm.fedcloud.eu/ so please don't use it until I request feedback, ok?

sebastian-luna-valero commented 2 weeks ago

Hi,

Now that open-geo-hack is over, and in preparation for IGARSS, could you please test:

Thanks!