microsoft / PlanetaryComputer

Issues, discussions, and information about the Microsoft Planetary Computer
https://planetarycomputer.microsoft.com/
MIT License

Planetary Computer Hub Fails to Launch #117

Closed petebunting closed 2 years ago

petebunting commented 2 years ago

Hi,

When I try to launch the JupyterHub service it fails, and it looks like the volume holding my data isn't mounting for some reason. I don't know whether it is working for other people, but it was working for me until sometime yesterday; the failure started when I came to log in in the evening.

This is the event log when it is trying to start up:

Server requested
2022-10-17T14:24:05.810775Z [Normal] Successfully assigned prod/jupyter-pfb-40aber-2eac-2euk to aks-user-37927680-vmss0000j7
2022-10-17T14:24:18Z [Normal] AttachVolume.Attach succeeded for volume "pvc-73654c85-e481-4874-9b36-19016c164a23"
2022-10-17T14:26:08Z [Warning] Unable to attach or mount volumes: unmounted volumes=[volume-pfb-40aber-2eac-2euk], unattached volumes=[volume-pfb-40aber-2eac-2euk user-etc-singleuser dshm]: timed out waiting for the condition
2022-10-17T12:34:04Z [Warning] Unable to attach or mount volumes: unmounted volumes=[volume-pfb-40aber-2eac-2euk], unattached volumes=[dshm volume-pfb-40aber-2eac-2euk user-etc-singleuser]: timed out waiting for the condition
2022-10-17T12:35:38Z [Normal] Container image "jupyterhub/k8s-network-tools:1.2.0" already present on machine
2022-10-17T12:35:38Z [Normal] Created container block-cloud-metadata
2022-10-17T12:35:38Z [Normal] Started container block-cloud-metadata
2022-10-17T12:36:00Z [Warning] Back-off restarting failed container
2022-10-17T14:24:05.810775Z [Normal] Successfully assigned prod/jupyter-pfb-40aber-2eac-2euk to aks-user-37927680-vmss0000j7
2022-10-17T14:24:18Z [Normal] AttachVolume.Attach succeeded for volume "pvc-73654c85-e481-4874-9b36-19016c164a23"
2022-10-17T14:26:08Z [Warning] Unable to attach or mount volumes: unmounted volumes=[volume-pfb-40aber-2eac-2euk], unattached volumes=[volume-pfb-40aber-2eac-2euk user-etc-singleuser dshm]: timed out waiting for the condition
2022-10-17T14:28:18Z [Warning] Back-off restarting failed container
Spawn failed: Server at http://10.244.84.8:8888/compute/user/pfb@aber.ac.uk/ didn't respond in 30 seconds

Any suggestions?

Many thanks,

Pete
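
For anyone triaging a similar event log, the warnings are easier to spot once the events are separated out. The following is a minimal sketch, assuming each event begins with a UTC timestamp followed by a bracketed level such as [Normal] or [Warning], as in the log above. The names EVENT_START and parse_events are hypothetical helpers, not part of JupyterHub or Kubernetes.

```python
import re

# Assumed event prefix: an ISO-8601 UTC timestamp, then a "[Level]" tag.
EVENT_START = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z) \[(\w+)\] "
)

def parse_events(log_text):
    """Return (timestamp, level, message) tuples found in log_text."""
    matches = list(EVENT_START.finditer(log_text))
    events = []
    for i, m in enumerate(matches):
        # A message runs until the next event prefix (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        events.append((m.group(1), m.group(2), log_text[m.end():end].strip()))
    return events

# Two events copied from the log above, joined on one line as the hub shows them.
sample = (
    '2022-10-17T14:24:18Z [Normal] AttachVolume.Attach succeeded for volume '
    '"pvc-73654c85-e481-4874-9b36-19016c164a23" '
    '2022-10-17T14:28:18Z [Warning] Back-off restarting failed container'
)

warnings = [event for event in parse_events(sample) if event[1] == "Warning"]
for timestamp, level, message in warnings:
    print(timestamp, message)
```

Because the pattern splits on the timestamp prefix rather than on newlines, it should work whether the log is pasted as one long line or one event per line.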

TomAugspurger commented 2 years ago

I had a brief look at this, but couldn't come to any firm conclusion about what's going on. It doesn't appear that your disk is full. I'll take a closer look later.

petebunting commented 2 years ago

Thanks, Tom.

I don't think I would have run out of disk space, but I was running a sensitivity analysis for some thresholding which would have generated quite a lot of output files - would there be a limit on the number of files?

There isn't anything in the volume that isn't also saved to my local machine, other than the processing I was doing over the weekend, so if it's easier, the volume can just be deleted and a new one created.

I've been meaning to get an Azure bucket set up for this stuff, so I'll follow up on that with my colleagues.

Many thanks.
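
On the question of a file-count limit: many Linux filesystems cap the number of files through a fixed pool of inodes, separate from the byte capacity, so a disk can look far from full while having no inodes left. Whether that is what happened here is not established in this thread. As a minimal sketch, assuming a Unix-like environment and a mounted volume, Python's standard os.statvfs reports both figures. The helper name volume_usage is hypothetical, and some filesystems report 0 for the inode fields.

```python
import os

def volume_usage(path):
    """Report byte and inode usage for the filesystem that contains path."""
    st = os.statvfs(path)  # Unix-only standard-library call
    return {
        "bytes_total": st.f_blocks * st.f_frsize,
        "bytes_free": st.f_bavail * st.f_frsize,   # free space for unprivileged users
        "inodes_total": st.f_files,                 # may be 0 on some filesystems
        "inodes_free": st.f_ffree,
    }

# The home directory is used here as an assumed stand-in for the user volume.
usage = volume_usage(os.path.expanduser("~"))
for name, value in usage.items():
    print(f"{name}: {value}")
```

If inodes_free is near zero while bytes_free is still large, the file count rather than the file size is the likelier constraint.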

TomAugspurger commented 2 years ago

Well, I still don't understand the root cause, but I expanded the size of your disk and now it appears to be mountable. As I mentioned, it didn't appear to be full so I don't think giving it more space was the reason it's mountable now, but I might be misunderstanding something.

Can you try to start another notebook server and see how things go?

petebunting commented 2 years ago

Thanks Tom,

Yes, that seems to have fixed it. I'll refactor the sensitivity analysis I was running: it had generated 970,936 files and was about halfway through, so I do wonder if the problem is related to that...

Many thanks!
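
One way such a refactor could look, offered only as a sketch since the thread does not describe the actual analysis code: write one row per run into a single results file instead of one file per run. The evaluate function, its score formula, and the sweep.csv file name below are hypothetical placeholders.

```python
import csv
import os
import tempfile

def evaluate(threshold):
    """Hypothetical stand-in for one sensitivity-analysis run."""
    return {"threshold": threshold, "score": round(1.0 - abs(0.5 - threshold), 3)}

def run_sweep(thresholds, out_path):
    """Write one CSV row per threshold, so the whole sweep produces one file."""
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["threshold", "score"])
        writer.writeheader()
        for t in thresholds:
            writer.writerow(evaluate(t))
    return out_path

# A temporary directory keeps the example from touching real data.
out_dir = tempfile.mkdtemp()
out_file = run_sweep([0.1, 0.5, 0.9], os.path.join(out_dir, "sweep.csv"))
files_written = len(os.listdir(out_dir))
print(f"{files_written} file written for 3 runs")
```

The same idea applies to other container formats; the point is that the number of files on the volume no longer grows with the number of runs.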