Open scottyhq opened 1 year ago
For what it's worth, after the failed start, hitting relaunch worked with the usual log messages:
Event log
Server requested
2023-01-26T16:51:29.874186Z [Normal] Successfully assigned prod/jupyter-scottyh-40uw-2eedu to aks-user-37927680-vmss00019a
2023-01-26T16:51:47Z [Normal] AttachVolume.Attach succeeded for volume "pvc-7c839cb1-304b-4c05-8a08-6da914f50791"
2023-01-26T16:51:57Z [Normal] Container image "jupyterhub/k8s-network-tools:1.2.0" already present on machine
2023-01-26T16:51:57Z [Normal] Created container block-cloud-metadata
2023-01-26T16:51:58Z [Normal] Started container block-cloud-metadata
2023-01-26T16:51:58Z [Normal] Container image "pcccr.azurecr.io/public/planetary-computer/python:2022.9.16.0" already present on machine
2023-01-26T16:51:58Z [Normal] Created container notebook
2023-01-26T16:51:58Z [Normal] Started container notebook
Thanks for the report. I think the first line about the nodes is somewhat expected. Kubernetes will emit that before the autoscaler adds more nodes.
The line at
2023-01-26T16:42:25Z [Warning] AttachVolume.Attach failed for volume "pvc-7c839cb1-304b-4c05-8a08-6da914f50791" : timed out waiting for external-attacher of disk.csi.azure.com CSI driver to attach volume /subscriptions/9da7523a-cb61-4c3e-b1d4-afa5fc6d2da9/resourceGroups/MC_pcc-prod-2-rg_pcc-prod-2-cluster_westeurope/providers/Microsoft.Compute/disks/restore-b66b4a4b-b3a3-4f17-ac08-7e70b5c7c670
is an error we used to see pretty often, but it seemed to be mostly fixed with our migration to a newer Kubernetes Cluster.
As you saw, the volume attach seems to always succeed on subsequent attempts.
I'll keep an eye out to see if this continues to happen.
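For anyone who wants to check their own event log for the same pattern (a failed attach followed by a successful one on relaunch), here is a minimal Python sketch. It assumes the `timestamp [Level] message` line layout shown in the logs in this thread; the names `parse_event` and `attach_outcomes` are made up for illustration and are not part of JupyterHub, Kubernetes, or the Planetary Computer tooling. The sample lines are copied from this thread, with the long warning message abbreviated.

```python
import re
from datetime import datetime

# Matches lines such as:
#   2023-01-26T16:51:47Z [Normal] AttachVolume.Attach succeeded for volume "pvc-..."
# The fractional-seconds part is optional, since the log shows both forms.
EVENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z)\s+"
    r"\[(?P<level>\w+)\]\s+(?P<msg>.*)$"
)


def parse_event(line):
    """Return (timestamp, level, message), or None for non-event lines."""
    match = EVENT_RE.match(line.strip())
    if match is None:
        return None
    ts_text = match.group("ts")
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in ts_text else "%Y-%m-%dT%H:%M:%SZ"
    return datetime.strptime(ts_text, fmt), match.group("level"), match.group("msg")


def attach_outcomes(lines):
    """Return a list of (timestamp, succeeded) for each AttachVolume.Attach event."""
    outcomes = []
    for line in lines:
        event = parse_event(line)
        if event is None:
            continue  # headers like "Event log" or "Server requested"
        ts, _level, msg = event
        if msg.startswith("AttachVolume.Attach "):
            outcomes.append((ts, msg.startswith("AttachVolume.Attach succeeded")))
    return outcomes


# Lines taken from this thread (the warning message is abbreviated here).
SAMPLE = [
    "Event log",
    "Server requested",
    '2023-01-26T16:42:25Z [Warning] AttachVolume.Attach failed for volume "pvc-7c839cb1-304b-4c05-8a08-6da914f50791" : timed out waiting for external-attacher',
    "2023-01-26T16:51:29.874186Z [Normal] Successfully assigned prod/jupyter-scottyh-40uw-2eedu to aks-user-37927680-vmss00019a",
    '2023-01-26T16:51:47Z [Normal] AttachVolume.Attach succeeded for volume "pvc-7c839cb1-304b-4c05-8a08-6da914f50791"',
]

if __name__ == "__main__":
    for ts, ok in attach_outcomes(SAMPLE):
        print(ts.isoformat(), "attach succeeded" if ok else "attach FAILED")
```

Pasting a full event log into `SAMPLE` shows at a glance whether a failed attach was later followed by a successful one.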
@TomAugspurger Looks like my home directory was full, but it seems there is no way to delete files without logging into the server?
Spawn failed: Server at http://10.244.224.162:8888/compute/user/cxyth@live.com/ didn't respond in 30 seconds
Event log
Server requested
2024-03-28T06:35:27.260261Z [Normal] Successfully assigned prod/jupyter-cxyth-40live-2ecom to aks-user-17077795-vmss0000x1
2024-03-28T06:35:36Z [Normal] AttachVolume.Attach succeeded for volume "pvc-1ebcdaf5-21f2-40c9-bdd8-d96e49e974a5"
2024-03-28T06:35:40Z [Normal] Container image "jupyterhub/k8s-network-tools:1.2.0" already present on machine
2024-03-28T06:35:40Z [Normal] Created container block-cloud-metadata
2024-03-28T06:35:41Z [Normal] Started container block-cloud-metadata
2024-03-28T06:35:41Z [Normal] Container image "pcccr.azurecr.io/planetary-computer/python:2024.3.20.1" already present on machine
2024-03-28T06:35:41Z [Normal] Created container notebook
2024-03-28T06:35:41Z [Normal] Started container notebook
Spawn failed: Server at http://10.244.224.162:8888/compute/user/cxyth@live.com/ didn't respond in 30 seconds
Could you send us an email at @.*** with the address you signed up with and we'll take a look?
From: cxyth. Sent: Thursday, March 28, 2024 9:37 PM. To: microsoft/PlanetaryComputer. Subject: Re: [microsoft/PlanetaryComputer] Unable to attach or mount volumes, Spawn failed: did not start in 900 seconds (Issue #171)
The first log line (0/116 nodes are available) makes me think it's a scaling limit issue. But the status page (https://planetarycomputer-status.microsoft.com/) looks fine...
Seems separate from #117