nuclio / nuclio

High-Performance Serverless event and data processing platform
https://nuclio.io
Apache License 2.0

[Bug]: nuctl deploy hangs after building the image successfully #3405

Open ywangwxd opened 19 hours ago

ywangwxd commented 19 hours ago

Issue Description

I encountered this issue while attempting to deploy the Facebook SAM function from a Docker image for the CVAT project. The deployment hangs after building the image. To be specific, it hangs when inspecting the container.

Expected Behavior

I would expect the cleanup stage to succeed, and if it fails, I would at least like to be able to disable it.

Deployment Method

Docker

Nuclio Version

1.13

Additional Information

Here is the tail of the log output from running nuctl deploy with the --verbose option. Strangely, the container is found at first, but the attempt to inspect it hangs. When I checked from the Linux console, the container did not exist at all.

Another weird thing is that the container ID 96cde29537ef remains the same for different runs. Shouldn't a container be freshly started for each run?

24.11.19 16:32:06.469 (D) nuctl.platform.docker Successfully built image {"image": "cvat.pth.facebookresearch.sam.vit_h:latest"}
24.11.19 16:32:06.469 (I) nuctl.platform Pushing docker image into registry {"image": "cvat.pth.facebookresearch.sam.vit_h:latest", "registry": ""}
24.11.19 16:32:06.469 (I) nuctl.platform Docker image was successfully built and pushed into docker registry {"image": "cvat.pth.facebookresearch.sam.vit_h:latest"}
24.11.19 16:32:06.469 (I) nuctl Build complete {"image": "cvat.pth.facebookresearch.sam.vit_h:latest"}
24.11.19 16:32:06.469 (D) nuctl Build complete {"result": {"Image":"cvat.pth.facebookresearch.sam.vit_h:latest","UpdatedFunctionConfig":{"metadata":{"name":"pth-facebookresearch-sam-vit-h","namespace":"nuclio","labels":{"nuclio.io/project-name":"cvat"},"annotations":{"animated_gif":"https://raw.githubusercontent.com/cvat-ai/cvat/develop/site/content/en/images/hrnet_example.gif","help_message":"The interactor allows to get a mask of an object using at least one positive, and any negative points inside it","min_neg_points":"0","min_pos_points":"0","name":"Segment Anything","spec":"","startswith_box_optional":"true","type":"interactor","version":"2"}},"spec":{"description":"Facebook SAM segmentation.","handler":"main:handler","runtime":"python:3.8","env":[{"name":"PYTHONPATH","value":"/opt/nuclio/sam"}],"resources":{"requests":{"cpu":"25m","memory":"1Mi"}},"image":"cvat.pth.facebookresearch.sam.vit_h:latest","targetCPU":75,"triggers":{"myHttpTrigger":{"class":"","kind":"http","name":"myHttpTrigger","numWorkers":1,"workerAvailabilityTimeoutMilliseconds":10000,"attributes":{"maxRequestBodySize":33554432},"maxWorkers":1}},"build":{"image":"cvat.pth.facebookresearch.sam.vit_h","noCleanup":true,"baseImage":"ubuntu:22.04","directives":{"preCopy":[{"kind":"ENV","value":"DEBIAN_FRONTEND=noninteractive"},{"kind":"WORKDIR","value":"/opt/nuclio/sam"},{"kind":"RUN","value":"apt-get update && apt-get -y install curl git python3 python3-pip ffmpeg libsm6 libxext6"},{"kind":"RUN","value":"pip3 install torch torchvision torchaudio pycocotools matplotlib onnxruntime onnx"},{"kind":"RUN","value":"pip3 install git+https://github.com/facebookresearch/segment-anything.git"},{"kind":"RUN","value":"curl -O https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth"},{"kind":"RUN","value":"ln -s /usr/bin/pip3 /usr/local/bin/pip && ln -s /usr/bin/python3 /usr/bin/python"}]},"codeEntryType":"image"},"platform":{"attributes":{"mountMode":"volume","restartPolicy":{"maximumRetryCount":3,"name":"always"}}},"readinessTimeoutSeconds":120,"securityContext":{},"disableDefaultHTTPTrigger":false,"eventTimeout":"30s"}}}}
24.11.19 16:32:06.469 (D) nuctl no-cleanup flag provided, skipping temporary dir cleanup
24.11.19 16:32:06.469 (I) nuctl Cleaning up before deployment {"functionName": "pth-facebookresearch-sam-vit-h"}
24.11.19 16:32:06.469 (D) nuctl.platform.docker Getting containers {"options": {"Name":"nuclio-nuclio-pth-facebookresearch-sam-vit-h","Labels":null,"Stopped":true,"ID":""}}
24.11.19 16:32:06.469 (D) tl.platform.docker.runner Executing {"command": "docker ps --quiet --all --filter \"name=^/nuclio-nuclio-pth-facebookresearch-sam-vit-h$\" "}
24.11.19 16:32:06.489 (D) tl.platform.docker.runner Command executed successfully {"output": "96cde29537ef\n", "stderr": "", "exitCode": 0}
24.11.19 16:32:06.489 (D) tl.platform.docker.runner Executing {"command": "docker inspect 96cde29537ef "}

TomerShor commented 9 hours ago

Hey @ywangwxd, the fact that the container ID is the same leads me to believe that you might still have a stuck container, one that is hidden when you run plain docker ps but is shown with the --all flag. Try running the full command yourself and check whether you can see the container:

docker ps --quiet --all --filter "name=^/nuclio-nuclio-pth-facebookresearch-sam-vit-h$"
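
To narrow down whether docker inspect itself is what hangs, the same two commands that appear in the log can be run by hand with a timeout. The following is only a rough sketch, not part of nuctl: the helper names (container_name_filter, list_command, run_with_timeout, diagnose) are made up for illustration, and the nuclio-<namespace>-<function> naming pattern is an assumption inferred from the log output above. It also assumes the docker CLI is on PATH; if it is not, subprocess.run raises FileNotFoundError.

```python
import subprocess


def container_name_filter(function_name, namespace="nuclio"):
    # Assumption: container names follow nuclio-<namespace>-<function>,
    # as inferred from the log in this issue.
    return f"name=^/nuclio-{namespace}-{function_name}$"


def list_command(function_name, namespace="nuclio"):
    # Same docker ps invocation that the log shows, as an argument list.
    return ["docker", "ps", "--quiet", "--all", "--filter",
            container_name_filter(function_name, namespace)]


def run_with_timeout(command, timeout_s=10.0):
    """Return the command's stdout, or None if it did not finish in time."""
    try:
        result = subprocess.run(command, capture_output=True, text=True,
                                timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return None
    return result.stdout


def diagnose(function_name, namespace="nuclio", timeout_s=10.0):
    """List matching containers, then inspect each one with a timeout."""
    listing = run_with_timeout(list_command(function_name, namespace), timeout_s)
    if listing is None:
        return "docker ps timed out"
    container_ids = listing.split()
    if not container_ids:
        return "no matching container"
    for container_id in container_ids:
        if run_with_timeout(["docker", "inspect", container_id], timeout_s) is None:
            return f"docker inspect {container_id} timed out"
    return "inspect completed for: " + ", ".join(container_ids)
```

Calling print(diagnose("pth-facebookresearch-sam-vit-h")) would then report whether the listing or the inspect step is the one that stalls, which should help tell a stuck container apart from an unresponsive Docker daemon.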