samuelkarp / runj

runj is an experimental, proof-of-concept OCI-compatible runtime for FreeBSD jails.

Docker run lines don't report completion #21

Open · davidchisnall opened this issue 3 years ago

davidchisnall commented 3 years ago

I'm not sure if this is a problem with runj, the containerd shim, or with the moby port (from https://github.com/gizahNL/moby), but if I write this minimal Dockerfile:

FROM kwiat/freebsd:13.0-RELEASE
RUN echo Hello

Running docker build prints Hello but doesn't detect that the command has completed:

$ docker build .
Sending build context to Docker daemon  2.048kB
Step 1/2 : FROM kwiat/freebsd:13.0-RELEASE
 ---> e122726568cf
Step 2/2 : RUN echo Hello
 ---> Running in 1e48d64d35ef
Hello

It hangs at this point and does not advance. docker --debug produces this output:

DEBU[2021-11-04T13:00:28.328627186Z] Calling POST /v1.39/build?buildargs=%7B%7D&cachefrom=%5B%5D&cgroupparent=&cpuperiod=0&cpuquota=0&cpusetcpus=&cpusetmems=&cpushares=0&dockerfile=Dockerfile&labels=%7B%7D&memory=0&memswap=0&networkmode=default&rm=1&shmsize=0&target=&ulimits=null&version=1
DEBU[2021-11-04T13:00:28.413436000Z] [BUILDER] Cache miss: [/bin/sh -c echo Hello]
DEBU[2021-11-04T13:00:28.413603500Z] [BUILDER] Command to be executed: [/bin/sh -c echo Hello]
DEBU[2021-11-04T13:00:28.414392803Z] [zfs] ID:274bd9ef-1340-47bb-adc7-066976dcd929 START zfs snapshot zroot/docker/ad672c56da999d925214772f741033aa664bd2c8cec20e94ba6bae3eb38c2758@414348503  storage-driver=zfs
DEBU[2021-11-04T13:00:28.502713430Z] [zfs] ID:274bd9ef-1340-47bb-adc7-066976dcd929 FINISH  storage-driver=zfs
DEBU[2021-11-04T13:00:28.502994831Z] [zfs] ID:a8bdc8fd-9d96-402d-b42c-94c8c66976e4 START zfs list -Hp -o name,origin,used,available,mountpoint,compression,type,volsize,quota,referenced,written,logicalused,usedbydataset zroot/docker/ad672c56da999d925214772f741033aa664bd2c8cec20e94ba6bae3eb38c2758@414348503  storage-driver=zfs
DEBU[2021-11-04T13:00:28.512969568Z] [zfs] ID:a8bdc8fd-9d96-402d-b42c-94c8c66976e4 FINISH  storage-driver=zfs
DEBU[2021-11-04T13:00:28.513273669Z] [zfs] ID:12ff4766-914b-4c3c-8375-50e6919e7c0a START zfs clone -p -o mountpoint=legacy zroot/docker/ad672c56da999d925214772f741033aa664bd2c8cec20e94ba6bae3eb38c2758@414348503 zroot/docker/39062f794ae04547c1705126d5954e7b9577ec6101b2ff74360cbf195ee96920-init  storage-driver=zfs
DEBU[2021-11-04T13:00:28.664143327Z] [zfs] ID:12ff4766-914b-4c3c-8375-50e6919e7c0a FINISH  storage-driver=zfs
DEBU[2021-11-04T13:00:28.664453728Z] [zfs] ID:9e08b49b-6521-4236-aaf3-b53a5d6b3544 START zfs list -Hp -o name,origin,used,available,mountpoint,compression,type,volsize,quota,referenced,written,logicalused,usedbydataset zroot/docker/39062f794ae04547c1705126d5954e7b9577ec6101b2ff74360cbf195ee96920-init  storage-driver=zfs
DEBU[2021-11-04T13:00:28.675227768Z] [zfs] ID:9e08b49b-6521-4236-aaf3-b53a5d6b3544 FINISH  storage-driver=zfs
DEBU[2021-11-04T13:00:28.675527769Z] [zfs] ID:3f560d50-ea3a-4b8c-bcb6-8be6822258e1 START zfs destroy -d zroot/docker/ad672c56da999d925214772f741033aa664bd2c8cec20e94ba6bae3eb38c2758@414348503  storage-driver=zfs
DEBU[2021-11-04T13:00:28.750228346Z] [zfs] ID:3f560d50-ea3a-4b8c-bcb6-8be6822258e1 FINISH  storage-driver=zfs
DEBU[2021-11-04T13:00:28.750539147Z] mount("zroot/docker/39062f794ae04547c1705126d5954e7b9577ec6101b2ff74360cbf195ee96920-init", "/var/lib/docker/zfs/graph/39062f794ae0-init", "")  storage-driver=zfs
DEBU[2021-11-04T13:00:28.769797418Z] unmount("/var/lib/docker/zfs/graph/39062f794ae0-init")  storage-driver=zfs
DEBU[2021-11-04T13:00:28.956046407Z] [zfs] ID:879c1ec5-3724-4f1d-aa87-3628df008abd START zfs snapshot zroot/docker/39062f794ae04547c1705126d5954e7b9577ec6101b2ff74360cbf195ee96920-init@955941707  storage-driver=zfs
DEBU[2021-11-04T13:00:29.055206674Z] [zfs] ID:879c1ec5-3724-4f1d-aa87-3628df008abd FINISH  storage-driver=zfs
DEBU[2021-11-04T13:00:29.055468475Z] [zfs] ID:ad1128ea-00b8-4646-82b4-0f18bf6558c6 START zfs list -Hp -o name,origin,used,available,mountpoint,compression,type,volsize,quota,referenced,written,logicalused,usedbydataset zroot/docker/39062f794ae04547c1705126d5954e7b9577ec6101b2ff74360cbf195ee96920-init@955941707  storage-driver=zfs
DEBU[2021-11-04T13:00:29.065764313Z] [zfs] ID:ad1128ea-00b8-4646-82b4-0f18bf6558c6 FINISH  storage-driver=zfs
DEBU[2021-11-04T13:00:29.066018914Z] [zfs] ID:f1c7c9b4-3231-405c-a131-e7cfc3208efa START zfs clone -p -o mountpoint=legacy zroot/docker/39062f794ae04547c1705126d5954e7b9577ec6101b2ff74360cbf195ee96920-init@955941707 zroot/docker/39062f794ae04547c1705126d5954e7b9577ec6101b2ff74360cbf195ee96920  storage-driver=zfs
DEBU[2021-11-04T13:00:29.218950880Z] [zfs] ID:f1c7c9b4-3231-405c-a131-e7cfc3208efa FINISH  storage-driver=zfs
DEBU[2021-11-04T13:00:29.219252881Z] [zfs] ID:de01cf79-fc92-4818-b7c7-00ab6b024cbc START zfs list -Hp -o name,origin,used,available,mountpoint,compression,type,volsize,quota,referenced,written,logicalused,usedbydataset zroot/docker/39062f794ae04547c1705126d5954e7b9577ec6101b2ff74360cbf195ee96920  storage-driver=zfs
DEBU[2021-11-04T13:00:29.229770120Z] [zfs] ID:de01cf79-fc92-4818-b7c7-00ab6b024cbc FINISH  storage-driver=zfs
DEBU[2021-11-04T13:00:29.230035021Z] [zfs] ID:611e4d62-1fa9-46b0-93df-7ba2581a0cea START zfs destroy -d zroot/docker/39062f794ae04547c1705126d5954e7b9577ec6101b2ff74360cbf195ee96920-init@955941707  storage-driver=zfs
DEBU[2021-11-04T13:00:29.297591771Z] [zfs] ID:611e4d62-1fa9-46b0-93df-7ba2581a0cea FINISH  storage-driver=zfs
DEBU[2021-11-04T13:00:29.332823401Z] mount("zroot/docker/39062f794ae04547c1705126d5954e7b9577ec6101b2ff74360cbf195ee96920", "/var/lib/docker/zfs/graph/39062f794ae0", "")  storage-driver=zfs
DEBU[2021-11-04T13:00:29.335554711Z] container mounted via layerStore: &{/var/lib/docker/zfs/graph/39062f794ae0 0x2f07380 0x2f07380}  container=ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8
DEBU[2021-11-04T13:00:29.335736912Z] unmount("/var/lib/docker/zfs/graph/39062f794ae0")  storage-driver=zfs
DEBU[2021-11-04T13:00:29.578775211Z] attach: stderr: begin
DEBU[2021-11-04T13:00:29.578975512Z] attach: stdout: begin
DEBU[2021-11-04T13:00:29.579322813Z] mount("zroot/docker/39062f794ae04547c1705126d5954e7b9577ec6101b2ff74360cbf195ee96920", "/var/lib/docker/zfs/graph/39062f794ae0", "")  storage-driver=zfs
DEBU[2021-11-04T13:00:29.582091023Z] container mounted via layerStore: &{/var/lib/docker/zfs/graph/39062f794ae0 0x2f07380 0x2f07380}  container=ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8
DEBU[2021-11-04T13:00:29.583157427Z] Assigning addresses for endpoint great_meninsky's interface on network bridge
DEBU[2021-11-04T13:00:29.583373328Z] RequestAddress(LocalDefault/172.18.0.0/16, <nil>, map[])
DEBU[2021-11-04T13:00:29.583444928Z] Request address PoolID:172.18.0.0/16 App: ipam/default/data, ID: LocalDefault/172.18.0.0/16, DBIndex: 0x0, Bits: 65536, Unselected: 65533, Sequence: (0xc0000000, 1)->(0x0, 2046)->(0x1, 1)->end Curr:2 Serial:false PrefAddress:<nil>
DEBU[2021-11-04T13:00:29.700087760Z] Assigning addresses for endpoint great_meninsky's interface on network bridge
DEBU[2021-11-04T13:00:30.332606300Z] Programming external connectivity on endpoint great_meninsky (20bae97f80d9218dcf7075ed0e24f87d2ff60d284648df45bffe9543ac94ef51)
DEBU[2021-11-04T13:00:30.332813901Z] EnableService ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 START
DEBU[2021-11-04T13:00:30.332835701Z] EnableService ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 DONE
DEBU[2021-11-04T13:00:30.347904457Z] bundle dir created                            bundle=/var/run/docker/containerd/ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 module=libcontainerd namespace=moby root=/var/lib/docker/zfs/graph/39062f794ae0
DEBU[2021-11-04T13:00:30.536242553Z] event                                         module=libcontainerd namespace=moby topic=/tasks/create
DEBU[2021-11-04T13:00:30.572995190Z] event                                         module=libcontainerd namespace=moby topic=/tasks/start
DEBU[2021-11-04T13:00:30.600271190Z] event                                         module=libcontainerd namespace=moby topic=/tasks/exit

At the same time, containerd -l debug produces this:

DEBU[2021-11-04T13:00:30.417187913Z] event published                               ns=moby topic=/containers/create type=containerd.events.ContainerCreate
time="2021-11-04T13:00:30.469033805Z" level=info msg="starting signal loop" namespace=moby path=/run/containerd/io.containerd.runtime.v2.task/moby/ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 pid=88053
time="2021-11-04T13:00:30.482128553Z" level=warning msg=CREATE req="&CreateTaskRequest{ID:ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8,Bundle:/run/containerd/io.containerd.runtime.v2.task/moby/ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8,Rootfs:[]*Mount{},Terminal:false,Stdin:,Stdout:/var/run/docker/containerd/ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8/init-stdout,Stderr:/var/run/docker/containerd/ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8/init-stderr,Checkpoint:,ParentCheckpoint:,Options:&types1.Any{TypeUrl:containerd.linux.runc.CreateOptions,Value:[],XXX_unrecognized:[],},XXX_unrecognized:[],}" runtime=wtf.sbk.runj.v1
time="2021-11-04T13:00:30.485106464Z" level=warning msg="Starting runj create" args="[create ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 /run/containerd/io.containerd.runtime.v2.task/moby/ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8]" id=ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 runtime=wtf.sbk.runj.v1
time="2021-11-04T13:00:30.514434273Z" level=warning msg="PROCESSING EXIT!" pid=88054 runtime=wtf.sbk.runj.v1 status=0
time="2021-11-04T13:00:30.533023542Z" level=warning msg="PROCESSING EXIT!" pid=88059 runtime=wtf.sbk.runj.v1 status=0
time="2021-11-04T13:00:30.533109242Z" level=warning msg="entrypoint waiting!" pid=88058 runtime=wtf.sbk.runj.v1 state="&{1.0.2-runj-dev ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 created 88058 /run/containerd/io.containerd.runtime.v2.task/moby/ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 map[]}"
DEBU[2021-11-04T13:00:30.534959549Z] event forwarded                               ns=moby topic=/tasks/create type=containerd.events.TaskCreate
time="2021-11-04T13:00:30.536146553Z" level=warning msg=START req="&StartRequest{ID:ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8,ExecID:,XXX_unrecognized:[],}" runtime=wtf.sbk.runj.v1
time="2021-11-04T13:00:30.550123105Z" level=warning msg="PROCESSING EXIT!" pid=88060 runtime=wtf.sbk.runj.v1 status=0
time="2021-11-04T13:00:30.550349906Z" level=warning msg=START runtime=wtf.sbk.runj.v1 state="&{1.0.2-runj-dev ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 created 88058 /run/containerd/io.containerd.runtime.v2.task/moby/ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 map[]}"
time="2021-11-04T13:00:30.570951582Z" level=warning msg="START runj" runtime=wtf.sbk.runj.v1 state="&{1.0.2-runj-dev ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 created 88058 /run/containerd/io.containerd.runtime.v2.task/moby/ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 map[]}"
time="2021-11-04T13:00:30.571051982Z" level=warning msg="PROCESSING EXIT!" pid=88061 runtime=wtf.sbk.runj.v1 status=0
DEBU[2021-11-04T13:00:30.571806085Z] event forwarded                               ns=moby topic=/tasks/start type=containerd.events.TaskStart
time="2021-11-04T13:00:30.572645788Z" level=warning msg=STATE req="&StateRequest{ID:ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8,ExecID:,XXX_unrecognized:[],}" runtime=wtf.sbk.runj.v1
time="2021-11-04T13:00:30.575246698Z" level=warning msg="PROCESSING EXIT!" pid=88058 runtime=wtf.sbk.runj.v1 status=0
time="2021-11-04T13:00:30.575310098Z" level=warning msg="INIT EXITED!" pid=88058 runtime=wtf.sbk.runj.v1
time="2021-11-04T13:00:30.598661785Z" level=warning msg="PROCESSING EXIT!" pid=88063 runtime=wtf.sbk.runj.v1 status=1
DEBU[2021-11-04T13:00:30.599392287Z] event forwarded                               ns=moby topic=/tasks/exit type=containerd.events.TaskExit
time="2021-11-04T13:00:30.603399402Z" level=warning msg="PROCESSING EXIT!" pid=88062 runtime=wtf.sbk.runj.v1 status=0
time="2021-11-04T13:00:30.603500002Z" level=warning msg=STATE resp="&StateResponse{ID:ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8,Bundle:/run/containerd/io.containerd.runtime.v2.task/moby/ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8,Pid:0,Status:STOPPED,Stdin:,Stdout:,Stderr:,Terminal:false,ExitStatus:0,ExitedAt:0001-01-01 00:00:00 +0000 UTC,ExecID:,XXX_unrecognized:[],}" runtime=wtf.sbk.runj.v1 state="&{1.0.2-runj-dev ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 stopped 0 /run/containerd/io.containerd.runtime.v2.task/moby/ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 map[]}"
time="2021-11-04T13:00:30.628730396Z" level=warning msg=STATE req="&StateRequest{ID:ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8,ExecID:,XXX_unrecognized:[],}" runtime=wtf.sbk.runj.v1
time="2021-11-04T13:00:30.643039049Z" level=warning msg="PROCESSING EXIT!" pid=88068 runtime=wtf.sbk.runj.v1 status=0
time="2021-11-04T13:00:30.643189149Z" level=warning msg=STATE resp="&StateResponse{ID:ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8,Bundle:/run/containerd/io.containerd.runtime.v2.task/moby/ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8,Pid:0,Status:STOPPED,Stdin:,Stdout:,Stderr:,Terminal:false,ExitStatus:0,ExitedAt:0001-01-01 00:00:00 +0000 UTC,ExecID:,XXX_unrecognized:[],}" runtime=wtf.sbk.runj.v1 state="&{1.0.2-runj-dev ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 stopped 0 /run/containerd/io.containerd.runtime.v2.task/moby/ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 map[]}"
time="2021-11-04T13:00:30.645278257Z" level=warning msg=STATE req="&StateRequest{ID:ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8,ExecID:,XXX_unrecognized:[],}" runtime=wtf.sbk.runj.v1
time="2021-11-04T13:00:30.659217708Z" level=warning msg="PROCESSING EXIT!" pid=88069 runtime=wtf.sbk.runj.v1 status=0
time="2021-11-04T13:00:30.659272909Z" level=warning msg=STATE resp="&StateResponse{ID:ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8,Bundle:/run/containerd/io.containerd.runtime.v2.task/moby/ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8,Pid:0,Status:STOPPED,Stdin:,Stdout:,Stderr:,Terminal:false,ExitStatus:0,ExitedAt:0001-01-01 00:00:00 +0000 UTC,ExecID:,XXX_unrecognized:[],}" runtime=wtf.sbk.runj.v1 state="&{1.0.2-runj-dev ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 stopped 0 /run/containerd/io.containerd.runtime.v2.task/moby/ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 map[]}"
time="2021-11-04T13:00:30.667798840Z" level=warning msg=STATE req="&StateRequest{ID:ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8,ExecID:,XXX_unrecognized:[],}" runtime=wtf.sbk.runj.v1
time="2021-11-04T13:00:30.682164593Z" level=warning msg="PROCESSING EXIT!" pid=88070 runtime=wtf.sbk.runj.v1 status=0
time="2021-11-04T13:00:30.682265594Z" level=warning msg=STATE resp="&StateResponse{ID:ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8,Bundle:/run/containerd/io.containerd.runtime.v2.task/moby/ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8,Pid:0,Status:STOPPED,Stdin:,Stdout:,Stderr:,Terminal:false,ExitStatus:0,ExitedAt:0001-01-01 00:00:00 +0000 UTC,ExecID:,XXX_unrecognized:[],}" runtime=wtf.sbk.runj.v1 state="&{1.0.2-runj-dev ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 stopped 0 /run/containerd/io.containerd.runtime.v2.task/moby/ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 map[]}"
time="2021-11-04T13:00:30.684506602Z" level=warning msg=STATE req="&StateRequest{ID:ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8,ExecID:,XXX_unrecognized:[],}" runtime=wtf.sbk.runj.v1
time="2021-11-04T13:00:30.698438154Z" level=warning msg="PROCESSING EXIT!" pid=88071 runtime=wtf.sbk.runj.v1 status=0
time="2021-11-04T13:00:30.698646154Z" level=warning msg=STATE resp="&StateResponse{ID:ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8,Bundle:/run/containerd/io.containerd.runtime.v2.task/moby/ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8,Pid:0,Status:STOPPED,Stdin:,Stdout:,Stderr:,Terminal:false,ExitStatus:0,ExitedAt:0001-01-01 00:00:00 +0000 UTC,ExecID:,XXX_unrecognized:[],}" runtime=wtf.sbk.runj.v1 state="&{1.0.2-runj-dev ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 stopped 0 /run/containerd/io.containerd.runtime.v2.task/moby/ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 map[]}"
time="2021-11-04T13:00:30.699841059Z" level=warning msg=DELETE req="&DeleteRequest{ID:ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8,ExecID:,XXX_unrecognized:[],}" runtime=wtf.sbk.runj.v1
time="2021-11-04T13:00:30.713187108Z" level=warning msg="PROCESSING EXIT!" pid=88072 runtime=wtf.sbk.runj.v1 status=1
time="2021-11-04T13:00:30.737121997Z" level=warning msg="PROCESSING EXIT!" pid=88073 runtime=wtf.sbk.runj.v1 status=0

It looks as if it's tracking two processes in the jail. I'm not sure why the second exits first (maybe they exit at about the same time and are reported in different orders?). At the end of this, I would expect runj to report that the command finished.

If I restart dockerd, it notices a container that appears to be running but that it doesn't know about, and tries to kill it:

DEBU[2021-11-04T13:05:06.936629561Z] shutting down container considered alive by containerd  container=ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 paused=false restarting=false running=true
DEBU[2021-11-04T13:05:06.936674861Z] Sending kill signal 15 to container ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8
INFO[2021-11-04T13:05:16.988973523Z] Container failed to exit within 10 seconds of signal 15 - using the force  container=ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8
DEBU[2021-11-04T13:05:16.989196524Z] Sending kill signal 9 to container ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8

This triggers the following output from containerd:

time="2021-11-04T13:05:06.964064362Z" level=warning msg=KILL req="&KillRequest{ID:ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8,ExecID:,Signal:15,All:false,XXX_unrecognized:[],}" runtime=wtf.sbk.runj.v1
time="2021-11-04T13:05:06.977719212Z" level=warning msg="PROCESSING EXIT!" pid=88095 runtime=wtf.sbk.runj.v1 status=1
time="2021-11-04T13:05:17.009418898Z" level=warning msg=STATE req="&StateRequest{ID:ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8,ExecID:,XXX_unrecognized:[],}" runtime=wtf.sbk.runj.v1
time="2021-11-04T13:05:17.023122049Z" level=warning msg="PROCESSING EXIT!" pid=88097 runtime=wtf.sbk.runj.v1 status=1
ERRO[2021-11-04T13:05:17.023730451Z] get state for ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8  error="invalid character 'E' looking for beginning of value: unknown"
WARN[2021-11-04T13:05:17.023882952Z] unknown status                                status=0
time="2021-11-04T13:05:17.025965860Z" level=warning msg=KILL req="&KillRequest{ID:ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8,ExecID:,Signal:9,All:false,XXX_unrecognized:[],}" runtime=wtf.sbk.runj.v1
time="2021-11-04T13:05:17.039224209Z" level=warning msg="PROCESSING EXIT!" pid=88098 runtime=wtf.sbk.runj.v1 status=1

Again, it looks as if runj is generating a log message indicating that the process has exited, but isn't reporting the exit to containerd?

I can kill the jails with jail -r.

@kit-ty-kate, I took my minimal Dockerfile from one that you'd filed in an issue on the moby port, so presumably it worked for you in the past?

kit-ty-kate commented 3 years ago

@kit-ty-kate, I took my minimal Dockerfile from one that you'd filed in an issue on the moby port, so presumably it worked for you in the past?

Yes it worked in the past (back in July)

I’m also getting this issue now. One thing worth noting as well: if you ^C docker build and then restart the job, it finishes normally, even with --no-cache.

samuelkarp commented 3 years ago

I haven't had the time to take a look at @gizahNL's moby port at all yet, so I'm not going to be a ton of help in debugging whether there's an issue at that layer or not. I do see the following log line which makes me think that moby might be receiving an exit but not processing it, though this is definitely not conclusive:

DEBU[2021-11-04T13:00:30.600271190Z] event                                         module=libcontainerd namespace=moby topic=/tasks/exit
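
For comparison, a containerd client normally consumes that exit by waiting on the task. Here's a minimal sketch using the stock containerd Go client (this is not moby's actual code path; the socket path is just the default and the ID is copied from the logs above):

package main

import (
	"context"
	"log"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/namespaces"
)

func main() {
	// Connect to containerd and use the same namespace moby uses.
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()
	ctx := namespaces.WithNamespace(context.Background(), "moby")

	// Container ID copied from the logs above.
	container, err := client.LoadContainer(ctx, "ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8")
	if err != nil {
		log.Fatal(err)
	}
	task, err := container.Task(ctx, nil)
	if err != nil {
		log.Fatal(err)
	}

	// Wait returns a channel that receives the exit status once the
	// /tasks/exit event for this task has been observed.
	exitCh, err := task.Wait(ctx)
	if err != nil {
		log.Fatal(err)
	}
	status := <-exitCh
	code, _, err := status.Result()
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("task exited with status %d", code)
}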

These lines from the containerd log (really from the shim) seem to indicate that the container exited:

time="2021-11-04T13:00:30.533109242Z" level=warning msg="entrypoint waiting!" pid=88058 runtime=wtf.sbk.runj.v1 state="&{1.0.2-runj-dev ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 created 88058 /run/containerd/io.containerd.runtime.v2.task/moby/ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 map[]}"
[...]
time="2021-11-04T13:00:30.570951582Z" level=warning msg="START runj" runtime=wtf.sbk.runj.v1 state="&{1.0.2-runj-dev ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 created 88058 /run/containerd/io.containerd.runtime.v2.task/moby/ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 map[]}"
[...]
time="2021-11-04T13:00:30.575246698Z" level=warning msg="PROCESSING EXIT!" pid=88058 runtime=wtf.sbk.runj.v1 status=0
time="2021-11-04T13:00:30.575310098Z" level=warning msg="INIT EXITED!" pid=88058 runtime=wtf.sbk.runj.v1

In these logs you can see that pid 88058 was the main process for the container. This is initially the runj-entrypoint program, but after the Start API is called (which is shown in the logs) it gets signalled (via a fifo) and calls unix.Exec (which wraps execve(2)) with the command specified in the bundle.
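
Roughly, that entrypoint pattern looks like this (a simplified sketch, not runj's actual code; the fifo path and the command below are made up for illustration):

// Simplified sketch of the runj-entrypoint pattern: block until the shim
// signals "start" through a fifo, then replace this process with the
// container command via execve(2). Paths and args are illustrative only.
package main

import (
	"log"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	// Opening the fifo read-only blocks until the other side opens it for
	// writing; that open (or a subsequent write) is the "start" signal.
	fifo, err := os.OpenFile("/run/example-container/exec.fifo", os.O_RDONLY, 0)
	if err != nil {
		log.Fatal(err)
	}
	buf := make([]byte, 1)
	_, _ = fifo.Read(buf) // wait for the shim
	fifo.Close()

	// unix.Exec wraps execve(2): the entrypoint's pid (88058 in the logs
	// above) becomes the pid of the container's main process.
	argv := []string{"/bin/sh", "-c", "echo Hello"}
	if err := unix.Exec(argv[0], argv, os.Environ()); err != nil {
		log.Fatal(err)
	}
}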

I did just make changes to how runj-entrypoint works (see https://github.com/samuelkarp/runj/commit/398df9a0deea637af3062df1cf846a8d5d11bd49), so if it used to work before that change and now doesn't that'd be a pretty good indicator that my commit is buggy.

It looks as if it's tracking two processes in the jail.

The shim sets itself up as a subreaper with PROC_REAP_ACQUIRE so it should be tracking every subprocess that is not otherwise reaped by a parent. (This is because the process hierarchy looks like containerd -> containerd-shim-runj-v1 -> runj (create) -> [main jail process] and runj exits after the runj create is issued.) The only process it actually cares about is the runj-entrypoint/main jail process, but the other output is still there since I find a lot of debugging output helps me when I'm developing.
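
A rough sketch of that reaping behavior (not the shim's real code; the P_PID and PROC_REAP_ACQUIRE values are copied from FreeBSD's sys/wait.h and sys/procctl.h, and the entrypoint pid is hard-coded from the logs above purely for illustration):

package main

import (
	"log"

	"golang.org/x/sys/unix"
)

const (
	pPID            = 0 // idtype_t P_PID, from sys/wait.h
	procReapAcquire = 2 // PROC_REAP_ACQUIRE, from sys/procctl.h
)

func main() {
	// procctl(P_PID, getpid(), PROC_REAP_ACQUIRE, NULL): orphaned
	// descendants (runj, the jail's processes) get reparented to us.
	// Assumes golang.org/x/sys/unix exposes the FreeBSD procctl syscall
	// number as SYS_PROCCTL.
	_, _, errno := unix.Syscall6(unix.SYS_PROCCTL,
		uintptr(pPID), uintptr(unix.Getpid()), uintptr(procReapAcquire),
		0, 0, 0)
	if errno != 0 {
		log.Fatal(errno)
	}

	entrypointPid := 88058 // from the logs; normally learned from runj create

	for {
		var status unix.WaitStatus
		pid, err := unix.Wait4(-1, &status, 0, nil)
		if err != nil {
			log.Fatal(err)
		}
		// Every reaped descendant shows up here ("PROCESSING EXIT!").
		log.Printf("processing exit: pid=%d status=%d", pid, status.ExitStatus())
		if pid == entrypointPid {
			// Only this one is the container's main process ("INIT EXITED!").
			log.Printf("init exited: pid=%d", pid)
			return
		}
	}
}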

Anyway, I think it'd help to try the following things:

  1. Revert the commit I linked above and see if that fixes it. If so, we'll need to figure out what I screwed up (and the other steps will still be helpful).
  2. Explore the ctr commands to see containerd's view of the container during this: ctr -n moby container ls, ctr -n moby container info ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 (pulling that from the log above), ctr -n moby task ls, etc.
  3. See what runj says: sudo runj state ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8

davidchisnall commented 3 years ago

Revert the commit I linked above and see if that fixes it. If so, we'll need to figure out what I screwed up (and the other steps will still be helpful).

Yup, that fixes it! I rolled back to a37cfff and docker build worked.

Explore the ctr commands to see containerd's view of the container during this. ctr -n moby container ls, ctr -n moby container info ac1d3703e32454d690b89bef70daafcb2d78d78e208dbf99091b91f0cf8f43b8 (pulling that from the log above), ctr -n moby task ls, etc

It looks as if the stopped containers can't be cleaned up by containerd, and I couldn't find a graceful way to clean them up, so I uninstalled containerd and docker-engine, deleted everything that looked relevant (/var/lib/containerd and the zroot/docker filesystem), and reinstalled and restarted. So, doing everything from scratch with the broken version:

$ docker build .
Sending build context to Docker daemon  2.048kB
Step 1/2 : FROM kwiat/freebsd:13.0-RELEASE
13.0-RELEASE: Pulling from kwiat/freebsd
6a487bee48e0: Pull complete
Digest: sha256:23d315d0f15df632d57c8cfec9dddde4451d695f585063a70a7a10fee3c10ebf
Status: Downloaded newer image for kwiat/freebsd:13.0-RELEASE
 ---> e122726568cf
Step 2/2 : RUN echo hello
 ---> Running in 29c6bc0cf571
hello

This hangs, as previously. From another terminal:

# ctr -n moby container ls
CONTAINER                                                           IMAGE    RUNTIME
29c6bc0cf571c6587f9670916a08917aae21c58e728a1d45c1124bbb7abc204a    -        wtf.sbk.runj.v1
# ctr -n moby container info 29c6bc0cf571c6587f9670916a08917aae21c58e728a1d45c1124bbb7abc204a
{
    "ID": "29c6bc0cf571c6587f9670916a08917aae21c58e728a1d45c1124bbb7abc204a",
    "Labels": {
        "com.docker/engine.bundle.path": "/var/run/docker/containerd/29c6bc0cf571c6587f9670916a08917aae21c58e728a1d45c1124bbb7abc204a"
    },
    "Image": "",
    "Runtime": {
        "Name": "wtf.sbk.runj.v1",
        "Options": null
    },
    "SnapshotKey": "",
    "Snapshotter": "",
    "CreatedAt": "2021-11-05T11:28:00.547108221Z",
    "UpdatedAt": "2021-11-05T11:28:00.547108221Z",
    "Extensions": null,
    "Spec": {
        "ociVersion": "1.0.2-dev",
        "process": {
            "user": {
                "uid": 0,
                "gid": 0
            },
            "args": [
                "/bin/sh",
                "-c",
                "echo hello"
            ],
            "env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "HOSTNAME=29c6bc0cf571"
            ],
            "cwd": "/",
            "capabilities": {
                "bounding": [
                    "CAP_CHOWN",
                    "CAP_DAC_OVERRIDE",
                    "CAP_FSETID",
                    "CAP_FOWNER",
                    "CAP_MKNOD",
                    "CAP_NET_RAW",
                    "CAP_SETGID",
                    "CAP_SETUID",
                    "CAP_SETFCAP",
                    "CAP_SETPCAP",
                    "CAP_NET_BIND_SERVICE",
                    "CAP_SYS_CHROOT",
                    "CAP_KILL",
                    "CAP_AUDIT_WRITE"
                ],
                "effective": [
                    "CAP_CHOWN",
                    "CAP_DAC_OVERRIDE",
                    "CAP_FSETID",
                    "CAP_FOWNER",
                    "CAP_MKNOD",
                    "CAP_NET_RAW",
                    "CAP_SETGID",
                    "CAP_SETUID",
                    "CAP_SETFCAP",
                    "CAP_SETPCAP",
                    "CAP_NET_BIND_SERVICE",
                    "CAP_SYS_CHROOT",
                    "CAP_KILL",
                    "CAP_AUDIT_WRITE"
                ],
                "inheritable": [
                    "CAP_CHOWN",
                    "CAP_DAC_OVERRIDE",
                    "CAP_FSETID",
                    "CAP_FOWNER",
                    "CAP_MKNOD",
                    "CAP_NET_RAW",
                    "CAP_SETGID",
                    "CAP_SETUID",
                    "CAP_SETFCAP",
                    "CAP_SETPCAP",
                    "CAP_NET_BIND_SERVICE",
                    "CAP_SYS_CHROOT",
                    "CAP_KILL",
                    "CAP_AUDIT_WRITE"
                ],
                "permitted": [
                    "CAP_CHOWN",
                    "CAP_DAC_OVERRIDE",
                    "CAP_FSETID",
                    "CAP_FOWNER",
                    "CAP_MKNOD",
                    "CAP_NET_RAW",
                    "CAP_SETGID",
                    "CAP_SETUID",
                    "CAP_SETFCAP",
                    "CAP_SETPCAP",
                    "CAP_NET_BIND_SERVICE",
                    "CAP_SYS_CHROOT",
                    "CAP_KILL",
                    "CAP_AUDIT_WRITE"
                ]
            },
            "oomScoreAdj": 0
        },
        "root": {
            "path": "/var/lib/docker/zfs/graph/9b78e7b43c91"
        },
        "hostname": "29c6bc0cf571",
        "mounts": [
            {
                "destination": "/proc",
                "type": "procfs",
                "source": "proc",
                "options": [
                    "nosuid",
                    "noexec"
                ]
            },
            {
                "destination": "/dev",
                "type": "devfs",
                "source": "devfs"
            }
        ],
        "hooks": {
            "createRuntime": [
                {
                    "path": "/bin/cp",
                    "args": [
                        "/var/lib/docker/containers/29c6bc0cf571c6587f9670916a08917aae21c58e728a1d45c1124bbb7abc204a/resolv.conf",
                        "/var/lib/docker/zfs/graph/9b78e7b43c91/etc/resolv.conf"
                    ]
                },
                {
                    "path": "/bin/cp",
                    "args": [
                        "/var/lib/docker/containers/29c6bc0cf571c6587f9670916a08917aae21c58e728a1d45c1124bbb7abc204a/hostname",
                        "/var/lib/docker/zfs/graph/9b78e7b43c91/etc/hostname"
                    ]
                },
                {
                    "path": "/bin/cp",
                    "args": [
                        "/var/lib/docker/containers/29c6bc0cf571c6587f9670916a08917aae21c58e728a1d45c1124bbb7abc204a/hosts",
                        "/var/lib/docker/zfs/graph/9b78e7b43c91/etc/hosts"
                    ]
                }
            ]
        },
        "linux": {
            "resources": {
                "devices": [
                    {
                        "allow": false,
                        "access": "rwm"
                    },
                    {
                        "allow": true,
                        "type": "c",
                        "major": 1,
                        "minor": 5,
                        "access": "rwm"
                    },
                    {
                        "allow": true,
                        "type": "c",
                        "major": 1,
                        "minor": 3,
                        "access": "rwm"
                    },
                    {
                        "allow": true,
                        "type": "c",
                        "major": 1,
                        "minor": 9,
                        "access": "rwm"
                    },
                    {
                        "allow": true,
                        "type": "c",
                        "major": 1,
                        "minor": 8,
                        "access": "rwm"
                    },
                    {
                        "allow": true,
                        "type": "c",
                        "major": 5,
                        "minor": 0,
                        "access": "rwm"
                    },
                    {
                        "allow": true,
                        "type": "c",
                        "major": 5,
                        "minor": 1,
                        "access": "rwm"
                    },
                    {
                        "allow": false,
                        "type": "c",
                        "major": 10,
                        "minor": 229,
                        "access": "rwm"
                    }
                ],
                "memory": {},
                "cpu": {
                    "shares": 0
                },
                "blockIO": {
                    "weight": 0
                }
            },
            "namespaces": [
                {
                    "type": "mount"
                },
                {
                    "type": "network"
                },
                {
                    "type": "uts"
                },
                {
                    "type": "pid"
                },
                {
                    "type": "ipc"
                }
            ],
            "maskedPaths": [
                "/proc/asound",
                "/proc/acpi",
                "/proc/kcore",
                "/proc/keys",
                "/proc/latency_stats",
                "/proc/timer_list",
                "/proc/timer_stats",
                "/proc/sched_debug",
                "/proc/scsi",
                "/sys/firmware"
            ],
            "readonlyPaths": [
                "/proc/bus",
                "/proc/fs",
                "/proc/irq",
                "/proc/sys",
                "/proc/sysrq-trigger"
            ]
        }
    }
}

It looks as if the moby build is still assuming Linux things and needs to generate a FreeBSD container config when running on FreeBSD.

# ctr -n moby task ls
TASK                                                                PID    STATUS
29c6bc0cf571c6587f9670916a08917aae21c58e728a1d45c1124bbb7abc204a    0      UNKNOWN

This is suspicious: I'm not sure why it thinks the pid is 0, given that pid 0 is the kernel. I presume 0 is a placeholder here for pid-not-known?

# runj state 29c6bc0cf571c6587f9670916a08917aae21c58e728a1d45c1124bbb7abc204a
Error: open /var/lib/runj/jails/29c6bc0cf571c6587f9670916a08917aae21c58e728a1d45c1124bbb7abc204a/state.json: no such file or directory

The /var/lib/runj/jails/ directory is empty.

# ps aux | grep runj
root       22779   0.0  0.0  267772  2524  1  S+   11:34     0:00.00 grep runj
root       22733   0.0  0.0  983656 12796  5  I    11:28     0:00.16 /usr/local/bin/containerd-shim-runj-v1 -namespace moby -id 29c6bc0cf571c6587f9670916a08917aae21

So it looks as if the runj shim is still running but it has cleaned up all of the associated state?

davidchisnall commented 3 years ago

I'm not 100% sure whether the cause is the new code, or whether it's a race condition that is simply triggered more often by the new code. I had a container hang after trying to run ifconfig in a Dockerfile RUN line to see if I could diagnose the problems with the moby network bits.

Restarting dockerd causes it to loop doing this:

DEBU[2021-11-05T12:02:58.356143623Z] shutting down container considered alive by containerd  container=549f35b8a14f7f5e3a11a8ce9ad22e633e4192c19004c8c879f195fcb98e6c5b paused=false restarting=false running=true
DEBU[2021-11-05T12:02:58.356166123Z] Sending kill signal 15 to container 549f35b8a14f7f5e3a11a8ce9ad22e633e4192c19004c8c879f195fcb98e6c5b
INFO[2021-11-05T12:03:08.402961299Z] Container failed to exit within 10 seconds of signal 15 - using the force  container=7d3653e7ba07bc8694750d7d622f8d3fcc9ff602e9b50b3e57dee62f05d3b80d
DEBU[2021-11-05T12:03:08.403085800Z] Sending kill signal 9 to container 7d3653e7ba07bc8694750d7d622f8d3fcc9ff602e9b50b3e57dee62f05d3b80d
INFO[2021-11-05T12:03:08.403193100Z] Container failed to exit within 10 seconds of signal 15 - using the force  container=549f35b8a14f7f5e3a11a8ce9ad22e633e4192c19004c8c879f195fcb98e6c5b
DEBU[2021-11-05T12:03:08.403249901Z] Sending kill signal 9 to container 549f35b8a14f7f5e3a11a8ce9ad22e633e4192c19004c8c879f195fcb98e6c5b

dockerd is right that containerd thinks the containers are running, and there is a shim instance for each one:

# ctr -n moby containers ls
CONTAINER                                                           IMAGE    RUNTIME
549f35b8a14f7f5e3a11a8ce9ad22e633e4192c19004c8c879f195fcb98e6c5b    -        wtf.sbk.runj.v1
7d3653e7ba07bc8694750d7d622f8d3fcc9ff602e9b50b3e57dee62f05d3b80d    -        wtf.sbk.runj.v1
# ps aux | grep runj
root       52084   0.0  0.0  266748  2436  1  R+   12:05     0:00.00 grep runj
root       50483   0.0  0.0  983400 13000  5  I    11:56     0:00.23 /usr/local/bin/containerd-shim-runj-v1 -namespace moby -id 7d3653e7ba07bc8694750d7d622f8d3fcc9f
root       51150   0.0  0.0  984808 12560  5  I    11:59     0:00.22 /usr/local/bin/containerd-shim-runj-v1 -namespace moby -id 549f35b8a14f7f5e3a11a8ce9ad22e633e41

But in this situation the same thing has happened: /var/lib/runj/jails is empty.

gizahNL commented 3 years ago

Yeah, moby writes out a default config containing many Linuxisms; they're all ignored by runj and containerd, though.

dfr commented 2 years ago

This isn't a problem with runj itself. In my work-in-progress port of buildah, I tried this example build:

FROM kwiat/freebsd:13.0-RELEASE
RUN echo Hello
RUN exit 42

and got the expected results:

STEP 1/3: FROM kwiat/freebsd:13.0-RELEASE
STEP 2/3: RUN echo Hello
Hello
STEP 3/3: RUN exit 42
error building at STEP "RUN exit 42": error while running runtime: exit status 42