gramineproject / graphene

Graphene / Graphene-SGX - a library OS for Linux multi-process applications, with Intel SGX support
https://grapheneproject.io
GNU Lesser General Public License v3.0

With Go program, inside a docker container, bind fails with permission denied error, invalid handle error. #2676

Closed sudharkrish closed 3 years ago

sudharkrish commented 3 years ago

Description of the problem

With a Go program running under Graphene inside a docker container, bind fails with a permission denied error.

Steps to reproduce

Reproducible on a recent graphene pull (Aug 30th, 2021), commit ID c321726229eaf0a1b52dc5e2507c9cfab423ea94. Also reproducible on https://github.com/oscarlab/graphene/releases/tag/v1.2-rc1

A sample Go program and scripts to reproduce the issue are provided.

In the graphene repo, copy this zip file (go_sample.zip) into your /home->/graphene/Examples directory and unzip it to create the go_sample directory at /graphene/Examples/go_sample.

Under /graphene/Examples/go_sample (note: you may need to use sudo for the script below, because of the docker build):

  1. Run the script:
     ./launch_main_in_graphene_container.sh
     This script builds the sample Go program (in a docker container), then uses Dockerfile_graphene to enable Graphene in the container, does the sgx-build for the Go program inside the container, and launches the container (main_gsgx) with a shell.
  2. Get a terminal in the container named main_gsgx:
     Examples/go_sample$ docker exec -it main_gsgx /bin/bash
     root@e879ba553088:/graphene/Examples/go_sample#
  3. Inside the container's terminal, launch the Go program using Graphene:
     Examples/go_sample# ./run_main.sh

Expected results

The output below is from running the same Go program outside of Graphene:

Examples/go_sample$ ./main
SK_DBG: listening on 172.17.0.1:8805
client: wrote: hello
server: read: hello
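For readers without the zip file, here is a minimal sketch of a UDP round trip that produces output of the same shape. It is not the code from go_sample.zip: roundTrip is a hypothetical helper, and the listen address is an assumption (loopback with an ephemeral port, so that it runs anywhere) instead of 172.17.0.1:8805.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"time"
)

// roundTrip (hypothetical helper) binds a UDP socket on listenAddr, sends msg
// to it from a second socket, and returns what the server side read.
func roundTrip(listenAddr, msg string) (string, error) {
	laddr, err := net.ResolveUDPAddr("udp", listenAddr)
	if err != nil {
		return "", err
	}
	// net.ListenUDP creates the socket and binds it; this is the call
	// that fails in the report above.
	server, err := net.ListenUDP("udp", laddr)
	if err != nil {
		return "", err
	}
	defer server.Close()
	fmt.Println("listening on", server.LocalAddr())

	client, err := net.DialUDP("udp", nil, server.LocalAddr().(*net.UDPAddr))
	if err != nil {
		return "", err
	}
	defer client.Close()
	if _, err := client.Write([]byte(msg)); err != nil {
		return "", err
	}
	fmt.Println("client: wrote:", msg)

	// Do not block forever if the datagram never arrives.
	if err := server.SetReadDeadline(time.Now().Add(2 * time.Second)); err != nil {
		return "", err
	}
	buf := make([]byte, 1024)
	n, _, err := server.ReadFromUDP(buf)
	if err != nil {
		return "", err
	}
	return string(buf[:n]), nil
}

func main() {
	// Assumption: loopback + port 0 instead of the address used in the report.
	got, err := roundTrip("127.0.0.1:0", "hello")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("server: read:", got)
}
```

To mimic the report more closely, pass "172.17.0.1:8805" as the listen address; whether that address can be bound depends on the network configuration of the environment the program runs in.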

Actual results

When running in a container using graphene:

[P1:T1:main] debug: Allocating stack at 0x0 (size = 8388608)
[P1:T1:main] debug: loading "file:./main"
[P1:T1:main] debug: append_r_debug: adding file:./main at 0x0
[P1:T1:main] debug: Creating pipe: pipe.srv:1
debug: sock_getopt (fd = 11, sockopt addr = 0x7ffef14e4360) is not implemented and always returns 0
[P1:T1:main] debug: Shim process initialized
[P1:shim] debug: IPC worker started
[P1:T1:main] debug: Created sigframe for sig: 23 at 0xb0009390 (handler: 0x460be0, restorer: 0x460d20)
[P1:T1:main] error: bind: invalid handle returned
SK_DBG: ListenUDP->error listen udp 172.17.0.1:8805: bind: permission denied
2021/09/04 01:35:40 listen udp 172.17.0.1:8805: bind: permission denied

Additional information

The Go sample code is in the gopro2 folder of the attached zip file. I debugged this Go program with Graphene's GDB while running it inside the docker container. When the Go program calls net.ListenUDP, this API invokes two syscalls: (1) socket, to create the socket, and (2) bind. Socket creation succeeds, but the socket handle that is then passed to bind is determined to be invalid in shim_do_bind in the LibOS code, as shown by this error from LibOS:

[P1:T1:main] error: bind: invalid handle returned

The call to bind still goes through, and the application sees the bind: permission denied error.

dimakuv commented 3 years ago

May be fixed by this PR: https://github.com/oscarlab/graphene/pull/2678

sudharkrish commented 3 years ago

@dimakuv, the PR you mentioned (#2678) only changes Graphene-direct, not Graphene-SGX. I am testing on Graphene-SGX, and I am also using a non-zero port number. In any case, I tested with a TCP Go server program, and it still fails to bind when launched inside a docker container using Graphene-SGX. Here is a zip file with the program and instructions for testing: (gopro_tcp_testing_container.zip)

And here is the log:

[P1:T1:main] debug: loading "file:./main"
[P1:T1:main] debug: append_r_debug: adding file:./main at 0x0
[P1:T1:main] debug: Creating pipe: pipe.srv:1
debug: sock_getopt (fd = 11, sockopt addr = 0x7ffc7c4c4740) is not implemented and always returns 0
[P1:T1:main] debug: Shim process initialized
[P1:shim] debug: IPC worker started
[P1:T1:main] debug: Created sigframe for sig: 23 at 0x90009390 (handler: 0x460b80, restorer: 0x460cc0)
[P1:T1:main] error: bind: invalid handle returned
Error listening: listen tcp 172.17.0.1:8805: bind: permission denied
[P1:T1:main] debug: ---- shim_exit_group (returning 1)

dimakuv commented 3 years ago

Can you run it with loader.log_level = "all" and attach the resulting log?

sudharkrish commented 3 years ago

@dimakuv, the log is attached: (udp_bind_perm_denied_within_container_graphene_log.zip)

dimakuv commented 3 years ago

The relevant part of this log is:

[P1:T1:main] trace: ---- shim_socket(INET, SOCK_NONBLOCK|SOCK_CLOEXEC|DGRAM, 0) = 0x3
[P1:T1:main] trace: ---- shim_setsockopt(3, 1, 6, 0xa41047e4, 4) = 0x0
[P1:T1:main] error: bind: invalid handle returned
[P1:T1:main] trace: ---- shim_bind(3, 0xa404002c, 16) = -13

To be honest, this doesn't help much. Apparently, there is some issue with the address parameter used by bind() of the UDP server. Could you maybe debug it with GDB?

Given that we didn't debug UDP properly, and the UDP code in PAL is very old, it's no surprise it is so buggy... We need to refactor it completely.

sudharkrish commented 3 years ago

@dimakuv I did some debugging with GDB, and it turns out that this is a configuration issue, not a Graphene issue. With the Go application run using Graphene inside a docker container, bind failed with EADDRNOTAVAIL (99), "Cannot assign requested address". However, PAL's unix_to_pal_error_positive does not check for EADDRNOTAVAIL and instead returns the default value PAL_ERROR_DENIED. Later, in LibOS, shim_do_bind calls pal_to_unix_errno(PAL_ERROR_DENIED), which converts it to EACCES (Permission denied).

If possible, we could change Graphene so that the application gets the real error in this case: EADDRNOTAVAIL (99), "Cannot assign requested address".

Otherwise, this issue can be closed.

dimakuv commented 3 years ago

Thanks for debugging this. Looking at https://github.com/gramineproject/gramine/blob/40e942db6b02555cf2414f20bbd313b63f50e400/Pal/src/host/Linux/pal_linux_error.h and https://github.com/gramineproject/gramine/blob/40e942db6b02555cf2414f20bbd313b63f50e400/common/include/pal_error.h, I don't see any suitable PAL error code to correspond to EADDRNOTAVAIL. In other words, we'll have to add a new PAL error code to have the 1:1 conversion, and we try to not increase the number of error codes in Gramine.

But noted. If we hit something similar again, we'll strongly consider adding this error code. Closing the issue.