gramineproject / gramine

A library OS for Linux multi-process applications, with Intel SGX support
GNU Lesser General Public License v3.0
588 stars 193 forks source link

Getting fatal error: failed to reserve page summary memory while running an application in gramine #232

Closed M-V-Manikandan closed 1 year ago

M-V-Manikandan commented 2 years ago

Description of the problem

Facing issue while trying to run/invoke a security based application( actually a web service which verifies certificates) written in go inside gramine. The application is called from a caller program via execvp command. The application uses systemctl internally to deploy itself as a webservice. However, while executing the application inside gramine on sgx enabled machine, gramine fails to load, throws

fatal error: failed to reserve page summary memory

We tried with commit 1fd075ee309af308d75a474156518f3b9807567d (HEAD -> master, origin/master, origin/HEAD), now seeing ...fatal error: failed to reserve page summary memory...

Any suggestions to resolve/understand this would be helpful. Logs are are below..

debug: Token file: gramine-sqvs.token
debug: Read dummy DCAP token
debug: enclave created:

debug: base: 0x0000000000000000

debug: size: 0x0000000010000000
debug: misc_select: 0x00000000

debug: attr.flags: 0x0000000000000005
debug: attr.xfrm: 0x00000000000000e7

debug: ssa_frame_size: 4
debug: isv_prod_id: 0x00000000
debug: isv_svn: 0x00000000

debug: adding pages to enclave: 0xff79000-0x10000000 [REG:R--] (manifest) measured
debug: adding pages to enclave: 0xff59000-0xff79000 [REG:RW-] (ssa) measured
debug: adding pages to enclave: 0xff55000-0xff59000 [TCS:---] (tcs) measured
debug: adding pages to enclave: 0xff51000-0xff55000 [REG:RW-] (tls) measured
debug: adding pages to enclave: 0xff11000-0xff51000 [REG:RW-] (stack) measured
debug: adding pages to enclave: 0xfed1000-0xff11000 [REG:RW-] (stack) measured
debug: adding pages to enclave: 0xfe91000-0xfed1000 [REG:RW-] (stack) measured
debug: adding pages to enclave: 0xfe51000-0xfe91000 [REG:RW-] (stack) measured
debug: adding pages to enclave: 0xfe41000-0xfe51000 [REG:RW-] (sig_stack) measured
debug: adding pages to enclave: 0xfe31000-0xfe41000 [REG:RW-] (sig_stack) measured
debug: adding pages to enclave: 0xfe21000-0xfe31000 [REG:RW-] (sig_stack) measured
debug: adding pages to enclave: 0xfe11000-0xfe21000 [REG:RW-] (sig_stack) measured
debug: adding pages to enclave: 0xf9e4000-0xfa39000 [REG:R-X] (code) measured
debug: adding pages to enclave: 0xfa39000-0xfa3d000 [REG:RW-] (data) measured
debug: adding pages to enclave: 0xfa3d000-0xfe11000 [REG:RW-] (bss) measured
debug: adding pages to enclave: 0x10000-0xf9e4000 [REG:RWX] (free)
debug: enclave initializing:
debug: enclave id: 0x000000000ffff000
debug: mr_enclave: ece968e3780e658a266b77fb4e77159ab856b36e17897d0cea6f71f115029176

debug: Using SGX DCAP/ECDSA attestation

Gramine detected the following insecure configurations:

- loader.log_level = warning|debug|trace|all (verbose log level, may leak information)
- loader.insecure__use_cmdline_argv = true (forwarding command-line args from untrusted host to the app)

Gramine will continue application execution, but this configuration must not be used in production!

warning: DkVirtualMemoryProtect is unimplemented in Linux-SGX PAL
[::] debug: Host: Linux-SGX

[::] debug: LibOS xsave_enabled 1, xsave_size 0xa80(2688), xsave_features 0xe7
[::] debug: Initial VMA region 0xf51b000-0xf59c000 (LibOS) bookkeeped
[::] debug: Initial VMA region 0xff79000-0x10000000 (manifest) bookkeeped
[::] debug: ASLR top address adjusted to 0x7bbc000
[::] debug: Shim loaded at 0xf51b000, ready to initialize

[::] debug: Mounting root as chroot filesystem: from file:. to /
[::] debug: Mounting special proc filesystem: /proc
[::] debug: Mounting special dev filesystem: /dev
[::] debug: Mounting terminal device /dev/tty under /dev

[P1:T1:] debug: Mounting as chroot filesystem: from file:/usr/local/lib/x86_64-linux-gnu/gramine/runtime/glibc to /lib

[P1:T1:] debug: Mounting as chroot filesystem: from file:/etc to /etc
[P1:T1:] debug: Mounting as chroot filesystem: from file:/lib/x86_64-linux-gnu to /lib/x86_64-linux-gnu
[P1:T1:] debug: Mounting as chroot filesystem: from file:/usr/lib/x86_64-linux-gnu to /usr/lib/x86_64-linux-gnu

[P1:T1:] debug: Mounting as chroot filesystem: from file:/root/mohanpa/gramine/CI-Examples/gramine-sqvs/gramine-sqvs to /root/mohanpa/gramine/CI-Examples/gramine-sqvs
[P1:T1:gramine-sqvs] debug: Allocating stack at 0 (size = 262144)

[P1:T1:gramine-sqvs] debug: loading "file:./gramine-sqvs"
[P1:T1:gramine-sqvs] debug: find_interp: searching for interpreter: /lib/ld-linux-x86-64.so.2

[P1:T1:gramine-sqvs] debug: loading "file:/usr/local/lib/x86_64-linux-gnu/gramine/runtime/glibc/ld-linux-x86-64.so.2"

[P1:T1:gramine-sqvs] debug: Creating pipe: pipe.srv:1

debug: sock_getopt (fd = 12, sockopt addr = 0x7ffc0050b810) is not implemented and always returns 0

[P1:T1:gramine-sqvs] debug: Shim process initialized
[P1:shim] debug: IPC worker started

[P1:T1:gramine-sqvs] debug: glibc register library /lib/libc.so.6 loaded at 0x0794d000
[P1:T1:sqvs] debug: Allocating stack at 0 (size = 262144)

[P1:T1:sqvs] debug: loading "file:./sqvs"

[P1:T1:sqvs] debug: find_interp: searching for interpreter: /lib/ld-linux-x86-64.so.2
[P1:T1:sqvs] debug: loading "file:/usr/local/lib/x86_64-linux-gnu/gramine/runtime/glibc/ld-linux-x86-64.so.2"
[P1:T1:sqvs] debug: execve: start execution
[P1:T1:sqvs] debug: glibc register library /lib/libpthread.so.0 loaded at 0x07b81000
[P1:T1:sqvs] debug: glibc register library /lib/libc.so.6 loaded at 0x0798e000
fatal error: failed to reserve page summary memory

runtime stack:
runtime.throw(0x85600b, 0x25)
/usr/local/go/src/runtime/panic.go:1116 +0x72 fp=0x7949d58 sp=0x7949d28 pc=0x4345d2
runtime.(*pageAlloc).sysInit(0xb97f48)
/usr/local/go/src/runtime/mpagealloc_64bit.go:80 +0x17e fp=0x7949de8 sp=0x7949d58 pc=0x42b4de
runtime.(*pageAlloc).init(0xb97f48, 0xb97f40, 0xbac5f8)
/usr/local/go/src/runtime/mpagealloc.go:297 +0x75 fp=0x7949e10 sp=0x7949de8 pc=0x429135
runtime.(*mheap).init(0xb97f40)
/usr/local/go/src/runtime/mheap.go:694 +0x274 fp=0x7949e38 sp=0x7949e10 pc=0x426354
runtime.mallocinit()
/usr/local/go/src/runtime/malloc.go:470 +0xff fp=0x7949e68 sp=0x7949e38 pc=0x40c8ef
runtime.schedinit()
/usr/local/go/src/runtime/proc.go:545 +0x60 fp=0x7949ec0 sp=0x7949e68 pc=0x437cc0
runtime.rt0_go(0x7949ef8, 0x1, 0x7949ef8, 0x7949e48, 0x7b8a0aa, 0x1, 0x7949ffa, 0x0, 0x7949faf, 0x0, ...)
/usr/local/go/src/runtime/asm_amd64.s:214 +0x125 fp=0x7949ec8 sp=0x7949ec0 pc=0x4620d5
[P1:T1:sqvs] debug: ---- shim_exit_group (returning 2)
[P1:T1:sqvs] debug: clearing POSIX locks for pid 1
[P1:T1:sqvs] debug: sync client shutdown: closing handles
[P1:T1:sqvs] debug: sync client shutdown: waiting for confirmation
[P1:T1:sqvs] debug: sync client shutdown: finished
[P1:shim] debug: IPC worker: exiting worker thread
[P1:T1:sqvs] debug: process 1 exited with status 2
debug: DkProcessExit: Returning exit code 2
-->

MANIFEST:

oader.entrypoint = "file:/usr/local/lib/x86_64-linux-gnu/gramine/libsysdb.so"
libos.entrypoint = "gramine-sqvs"
loader.log_level = "debug"

loader.env.LD_LIBRARY_PATH = "/lib:/usr/lib/x86_64-linux-gnu:/lib/x86_64-linux-gnu:/libs"

loader.insecure__use_cmdline_argv = true

fs.mount.lib.type = "chroot"
fs.mount.lib.path = "/lib"
fs.mount.lib.uri = "file:/usr/local/lib/x86_64-linux-gnu/gramine/runtime/glibc"

fs.mount.lib2.type = "chroot"
fs.mount.lib2.path = "/lib/x86_64-linux-gnu"
fs.mount.lib2.uri = "file:/lib/x86_64-linux-gnu"

fs.mount.lib3.type = "chroot"
fs.mount.lib3.path = "/usr/lib/x86_64-linux-gnu"
fs.mount.lib3.uri = "file:/usr/lib/x86_64-linux-gnu"

fs.mount.etc.type = "chroot"
fs.mount.etc.path = "/etc"
fs.mount.etc.uri = "file:/etc"

fs.mount.sqvs.type = "chroot"
fs.mount.sqvs.path = "/root/mohanpa/gramine/CI-Examples/gramine-sqvs"
fs.mount.sqvs.uri = "file:/root/mohanpa/gramine/CI-Examples/gramine-sqvs/gramine-sqvs"

sgx.nonpie_binary = true
sgx.remote_attestation = true
sgx.sqvs_spid = ""
sgx.sqvs_linkable = 0

sgx.trusted_files.gramine-sqvs= "file:gramine-sqvs"
sgx.trusted_files.sqvs= "file:sqvs"
sgx.trusted_files.libsysdb = "file:/usr/local/lib/x86_64-linux-gnu/gramine/libsysdb.so"
sgx.trusted_files.runtime = "file:/usr/local/lib/x86_64-linux-gnu/gramine/runtime/glibc/"
sgx.trusted_files.arch_libdir = "file:/lib/x86_64-linux-gnu/"
sgx.trusted_files.usr_arch_libdir = "file:/usr/lib/x86_64-linux-gnu/"
sgx.trusted_files.libs = "file:./libs/"
#sgx.trusted_files.cachain actually not required -- to be removed
sgx.trusted_files.cachain = "file:certs/test-ca-sha256.crt"

sgx.allowed-files = [
"file:/etc/nsswitch.conf",
"file:/etc/ethers",
"file:/etc/host.conf",
"file:/etc/hosts",
"file:/etc/group",
"file:/etc/passwd",
"file:/etc/gai.conf",
"file:/etc/resolv.conf",

"file:/opt/sqvs/sqvs.service",
"file:/var/log/sqvs/http.log",
"file:/var/log/sqvs/sqvs-security.log",
"file:/var/log/sqvs/sqvs.log",
"file:/root/binaries/env/sqvs.env",
"file:/root/sqvs.env",
"file:/etc/sqvs/tls.key",
"file:/etc/sqvs/tls-vert.pem",
"file:/etc/sqvs/config.yml",
"file:/etc/sqvs/sqvs_signing_pub_key.pem",
"file:/etc/sqvs/sqvs_signing_priv_key.pem",
"file:/etc/sqvs/certs/trustedSGXRootCA.pem",
"file:/etc/sqvs/certs/trustedjwt/",
"file:/etc/sqvs/certs/trustedca/202d1d4f2.pem",
]

Thanks & Regards, Manikandan.V

Steps to reproduce

dimakuv commented 2 years ago

@M-V-Manikandan Could you edit your logs as code snippets (use "Insert Code" button or just use triple backticks)?

dimakuv commented 2 years ago

Please post the Gramine log with loader.log_level = "all". This will print out much more information. Currently I don't see anything helpful in the Gramine log.

boryspoplawski commented 2 years ago

Please paste terminal output and files between triple backticks (```, so it formats nicely).

Could you try increasing enclave size in your manifest? Seems you are using the default value, which might be not enough if you have a complex application. https://gramine.readthedocs.io/en/latest/manifest-syntax.html#enclave-size

Also I'm not sure if putting a application using systemctl in Gramine makes much sense...

M-V-Manikandan commented 2 years ago

Thanks @dimakuv and @boryspoplawski As suggested increased the enclave size and tried. Now getting below error.

[P1:T1:sqvs] debug: execve: start execution
[P1:T1:sqvs] debug: glibc register library /lib/libpthread.so.0 loaded at 0x1203a000
[P1:T1:sqvs] debug: glibc register library /lib/libc.so.6 loaded at 0x11e47000
[P1:T1:sqvs] warning: Unsupported system call 435
[P1:T1:sqvs] warning: Unsupported system call 435
[P1:T1:sqvs] warning: Unsupported system call 435
[P1:T3:sqvs] warning: Unsupported system call 435
[P1:T1:sqvs] warning: Unsupported system call 435
[P1:T1:sqvs] warning: Unsupported system call 435
panic: runtime error: index out of range [0] with length 0

complete log with 'log_level=all' is in the attached file. Please share ur suggestions. Regards, gramine-sqvs-log-all.txt .

boryspoplawski commented 2 years ago

This looks like a problem with your application, not Gramine.

M-V-Manikandan commented 2 years ago

@boryspoplawski Thanks for the response. As we could able to run the application outside gramine, wanted to understand/know the reason for failing in gramine. Could you provide some info on where/why it is failing to run in gramine.

Thanks for your time.

dimakuv commented 2 years ago

@M-V-Manikandan In your attached log, I still see some failures of memory allocation:

[P1:T6:sqvs] trace: ---- shim_mmap(0, 0x8000000, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0x0) ...
[P1:T6:sqvs] trace: ---- return from shim_mmap(...) = -12
[P1:T6:sqvs] trace: ---- shim_mmap(0, 0x4000000, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0x0) ...
[P1:T6:sqvs] trace: ---- return from shim_mmap(...) = -12
[P1:T6:sqvs] trace: ---- shim_mmap(0, 0x8000000, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0x0) ...
[P1:T6:sqvs] trace: ---- return from shim_mmap(...) = -12
[P1:T6:sqvs] trace: ---- shim_mmap(0, 0x4000000, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0x0) ...
[P1:T6:sqvs] trace: ---- return from shim_mmap(...) = -12

What is the sgx.enclave_size you have now in your manifest file? Looks like it is still not enough.

Try to increase it to 16G or 32G. Note that Golang applications are famous for reserving a lot of memory. I've seen manifest files for Go apps with sgx.enclave_size = "32G".

M-V-Manikandan commented 2 years ago

@dimakuv Thanks for the response. Was using 2G earlier, as suggested increased the size to 32G. But still observing the below error.

panic: [P1:T1:sqvs] trace: ---- return from shim_write(...) = 0x7
[P1:T1:sqvs] trace: ---- shim_write(2, 0x6c016a040, 0x33) ...
runtime error: index out of range [0] with length 0[P1:T1:sqvs]

complete logs with 'all' is attached. Thanks & Regards, SQVS_Out_Of_Bounds_Size_32G_Logs_All.txt

boryspoplawski commented 2 years ago

@dimakuv these seem like memory allocator probing, they are not real fatal failures. @M-V-Manikandan No idea, it's just an error from your application. My guess would be that it's trying to use systemctl (since you mentioned that), but it's failing - you probably did not put systemd inside the enclave (which I'm fairly certain is completely pointless and also impossible).

dimakuv commented 2 years ago

@M-V-Manikandan There is some more output from the Go application itself, based on your log:

goroutine [running]:
flag.init(): /usr/local/go/src/flag/flag.go:1010+0x176

If I understand correctly, there is something wrong on line 1010 of this Go file: https://golang.org/src/flag/flag.go

So looks like it fails at parsing the command-line arguments. Are you sure you have loader.insecure__use_cmdline_argv = true in your final manifest file?

Also, try to add loader.insecure__use_host_env = true line in your manifest file. Maybe Golang needs some environment variables that you defined on the host system. (Of course remember that these options are just for debugging, replace them with better options for production.)

@boryspoplawski Looking at the log, I don't think it's systemctl. It looks like the application fails in the very beginning of execution, not even getting to the actual systemctl workload. But in general I agree, putting systemctl in the SGX enclave doesn't seem to make sense.

M-V-Manikandan commented 2 years ago

@dimakuv @boryspoplawski Thanks for your responses. We tried by including '/usr/local/go/src/' in manifest along with your suggestions.

We could able to bring-up the http server application in gramine (without systemctl), however we are facing issue when the application try to write (respond to the incoming requests).


runtime.netpollBreak[P1:T14:sqvs] trace: ---- return from shim_write(...) = 0x14
[P1:T2:sqvs] trace: ---- return from shim_nanosleep(...) = 0x0
[P1:T14:sqvs] trace: ---- shim_write(2, 0x84580d, 0x1) ...
[P1:T2:sqvs] trace: ---- shim_clock_gettime(1, 0x748bbdd10) = 0x0
([P1:T14:sqvs] trace: ---- return from shim_write(...) = 0x1
[P1:T2:sqvs] trace: ---- shim_nanosleep([0,20000], 0) ...
[P1:T14:sqvs] trace: ---- shim_write(2, 0x845870, 0x2) ...
)
[P1:T14:sqvs] trace: ---- return from shim_write(...) = 0x2
[P1:T2:sqvs] trace: ---- return from shim_nanosleep(...) = 0x0
[P1:T14:sqvs] trace: ---- shim_write(2, 0x845832, 0x1) ...
        [P1:T2:sqvs] trace: ---- shim_clock_gettime(1, 0x748bbdd10) = 0x0
[P1:T14:sqvs] trace: ---- return from shim_write(...) = 0x1
[P1:T2:sqvs] trace: ---- shim_epoll_pwait(4, 0x748bbd730, 128, 0, 0, 0xbaa808) ...
[P1:T14:sqvs] trace: ---- shim_write(2, 0xb3653d, 0x2a) ...
/usr/local/go/src/runtime/netpoll_epoll.go[P1:T2:sqvs] trace: ---- return from shim_epoll_pwait(...) = 0x1
[P1:T14:sqvs] trace: ---- return from shim_write(...) = 0x2a
[P1:T14:sqvs] trace: ---- shim_write(2, 0x845819, 0x1) ...
[P1:T2:sqvs] trace: ---- shim_nanosleep([0,20000], 0) ...
:[P1:T14:sqvs] trace: ---- return from shim_write(...) = 0x1
[P1:T14:sqvs] trace: ---- shim_write(2, 0x74861161e, 0x2) ...
[P1:T2:sqvs] trace: ---- return from shim_nanosleep(...) = 0x0
90[P1:T14:sqvs] trace: ---- return from shim_write(...) = 0x2
[P1:T2:sqvs] trace: ---- shim_clock_gettime(1, 0x748bbdd10) = 0x0
[P1:T14:sqvs] trace: ---- shim_write(2, 0x845844, 0x2) ...
[P1:T2:sqvs] trace: ---- shim_nanosleep([0,20000], 0) ...
 +[P1:T14:sqvs] trace: ---- return from shim_write(...) = 0x2
[P1:T14:sqvs] trace: ---- shim_write(2, 0x74861163c, 0x4) ...
0xc6[P1:T2:sqvs] trace: ---- return from shim_nanosleep(...) = 0x0
[P1:T14:sqvs] trace: ---- return from shim_write(...) = 0x4
[P1:T2:sqvs] trace: ---- shim_clock_gettime(1, 0x748bbdd10) = 0x0
[P1:T14:sqvs] trace: ---- shim_write(2, 0x845830, 0x1) ...

could you comment on the errors? netpoll_Break-error.txt

Logs with level 'all' is attached.

Thanks & Regards,

dimakuv commented 2 years ago

@M-V-Manikandan Disable the Gramine log (set loader.log_level = "error") and look at what Go app writes to you.

Go app actually gives you a lot of information about the error. For example, I noticed (from your attached log) that the issue happens somewhere in some Go library that performs SGX ECDSA quote verification or something like this -- which may indicate that your SGX environment is setup incorrectly.

Go writes its output in a weird way -- it is all intermixed with the Gramine log. To have clearly separated Gramine log and app output, you can use smth like loader.log_file = "gramine.log".

M-V-Manikandan commented 2 years ago

@dimakuv Thanks for the suggestions. We tried as suggested on logs. Got the go application error messages on console. Did not get any messages in 'gramine.log'.

INFO[00001] 2021-11-24T07:32:50.734706Z : Starting SGX Quote Verification Server; name=default
INFO[00001] 2021-11-24T07:32:50.735553Z : service start; name=security
INFO[00001] 2021-11-24T07:32:50.735553Z : service start; name=security
runtime: netpollBreak write failed with 13
fatal error: runtime: netpollBreak write failed
runtime: netpollBreak write failed with 13
fatal error: runtime: netpollBreak write failed

runtime stack:
runtime.throw(0x8547b0, 0x22)
        /usr/local/go/src/runtime/panic.go:1116 +0x72
runtime.netpollBreak()
        /usr/local/go/src/runtime/netpoll_epoll.go:90 +0xc6
runtime.findrunnable(0x9003e000, 0x0)
        /usr/local/go/src/runtime/proc.go:2363 +0xa51
runtime.schedule()
        /usr/local/go/src/runtime/proc.go:2526 +0x2fc
runtime.park_m(0x90376d80)
        /usr/local/go/src/runtime/proc.go:2696 +0x9d
runtime.mcall(0x40000)
        /usr/local/go/src/runtime/asm_amd64.s:318 +0x5b

goroutine 1 [chan receive]:
main.(*App).startServer(0x903bfeb0, 0x0, 0x0)
        /root/workspace/sgx-verification-service/app.go:484 +0x7dc
main.(*App).Run(0x903bfeb0, 0x9000e080, 0x2, 0x2, 0x901b9ef0, 0x7af560)
        /root/workspace/sgx-verification-service/app.go:246 +0x25c
main.main()
        /root/workspace/sgx-verification-service/main.go:105 +0x12c

goroutine 6 [syscall]:
os/signal.signal_recv(0x0)
        /usr/local/go/src/runtime/sigqueue.go:147 +0x9c
os/signal.loop()
        /usr/local/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.Notify.func1
        /usr/local/go/src/os/signal/signal.go:127 +0x44

goroutine 8 [IO wait]:
internal/poll.runtime_pollWait(0x95942f18, 0x72, 0x0)
        /usr/local/go/src/runtime/netpoll.go:203 +0x55
internal/poll.(*pollDesc).wait(0x90294998, 0x72, 0x0, 0x0, 0x846dab)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Accept(0x90294980, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:384 +0x1d4
net.(*netFD).accept(0x90294980, 0xba7d2108, 0x0, 0x0)
        /usr/local/go/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0x9000f4c0, 0x50, 0x804880, 0xc68abf00a4ce9201)
        /usr/local/go/src/net/tcpsock_posix.go:139 +0x32
net.(*TCPListener).Accept(0x9000f4c0, 0x28702858, 0x287028589035f0e0, 0x619deab2, 0x902adaf8)
        /usr/local/go/src/net/tcpsock.go:261 +0x64
crypto/tls.(*listener).Accept(0x9000ff80, 0x902adb48, 0x18, 0x90376000, 0x67529c)
        /usr/local/go/src/crypto/tls/tls.go:55 +0x37
net/http.(*Server).Serve(0x903520e0, 0x8ec8e0, 0x9000ff80, 0x0, 0x0)
        /usr/local/go/src/net/http/server.go:2901 +0x25d
net/http.(*Server).ServeTLS(0x903520e0, 0x8ed720, 0x9000f4c0, 0x9033d640, 0x16, 0x9033d5e0, 0x11, 0x0, 0x0)
        /usr/local/go/src/net/http/server.go:2973 +0x297
net/http.(*Server).ListenAndServeTLS(0x903520e0, 0x9033d640, 0x16, 0x9033d5e0, 0x11, 0x0, 0x0)
        /usr/local/go/src/net/http/server.go:3130 +0x11b
main.(*App).startServer.func4(0x903520e0, 0x9002c720)
        /root/workspace/sgx-verification-service/app.go:475 +0x91
created by main.(*App).startServer
        /root/workspace/sgx-verification-service/app.go:470 +0x73b

goroutine 9 [runnable]:
crypto/x509.getPublicKeyAlgorithmFromOID(0x90598000, 0x7, 0x7, 0x0)
        /usr/local/go/src/crypto/x509/x509.go:490 +0x1c1
crypto/x509.parseCertificate(0x904ec900, 0x3bb, 0x3c9, 0x7a6ea0)
        /usr/local/go/src/crypto/x509/x509.go:1358 +0x221
crypto/x509.ParseCertificate(0x90435800, 0x3bb, 0x3c9, 0x904d7380, 0x9050bfa1, 0x24ed8)
        /usr/local/go/src/crypto/x509/x509.go:1579 +0xf8
crypto/x509.(*CertPool).AppendCertsFromPEM(0x90400f00, 0x90500000, 0x30e79, 0x31079, 0x31079)
        /usr/local/go/src/crypto/x509/cert_pool.go:139 +0xf2
crypto/x509.loadSystemRoots(0x1a, 0x902a8f90, 0x0)
        /usr/local/go/src/crypto/x509/root_unix.go:51 +0x5b1
crypto/x509.initSystemRoots()
        /usr/local/go/src/crypto/x509/root.go:21 +0x26
sync.(*Once).doSlow(0xbaa928, 0x86a640)
        /usr/local/go/src/sync/once.go:66 +0xec
sync.(*Once).Do(...)
        /usr/local/go/src/sync/once.go:57
crypto/x509.systemRootsPool(...)
        /usr/local/go/src/crypto/x509/root.go:16
crypto/x509.SystemCertPool(0x84edb6, 0x1a, 0x84621e)
        /usr/local/go/src/crypto/x509/cert_pool.go:61 +0xa5
main.fnGetJwtCerts(0x0, 0x0)
        /root/workspace/sgx-verification-service/app.go:631 +0x242
intel/isecl/lib/common/v4/middleware.NewTokenAuth.func1.1(0x958bd068, 0x90400180, 0x9042c200)
        /root/go/pkg/mod/github.com/intel-secl/common/v4@v4.0.0/middleware/tokenauth.go:95 +0x250
net/http.HandlerFunc.ServeHTTP(0x90452000, 0x958bd068, 0x90400180, 0x9042c200)
        /usr/local/go/src/net/http/server.go:2012 +0x44
github.com/gorilla/mux.(*Router).ServeHTTP(0x9036c000, 0x958bd068, 0x90400180, 0x9042c000)
        /root/go/pkg/mod/github.com/gorilla/mux@v1.7.4/mux.go:210 +0xe2
github.com/gorilla/handlers.loggingHandler.ServeHTTP(0x8e6640, 0x90010020, 0x8e5fe0, 0x9036c000, 0x86a918, 0x8ed9a0, 0x9043c000, 0x9042c000)
        /root/go/pkg/mod/github.com/gorilla/handlers@v1.4.2/logging.go:45 +0x243
github.com/gorilla/handlers.recoveryHandler.ServeHTTP(0x8e6bc0, 0x9035eab0, 0x8e6220, 0x901aebe0, 0x1, 0x8ed9a0, 0x9043c000, 0x9042c000)
        /root/go/pkg/mod/github.com/gorilla/handlers@v1.4.2/recovery.go:78 +0xce
net/http.serverHandler.ServeHTTP(0x903520e0, 0x8ed9a0, 0x9043c000, 0x9042c000)
        /usr/local/go/src/net/http/server.go:2807 +0xa3
net/http.(*conn).serve(0x9034b360, 0x8ee220, 0x90418000)
        /usr/local/go/src/net/http/server.go:1895 +0x86c
created by net/http.(*Server).Serve
        /usr/local/go/src/net/http/server.go:2933 +0x35c

runtime stack:
runtime.throw(0x8547b0, 0x22)
        /usr/local/go/src/runtime/panic.go:1116 +0x72
runtime.netpollBreak()
        /usr/local/go/src/runtime/netpoll_epoll.go:90 +0xc6
runtime.findrunnable(0x90040800, 0x0)
        /usr/local/go/src/runtime/proc.go:2363 +0xa51
runtime.schedule()
        /usr/local/go/src/runtime/proc.go:2526 +0x2fc
runtime.park_m(0x90376f00)
        /usr/local/go/src/runtime/proc.go:2696 +0x9d
runtime.mcall(0x40000)
        /usr/local/go/src/runtime/asm_amd64.s:318 +0x5b

Suggestions to narrowdown and understand on which part of the interaction with gramine fails would be helpful.

Thanks & Regards.

dimakuv commented 2 years ago

Thanks for the log! Golang prints very good backtraces.

Now we see that Golang has issues in the polling mechanism (epoll, see https://man7.org/linux/man-pages/man7/epoll.7.html):

runtime: netpollBreak write failed with 13
fatal error: runtime: netpollBreak write failed

runtime.netpollBreak()
        /usr/local/go/src/runtime/netpoll_epoll.go:90 +0xc6

Borys is currently working on select/poll/epoll implementations in Gramine codebase. @boryspoplawski Is this similar to what you've debugged? Will it be likely fixed by your Gramine modifications?

Anyway, looks like a genuine bug in Gramine, assigning it high priority and a "Bug" label.

boryspoplawski commented 2 years ago

It doesn't look similar at all, the issues we've seen were either deadlocks or crashes (but these didn't happen in real-life workloads).

runtime: netpollBreak write failed with 13

Doesn't seem related to epoll. I would recommend doing a test run with sgx.file_check_policy = "allow_all_but_log" in manifest

M-V-Manikandan commented 2 years ago

@boryspoplawski Thanks for commenting. As recommended tried with 'allow_all_but_log'.
Console logs are below.

-----------------------------------------------------------------------------------------------------------------------
Gramine detected the following insecure configurations:

  - loader.log_level = warning|debug|trace|all (verbose log level, may leak information)
  - loader.insecure__use_cmdline_argv = true   (forwarding command-line args from untrusted host to the app)
  - loader.insecure__use_host_env = true       (forwarding environment vars from untrusted host to the app)
  - sgx.file_check_policy = allow_all_but_log  (all files are passed through from untrusted host without verification)

Gramine will continue application execution, but this configuration must not be used in production!
-----------------------------------------------------------------------------------------------------------------------

INFO[00001] 2021-11-25T16:37:50.274928Z : log init; name=security
INFO[00001] 2021-11-25T16:37:50.274928Z : log init; name=security
INFO[00001] 2021-11-25T16:37:50.276321Z : log init; name=default
INFO[00001] 2021-11-25T16:37:50.276534Z : Starting SGX Quote Verification Server; name=default
TRAC[00001] 2021-11-25T16:37:50.276843Z : resource/version:getVersion() Entering; name=default
TRAC[00001] 2021-11-25T16:37:50.277053Z : resource/version:getVersion() Leaving; name=default
INFO[00001] 2021-11-25T16:37:50.279486Z : service start; name=security
INFO[00001] 2021-11-25T16:37:50.279486Z : service start; name=security

runtime: netpollBreak write failed with 13
fatal error: runtime: netpollBreak write failed

runtime stack:
runtime.throw(0x8547b0, 0x22)
        /usr/local/go/src/runtime/panic.go:1116 +0x72
runtime.netpollBreak()
        /usr/local/go/src/runtime/netpoll_epoll.go:90 +0xc6
runtime.findrunnable(0xe803b800, 0x0)
        /usr/local/go/src/runtime/proc.go:2363 +0xa51
runtime.schedule()
        /usr/local/go/src/runtime/proc.go:2526 +0x2fc
runtime.park_m(0xe8001080)
        /usr/local/go/src/runtime/proc.go:2696 +0x9d
runtime.mcall(0x40000)
        /usr/local/go/src/runtime/asm_amd64.s:318 +0x5b

goroutine 1 [chan receive]:
main.(*App).startServer(0xe833feb0, 0x0, 0x0)
        /root/workspace/sgx-verification-service/app.go:484 +0x7dc
main.(*App).Run(0xe833feb0, 0xe800e080, 0x2, 0x2, 0xe81b9ef0, 0x7af560)
        /root/workspace/sgx-verification-service/app.go:246 +0x25c
main.main()
        /root/workspace/sgx-verification-service/main.go:105 +0x12c

goroutine 6 [syscall]:
os/signal.signal_recv(0x0)
        /usr/local/go/src/runtime/sigqueue.go:147 +0x9c
os/signal.loop()
        /usr/local/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.Notify.func1
        /usr/local/go/src/os/signal/signal.go:127 +0x44

goroutine 18 [IO wait]:
internal/poll.runtime_pollWait(0xed3d0f18, 0x72, 0x0)
        /usr/local/go/src/runtime/netpoll.go:203 +0x55
internal/poll.(*pollDesc).wait(0xe829e098, 0x72, 0x0, 0x0, 0x846dab)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Accept(0xe829e080, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:384 +0x1d4
net.(*netFD).accept(0xe829e080, 0x1141df108, 0x0, 0x0)
        /usr/local/go/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xe8296080, 0x50, 0x804880, 0x811b66676f8d201)
        /usr/local/go/src/net/tcpsock_posix.go:139 +0x32
net.(*TCPListener).Accept(0xe8296080, 0xbf66c78, 0xbf66c78e82deb10, 0x619fbc2c, 0xe8229af8)
        /usr/local/go/src/net/tcpsock.go:261 +0x64
crypto/tls.(*listener).Accept(0xe8296b40, 0xe8229b48, 0x18, 0xe8282300, 0x67529c)
        /usr/local/go/src/crypto/tls/tls.go:55 +0x37
net/http.(*Server).Serve(0xe829c000, 0x8ec8e0, 0xe8296b40, 0x0, 0x0)
        /usr/local/go/src/net/http/server.go:2901 +0x25d
net/http.(*Server).ServeTLS(0xe829c000, 0x8ed720, 0xe8296080, 0xe82bd620, 0x16, 0xe82bd5c0, 0x11, 0x0, 0x4459d6)
        /usr/local/go/src/net/http/server.go:2973 +0x297
net/http.(*Server).ListenAndServeTLS(0xe829c000, 0xe82bd620, 0x16, 0xe82bd5c0, 0x11, 0x0, 0x0)
        /usr/local/go/src/net/http/server.go:3130 +0x11b
main.(*App).startServer.func4(0xe829c000, 0xe802ca20)
        /root/workspace/sgx-verification-service/app.go:475 +0x91
created by main.(*App).startServer
        /root/workspace/sgx-verification-service/app.go:470 +0x73b

goroutine 8 [select]:
net/http.(*Transport).getConn(0xe88bc280, 0xe88a0600, 0x0, 0xe8409100, 0x5, 0xe889e6e0, 0x13, 0x0, 0x0, 0x0, ...)
        /usr/local/go/src/net/http/transport.go:1291 +0x57b
net/http.(*Transport).roundTrip(0xe88bc280, 0xe840e400, 0xe84a4c20, 0x418b7e, 0xed1c6c10)
        /usr/local/go/src/net/http/transport.go:552 +0x726
net/http.(*Transport).RoundTrip(0xe88bc280, 0xe840e400, 0xe88bc280, 0x0, 0x0)
        /usr/local/go/src/net/http/roundtrip.go:17 +0x35
net/http.send(0xe840e400, 0x8e6360, 0xe88bc280, 0x0, 0x0, 0x0, 0xe8752010, 0x8f77a0, 0x1, 0x0)
        /usr/local/go/src/net/http/client.go:252 +0x43e
net/http.(*Client).send(0xe88a0570, 0xe840e400, 0x0, 0x0, 0x0, 0xe8752010, 0x0, 0x1, 0x413f42)
        /usr/local/go/src/net/http/client.go:176 +0xfa
net/http.(*Client).do(0xe88a0570, 0xe840e400, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/http/client.go:699 +0x44a
net/http.(*Client).Do(...)
        /usr/local/go/src/net/http/client.go:567
main.fnGetJwtCerts(0x0, 0x0)
        /root/workspace/sgx-verification-service/app.go:649 +0x37a
intel/isecl/lib/common/v4/middleware.NewTokenAuth.func1.1(0xed2c6130, 0xe84143c0, 0xe840e300)
        /root/go/pkg/mod/github.com/intel-secl/common/v4@v4.0.0/middleware/tokenauth.go:95 +0x250
net/http.HandlerFunc.ServeHTTP(0xe845c050, 0xed2c6130, 0xe84143c0, 0xe840e300)
        /usr/local/go/src/net/http/server.go:2012 +0x44
github.com/gorilla/mux.(*Router).ServeHTTP(0xe82ec000, 0xed2c6130, 0xe84143c0, 0xe840e100)
        /root/go/pkg/mod/github.com/gorilla/mux@v1.7.4/mux.go:210 +0xe2
github.com/gorilla/handlers.loggingHandler.ServeHTTP(0x8e6640, 0xe8010020, 0x8e5fe0, 0xe82ec000, 0x86a918, 0x8ed9a0, 0xe847a000, 0xe840e100)
        /root/go/pkg/mod/github.com/gorilla/handlers@v1.4.2/logging.go:45 +0x243
github.com/gorilla/handlers.recoveryHandler.ServeHTTP(0x8e6bc0, 0xe829a000, 0x8e6220, 0xe8290000, 0x1, 0x8ed9a0, 0xe847a000, 0xe840e100)
        /root/go/pkg/mod/github.com/gorilla/handlers@v1.4.2/recovery.go:78 +0xce
net/http.serverHandler.ServeHTTP(0xe829c000, 0x8ed9a0, 0xe847a000, 0xe840e100)
        /usr/local/go/src/net/http/server.go:2807 +0xa3
net/http.(*conn).serve(0xe82cd220, 0x8ee220, 0xe8462000)
        /usr/local/go/src/net/http/server.go:1895 +0x86c
created by net/http.(*Server).Serve
        /usr/local/go/src/net/http/server.go:2933 +0x35c

goroutine 147 [runnable]:
net/http.(*Transport).dialConnFor(0xe88bc280, 0xe8640160)
        /usr/local/go/src/net/http/transport.go:1362
created by net/http.(*Transport).queueForDial
        /usr/local/go/src/net/http/transport.go:1334 +0x3fe

Thanks & Regards.

boryspoplawski commented 2 years ago

You need to set log level to warning or above, otherwise nothing will be printed about file accesses.

boryspoplawski commented 2 years ago

Or if you already did, then this is not related to an inaccessible file and some other issue, most likely Gramine bug.

M-V-Manikandan commented 2 years ago

@boryspoplawski Thanks for the response. Attaching the gramine.log with level set as 'debug' as suggested. gramine_log_debug_2611.txt Also attaching the console of the same run. gramine-sqvs-consolelogs-2611.txt Thanks & Regards.

dimakuv commented 2 years ago

@M-V-Manikandan I found in your log the following snippet:

[P1:T4:sqvs] warning: shim_socket: unknown socket domain 16
[P1:T4:sqvs] warning: shim_socket: unknown socket domain 16

This unsupported socket domain 16 corresponds to AF_NETLINK (see https://elixir.bootlin.com/linux/latest/source/include/linux/socket.h#L193).

Gramine doesn't support AF_NETLINK currently. We are aware of this issue, but there is currently no implementation for this, unfortunately. This may be the cause of your issues in your Golang application...

boryspoplawski commented 2 years ago

@dimakuv I doubt this is the problem, go would probably write some error message about it. It seems to be failing on a write to a pipe, which is kind of weird.

@M-V-Manikandan Could you try running you app with loader.log_level = "all" in manifest, loader.log_file removed and stdout+stderr redirected to file (like your_cmdline >log 2>&1)? Maybe we could get why the read is failing from that.

M-V-Manikandan commented 2 years ago

@boryspoplawski Thanks for the comments. As suggested taken the logs with log_level as "all", no log_file and redirected to a file from cmdline. all_logs.zip

Thanks & Regards.

boryspoplawski commented 2 years ago

Seems that -13 (EACCES) is returned from write on a nonblocking pipe. I couldn't get more from the logs. @dimakuv This is not an epoll issue.

jkr0103 commented 2 years ago

@M-V-Manikandan Could you help me to replicate the issue in my machine by sharing application example and steps to repro the issue?

M-V-Manikandan commented 2 years ago

Hi @jkr0103, https://github.com/intel-secl/sgx-verification-service is the application which we are trying to run as a gramine application.

Once the application(https server) is up & listening in gramine, a curl/REST API request for quote verification is sent.

The application is crashing while responding to the request. BT already attached in earlier comments.

Thanks & Regards.

dimakuv commented 1 year ago

This issue is inactive for more than a year, and it's unclear what was the root cause and whether it was fixed. I'm closing it.

ChrisYeq commented 2 months ago

Hi,bro, is there a conclusion to this problem? I also encountered the same problem.

mkow commented 2 months ago

@ChrisYeq: Can you provide more details, including reproduction steps?