Description of the problem
If an async signal arrives while an ocall is being handled, -EINTR is injected and execution returns to the enclave as if the actual host-level syscall had been interrupted.
https://github.com/oscarlab/graphene/blob/4cb98219d8e302055587a8952c1102415a72ba42/Pal/src/host/Linux-SGX/sgx_exception.c#L184
If the signal arrives after the syscall has actually completed but before control returns to the enclave, -EINTR is injected even though the syscall might have succeeded.
An example of a problematic case (one that we have already triggered on Jenkins) is poll_closed_fd from the LibOS regression tests; for details see below.
Fixing this problem would probably require redesigning the whole PAL SGX signal handling.
Steps to reproduce
Run the poll_closed_fd LibOS regression test a few times on current master (4cb98219d8e302055587a8952c1102415a72ba42 at the time of writing). Tested on Ubuntu 18.04 with a non-debug build (with SGX, of course).
Expected results
Works every time.
Actual results
Fails with read error: Permission denied from time to time. This happens when SIGCHLD interrupts the read ocall after the actual syscall instruction in the untrusted part. -EINTR gets injected even though some bytes were already read. The code issuing ocall_read has no idea about these bytes, so they are lost. The TLS session then gets desynchronized (because this read happens on an encrypted pipe) and errors start to appear.
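
The lost-bytes effect described above can be reproduced deterministically outside SGX. The following is only a minimal sketch of the failure mode, not Graphene code: fake_ocall_read, read_with_retry, and demo_lost_bytes are hypothetical names, and the "signal" is simulated with a flag instead of a real SIGCHLD. It assumes a caller that simply retries on -EINTR.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the untrusted half of a read ocall. When
 * signal_after_syscall is set, it models an async signal that arrives after
 * read() has completed but before control returns to the enclave. */
static ssize_t fake_ocall_read(int fd, void* buf, size_t count, bool signal_after_syscall) {
    ssize_t ret = read(fd, buf, count); /* the bytes leave the pipe here */
    if (ret < 0)
        return -errno;
    if (signal_after_syscall)
        return -EINTR; /* injected value replaces the real byte count */
    return ret;
}

/* Hypothetical caller that restarts the ocall whenever it sees -EINTR. */
static ssize_t read_with_retry(int fd, void* buf, size_t count, bool inject_once) {
    ssize_t ret;
    do {
        ret = fake_ocall_read(fd, buf, count, inject_once);
        inject_once = false; /* only the first attempt is "interrupted" */
    } while (ret == -EINTR);
    return ret;
}

/* Writes two 4-byte records into a pipe and reads one record back, with an
 * injected -EINTR on the first attempt. Stores the record the caller ends up
 * with in out (NUL-terminated). Returns 0 on success, -1 on failure. */
static int demo_lost_bytes(char* out, size_t out_size) {
    assert(out_size >= 5);
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    int result = -1;
    if (write(fds[1], "AAAABBBB", 8) == 8) {
        ssize_t n = read_with_retry(fds[0], out, 4, /*inject_once=*/true);
        if (n == 4) {
            out[4] = '\0';
            result = 0;
        }
    }
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Calling demo_lost_bytes(out, sizeof(out)) with a 5-byte buffer leaves "BBBB" in out: the "AAAA" record was consumed by the first read() but never reported to the caller, which is the same kind of stream desynchronization that breaks the TLS session on the encrypted pipe.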