intel / linux-sgx

Intel SGX for Linux*
https://www.intel.com/content/www/us/en/developer/tools/software-guard-extensions/linux-overview.html

TCS pointer becomes misaligned during OCALL #748

Open gingerbeard-man opened 3 years ago

gingerbeard-man commented 3 years ago

We have an application using sgx_2.11 on RHEL 8.4. There is an OCALL which calls a web service and passes the response back into the enclave.

        int ocall_httpRequest(
                        [in, string] const char *url,
                        [in, string] const char *body,
                        [in, string] const char *host,
                        int port,
                        int ssl,
                        int httpPost,
                        [in, string] const char *headers,
                        [out, size=responseSize] char *response,
                        size_t responseSize,
                        [out, count=1] int *statusCode);
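For readers unfamiliar with how an `[out, size=responseSize]` buffer is filled, here is a minimal sketch of what the untrusted side of such an OCALL could look like. It is not the code from this application: `fake_http_fetch` is a hypothetical stand-in for the real HTTP client, the return-value convention is an assumption, and the sketch does not attempt to reproduce the crash. It only illustrates keeping every write within `responseSize`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the real HTTP client; it returns a canned
// body and status so that the sketch is self-contained.
static std::string fake_http_fetch(const char *url, int *status) {
    *status = 200;
    return std::string("response for ") + url;
}

// Untrusted-side function using the parameter list from the EDL declaration.
extern "C" int ocall_httpRequest(const char *url, const char *body,
                                 const char *host, int port, int ssl,
                                 int httpPost, const char *headers,
                                 char *response, std::size_t responseSize,
                                 int *statusCode) {
    // Unused in this sketch; a real implementation would pass them on.
    (void)body; (void)host; (void)port; (void)ssl; (void)httpPost; (void)headers;

    if (url == nullptr || response == nullptr || statusCode == nullptr ||
        responseSize == 0) {
        return -1;
    }

    int status = 0;
    const std::string reply = fake_http_fetch(url, &status);

    // Never write more than responseSize bytes, terminator included.
    const std::size_t n =
        reply.size() < responseSize - 1 ? reply.size() : responseSize - 1;
    std::memcpy(response, reply.data(), n);
    response[n] = '\0';
    *statusCode = status;

    // Assumed convention: 0 on success, 1 if the reply was truncated.
    return n < reply.size() ? 1 : 0;
}
```

Calling this with an 8-byte buffer truncates the canned reply to 7 characters plus the terminator and returns 1, whereas a 64-byte buffer receives the whole reply and returns 0.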

A single line change in one of the functions used in this OCALL causes the enclave to exit. On my dev system in simulation mode I get this message:

[_SE3 u_instructions.cpp:236] #GP on u_instructions.cpp, line: 236

It seems to indicate that the TCS pointer is no longer page-aligned.

https://github.com/intel/linux-sgx/blob/b9b071b54476e93ba21ae4f8dc41394970667cdd/sdk/simulation/uinst/u_instructions.cpp#L235-L236

The OCALL ultimately uses Boost's Beast HTTP library. The one added line is a call to SSL_set_tlsext_host_name() to enable SNI, which the web service in a new test environment requires. Without this line, the code runs just fine. As soon as it is added, the enclave exits when the OCALL returns.

I am trying to understand what is causing this and whether we can avoid it by modifying our code somehow. Or is it possibly a bug in sgx_2.11?

lzha101 commented 3 years ago

The TCS address should always be page-aligned. I suggest building a debug version of the SDK and using sgx-gdb to investigate. With the debug SDK, the execution log printed while the app runs also includes TCS information, so you should be able to check the TCS address there. In the log below, for example, the RVA is page-aligned.

build_contexts, step = 0x0000000000414000 build_context Entry Id = 4, TCS , Page Count = 1, Attributes = 0x03, Flags = 0x0000000000000100, RVA = 0x00000000001B8000 -> RVA = 0x00000000005CC000
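As a quick illustration of what "page-aligned" means here, the following minimal sketch checks an address against a 4 KiB page size. The helper name `is_page_aligned` and the misaligned sample address are hypothetical and chosen for illustration only; this is not the SDK's own check.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 4 KiB pages, so an aligned address has its low 12 bits clear.
constexpr std::uint64_t kPageSize = 0x1000;

// Hypothetical helper: true when addr is a multiple of the page size.
constexpr bool is_page_aligned(std::uint64_t addr) {
    return (addr & (kPageSize - 1)) == 0;
}

// Both RVAs from the log line above end in 000 (hex), so they are aligned.
static_assert(is_page_aligned(0x00000000001B8000ULL), "first RVA is aligned");
static_assert(is_page_aligned(0x00000000005CC000ULL), "second RVA is aligned");

// A made-up address 16 bytes into the same page is not aligned.
static_assert(!is_page_aligned(0x00000000005CC010ULL), "offset address is misaligned");
```

The same test can be applied to a TCS address read from a debugger: any non-zero value in the low three hex digits means the pointer is no longer page-aligned.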

gingerbeard-man commented 3 years ago

Thank you for those suggestions. The execution log of the debug version is only output when the enclave starts, at which point the TCS is still ok.

build_contexts, step = 0x0000000000135000 build_context Entry Id = 4, TCS , Page Count = 1, Attributes = 0x03, Flags = 0x0000000000000100, RVA = 0x0000000020F8D000 -> RVA = 0x00000000210C2000

Stepping through the return from the OCALL in sgx-gdb was an interesting experience. Eventually I end up in enter_enclave.S and a call to function stack_sticker, then

121     call    stack_sticker
(gdb) next
../../gdb/infrun.c:6301: internal-error: void process_event_stop_test(execution_control_state*): Assertion `ecs->event_thread->control.exception_resume_breakpoint != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.

I still need to figure out what exactly to look for in the preceding steps.