This issue is probably the root cause of #1924.
@llly Thanks for debugging this!
Yes, we are painfully aware of the inadequacy of mmap(.., PROT_NONE)
under current Intel SGX (and therefore Graphene). Unfortunately, we currently don't see any reasonable fix to this issue. Would you have an idea how to fix this?
The problem is: Intel SGX version 1 doesn't allow dynamic enclave memory management. The virtual-space limit must be specified per enclave (via sgx.enclave_size). All this virtual enclave memory is allocated at enclave startup. There is no notion of "allocating enclave pages on demand".
This is fixed in Intel SGX version 2, with a feature called Enclave Dynamic Memory Management (EDMM). Unfortunately, this feature is currently not supported by the upstream Intel SGX driver. Thus, it is also not supported in Graphene; it will be supported only sometime in 2021.
TLDR: The correct way to fix this Java issue is to wait for EDMM support in SGX driver and Graphene. I don't know of any other correct way.
@dimakuv You are right. EDMM is the final fix, but we still need to find a workaround in the meantime.
According to the GLIBC manual's Memory Protection section about the mmap syscall, mmap() with PROT_NONE should only reserve address space without allocating memory.
We are implementing the Linux syscall API, not libc (the two are different and often have different semantics, despite using the same function names). But in this case Linux also does lazy mappings.
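For reference, this is the reserve-then-commit pattern that the JVM (and many other allocators) relies on. Below is a minimal sketch, not taken from this issue, assuming a 128 MiB reservation of which only 4 MiB is committed; on native Linux the PROT_NONE reservation is lazy and essentially free, while under Graphene with SGX1 the whole reservation has to fit inside sgx.enclave_size.

```c
/* Minimal sketch (not from the issue): reserve a large region with PROT_NONE,
 * then commit a small piece on demand with mprotect(). */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    size_t reserve = 128UL * 1024 * 1024;  /* reserve 128 MiB of address space */
    size_t commit  = 4UL * 1024 * 1024;    /* commit only the first 4 MiB */

    void *base = mmap(NULL, reserve, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED) {
        perror("mmap(PROT_NONE)");
        return 1;
    }

    /* Make a prefix of the reservation actually usable. */
    if (mprotect(base, commit, PROT_READ | PROT_WRITE) != 0) {
        perror("mprotect");
        return 1;
    }
    memset(base, 0, commit);               /* touch the committed part */

    printf("reserved %zu MiB, committed %zu MiB at %p\n",
           reserve >> 20, commit >> 20, base);
    munmap(base, reserve);
    return 0;
}
```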
I don't think we can do anything about this, it's a hardware limitation, we just need to wait for SGX2 support.
This is actually one of the limitations of SGX1. There is an SGX2 patch contributed by @rainfld about 2 years ago (PR #234); you can try it before SGX2 is fully supported. But the following workarounds can also be considered for special cases:
1) Do mmap on host memory space instead of EPC, conditionally (note: no SGX security benefits at all).
2) Patch the SGX driver to handle memory reservation according to particular cases.
3) Some code logic is actually smart enough to reduce memory consumption when it fails to reserve memory space, so fail fast from the GSGX LibOS (this fits some cases, e.g. the pre-allocation scenario).
In the Java case, please also try pre-allocation/prefetching options.
@llly Actually, I'm curious if you tried huge enclaves? Like sgx.enclave_size = 1024G? I know it may take minutes (hours? days?). But it would be quite interesting to know.
@bigdata-memory We are trying option 3: looking for a Java GC that can relocate objects and reduce memory fragmentation.
@dimakuv I tried sgx.enclave_size = "64G" with Java -Xmx50g on a 128G EPC machine, and it takes 40s to start the enclave and 70s to finish the first 32GB mmap. With sgx.enclave_size = "128G" it takes 100s to start the enclave.
The Java method ProcessBuilder.start() is used a lot to run native commands such as setsid, rm, and chmod. It calls the fork() syscall first and then execve(). A fork() from Java in Graphene costs a lot if enclave_size is big: it takes 10 minutes to finish a single ProcessBuilder.start("chmod") with the Java manifest set to sgx.enclave_size = "128G". That's the problem with a large enclave_size.
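For context, here is a minimal C sketch (not from this thread; the file path is made up) of the fork()+execve() sequence that ProcessBuilder.start() boils down to. Under Graphene-SGX the fork() step has to create and initialize a brand-new enclave for the child and migrate the parent's state into it, which is why its cost grows with sgx.enclave_size.

```c
/* Sketch of what ProcessBuilder.start("chmod", ...) does underneath:
 * fork() the JVM process, then execve() the native command in the child. */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();                /* expensive under Graphene-SGX: new child enclave */
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        /* Child: replace itself with the native command (path is illustrative). */
        char *const argv[] = {"chmod", "644", "/tmp/some-file", NULL};
        execvp("chmod", argv);
        perror("execvp");              /* reached only if execvp() fails */
        _exit(127);
    }
    int status = 0;
    waitpid(pid, &status, 0);          /* parent waits for the command to finish */
    printf("chmod exited with status %d\n", WEXITSTATUS(status));
    return 0;
}
```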
@llly Thanks for the information!
So the problem is not only in the large enclave sizes, but also in the fork/execve pattern. We know about this problem as well, and we had some ideas for optimization of forks (for example, a pool of pre-initialized enclaves waiting for a parent to fork). But this was low priority for us...
Closing, as I believe this isn't something which we can fix (it's a hardware limitation, not an issue with Graphene).
Description of the problem
According to the GLIBC manual's Memory Protection section, mmap() with PROT_NONE only reserves address space without allocating memory.
For example,
mmap(0x0,134217728,PROT_NONE,MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE,-1,0)
only allocates virtual memory, not physical memory. It always succeeds unless virtual memory runs out. On a native 64-bit OS, a program's virtual address space is effectively unlimited (about 128TB of user address space on x86-64), so most programs, including Java, don't treat virtual memory as a scarce resource.
However, Graphene uses the EPC as the program's virtual memory space, and its size is sgx.enclave_size in the manifest. Graphene also allocates physical EPC on mmap() with the PROT_NONE and MAP_ANONYMOUS flags, so such an mmap() fails when the total size is bigger than sgx.enclave_size. Although sgx.enclave_size can be bigger than the physical EPC size (because EPC pages can be swapped in and out), it can never be as big as 128TB, and performance drops a lot if we increase sgx.enclave_size only to get more virtual memory.
Steps to reproduce
C program:
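The program itself did not survive the formatting of this issue; the following is a reconstruction sketch, assuming 16 MiB per mmap() call, which is consistent with the reported numbers (247 * 16 MiB = 3952 MB and 8388606 * 16 MiB ≈ 128 TB).

```c
/* Reconstruction sketch of the reproducer (assumed 16 MiB per call):
 * keep reserving PROT_NONE memory until mmap() fails, counting the calls. */
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    const size_t chunk = 16UL * 1024 * 1024;  /* 16 MiB per reservation (assumption) */
    unsigned long time = 0;                   /* number of successful mmap() calls */

    for (;;) {
        void *p = mmap(NULL, chunk, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED) {
            printf("mmap failed. errno = %d, time = %lu\n", errno, time);
            return 0;
        }
        time++;
    }
}
```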
Run it with manifest item sgx.enclave_size = "4G" on a 256MB EPC machine.
Expected results
mmap failed. errno = 12, time = 8388606
On native 64-bit Ubuntu 18.04: about 125TB.
Actual results
mmap failed. errno = 12, time = 247
On Graphene: 3952MB.
Additional information
This issue blocks Java workloads from running for a long time.
The Java -Xmx option seems to only limit physical memory usage, not virtual memory space usage. Here is a debug log snippet and my comments for Java reporting Out Of Memory when -Xmx is smaller than sgx.enclave_size.