gramineproject / examples

Sample applications configs for Gramine
BSD 3-Clause "New" or "Revised" License

Increase enclave size in tensorflow-lite manifest template #71

Closed anjalirai-intel closed 1 year ago

anjalirai-intel commented 1 year ago

In our local CI, tensorflow-lite intermittently failed with a std::bad_alloc error, but we could not reproduce it manually. To debug further, we enabled debug logs in Jenkins and found that the failing runs hit ENOMEM: mmap returns -12.

I have attached a log snippet and the complete debug log (tensorflow-lite_debug.log) for reference.

(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- return from futex(...) = 0x0
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- futex(0xf0cef50, FUTEX_PRIVATE|FUTEX_WAKE, 1, 0, 0xe219ac8, -1) ...
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- return from futex(...) = 0x0
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- mmap(0, 0x17e1000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0x0) ...
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- return from mmap(...) = -12
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- brk(0x294c000) = 0x114b000
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- mmap(0, 0x1801000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0x0) ...
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- return from mmap(...) = -12
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- mmap(0, 0x8000000, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0x0) ...
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- return from mmap(...) = -12
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- mmap(0, 0x4000000, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0x0) ...
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- return from mmap(...) = -12
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- mmap(0, 0x8000000, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0x0) ...
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- return from mmap(...) = -12
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- mmap(0, 0x4000000, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0x0) ...
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- return from mmap(...) = -12
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- mmap(0, 0x17e1000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0x0) ...
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- return from mmap(...) = -12
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- futex(0xdc17040, FUTEX_PRIVATE|FUTEX_WAKE, 2147483647, 0, 0xca, 1) ...
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- return from futex(...) = 0x0
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- write(2, 0xdf464d8, 0x30) ...
terminate called after throwing an instance of '(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- return from write(...) = 0x30
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- write(2, 0xe202cc0, 0xe) ...
std::bad_alloc(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- return from write(...) = 0xe
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- write(2, 0xdf464c4, 0x2) ...
'
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- return from write(...) = 0x2
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- write(2, 0xdf464c7, 0xb) ...
  what():  (libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- return from write(...) = 0xb
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- write(2, 0xdf45ea2, 0xe) ...
std::bad_alloc(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- return from write(...) = 0xe
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- write(2, 0xddf7703, 0x1) ...

(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- return from write(...) = 0x1
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- rt_sigprocmask(UNBLOCK, [SIGABRT,], NULL, 0x8) = 0x0
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- gettid() = 0x1
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- getpid() = 0x1
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- tgkill(1, 1, [SIGABRT]) = 0x0
(libos_signal.c:59:sighandler_kill) [P1:T1:label_image] debug: killed by signal 6
(libos_parser.c:1609:buf_write_all) [P1:T4:label_image] trace: ---- return from futex(...) = -512
(libos_parser.c:1609:buf_write_all) [P1:T3:label_image] trace: ---- return from futex(...) = -512
(libos_parser.c:1609:buf_write_all) [P1:T5:label_image] trace: ---- return from futex(...) = -512
(libos_parser.c:1609:buf_write_all) [P1:T2:label_image] trace: ---- return from futex(...) = -512
(libos_fs_lock.c:653:posix_lock_clear_pid) [P1:T1:label_image] debug: clearing POSIX locks for pid 1
(libos_init.c:568:create_pipe) [P1:T5:label_image] debug: Creating pipe: pipe.srv:3d9bdbe16ce7711d96bdca5e8cdec69beb2c479b437053b107ea66c1dcf82798
(libos_sync_client.c:331:shutdown_sync_client) [P1:T1:label_image] debug: sync client shutdown: closing handles
(libos_sync_client.c:346:shutdown_sync_client) [P1:T1:label_image] debug: sync client shutdown: waiting for confirmation
(libos_sync_client.c:359:shutdown_sync_client) [P1:T1:label_image] debug: sync client shutdown: finished
(libos_async.c:122:install_async_event) [P1:T5:label_image] debug: Installed async event at 1679598352431665
(libos_async.c:122:install_async_event) [P1:T3:label_image] debug: Installed async event at 1679598352431665
(libos_async.c:158:libos_async_worker) [P1:libos] debug: Async worker thread started
(libos_async.c:122:install_async_event) [P1:T4:label_image] debug: Installed async event at 1679598352431665
(libos_async.c:327:libos_async_worker) [P1:libos] debug: Thread exited, cleaning up
(libos_async.c:327:libos_async_worker) [P1:libos] debug: Thread exited, cleaning up
(libos_async.c:327:libos_async_worker) [P1:libos] debug: Thread exited, cleaning up
(libos_async.c:122:install_async_event) [P1:T2:label_image] debug: Installed async event at 1679598352431668
(libos_ipc_worker.c:285:ipc_worker_main) [P1:libos] debug: IPC worker: exiting worker thread
(libos_exit.c:58:libos_clean_and_exit) [P1:T1:label_image] debug: process 1 exited with status 134
(pal_process.c:248:_PalProcessExit) debug: PalProcessExit: Returning exit code 134
make: *** [Makefile:87: run-gramine] Error 134
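
The -12 return values in the trace above are negated errno codes, and 12 is ENOMEM on Linux. As a rough aid for reading such logs, here is a minimal sketch that pairs each mmap call with its return line and collects the sizes of the requests that failed with ENOMEM. The function name `failed_mmap_sizes` and the regular expressions are illustrative assumptions derived only from the line format in this excerpt; they are not part of Gramine and may need adjusting for other versions.

```python
import errno
import re

# Patterns are inferred from the trace lines quoted above (an assumption);
# other Gramine versions may format their trace output differently.
CALL_RE = re.compile(
    r"\[(P\d+:T\d+:[^\]]*)\] trace: ---- mmap\(\w+, (0x[0-9a-fA-F]+),"
)
RET_RE = re.compile(
    r"\[(P\d+:T\d+:[^\]]*)\] trace: ---- return from mmap\(\.\.\.\) = (-?\w+)"
)


def failed_mmap_sizes(log_text):
    """Return the requested sizes (bytes) of mmap calls that returned -ENOMEM."""
    pending = {}  # thread tag -> size of the mmap call awaiting its return line
    failed = []
    for line in log_text.splitlines():
        call = CALL_RE.search(line)
        if call:
            pending[call.group(1)] = int(call.group(2), 16)
            continue
        ret = RET_RE.search(line)
        if ret and ret.group(1) in pending:
            size = pending.pop(ret.group(1))
            if ret.group(2) == str(-errno.ENOMEM):
                failed.append(size)
    return failed


# Two call/return pairs copied from the log excerpt above.
SAMPLE = """\
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- mmap(0, 0x17e1000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0x0) ...
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- return from mmap(...) = -12
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- mmap(0, 0x8000000, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0x0) ...
(libos_parser.c:1609:buf_write_all) [P1:T1:label_image] trace: ---- return from mmap(...) = -12
"""

for size in failed_mmap_sizes(SAMPLE):
    print(f"mmap of {size:#x} bytes failed with {errno.errorcode[errno.ENOMEM]}")
```

Run against the full debug log, a list like this shows how large the failing requests are, which helps judge how much headroom the enclave needs.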

In our local CI, we increased the enclave size from 512 MB to 2 GB, and the intermittent failures no longer occur.

Hence this PR, which increases the enclave size in the tensorflow-lite manifest template to 2 GB.
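
For illustration, the change described here would look roughly like the fragment below in the manifest template. This is a sketch, not the actual diff: `sgx.enclave_size` is the Gramine manifest option for the enclave size, but the surrounding keys and the exact previous line in the template are not shown in this description, so the "before" value is reconstructed from the text above.

```toml
# Before (as described in the text above):
# sgx.enclave_size = "512M"

# After: give the enclave more room for tensorflow-lite's allocations.
sgx.enclave_size = "2G"
```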

