intel / linux-sgx

Intel SGX for Linux*
https://www.intel.com/content/www/us/en/developer/tools/software-guard-extensions/linux-overview.html

SampleDNNL throws error "Illegal instruction" #809

Open Praneshss opened 2 years ago

Praneshss commented 2 years ago

I am running Ubuntu 20.04. I built the SampleDNNL code in both SIM and HW mode. It runs without errors in SIM mode, but fails with "Illegal instruction" in HW mode. The other SampleCode projects run fine in HW mode.

~/linux-sgx/SampleCode/SampleDNNL$ ./app 
Intel(R) Deep Neural Network Library (DNNL)
Illegal instruction (core dumped)
~/linux-sgx/SampleCode/SampleDNNL$ 

How can I debug this? sgx-gdb did not help much either, except to show that the error occurs after the ecall.

~/linux-sgx/SampleCode/SampleDNNL$ sgx-gdb ./app 
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Source directories searched: ~/linux-sgx/linux/installer/bin/sgxsdk/lib64/gdb-sgx-plugin:$cdir:$cwd
Setting environment variable "LD_PRELOAD" to null value.
Reading symbols from ./app...
Loading libc++ pretty-printers.
(gdb) r
Starting program: ~/linux-sgx/SampleCode/SampleDNNL/app 
detect urts is loaded, initializing
Function "notify_gdb_to_update" not defined.
Function "sgx_debug_load_state_add_element" not defined.
Function "sgx_debug_unload_state_remove_element" not defined.
Function "urts_add_tcs" not defined.
Python Exception <class 'gdb.error'> No symbol "g_debug_enclave_info_list" in current context.: 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Intel(R) Deep Neural Network Library (DNNL)
[New Thread 0x7ffff7a03700 (LWP 220165)]
[New Thread 0x7ffff7202700 (LWP 220166)]
[New Thread 0x7ffff6a01700 (LWP 220167)]
[New Thread 0x7ffff6200700 (LWP 220168)]
[New Thread 0x7ffff59ff700 (LWP 220169)]

Thread 1 "app" received signal SIGILL, Illegal instruction.
0x00007ffe0102543e in ?? ()
(gdb) bt
#0  0x00007ffe0102543e in ?? ()
#1  0x00007ffe00fe214f in ?? ()
#2  0x00007ffe9148cc80 in ?? ()
#3  0x00007ffe00fe12e1 in ?? ()
#4  0x00007ffe020adaa0 in ?? ()
#5  0x00007ffe9148ca10 in ?? ()
#6  0x0000000000000000 in ?? ()
(gdb) disassemble 
No function contains program counter for selected frame.
(gdb) 
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, main (argc=21845, argv=0x7ffff7d992e8 <__exit_funcs_lock>) at App/App.cpp:168
168 {
(gdb) n
171     sgx_status_t ret = SGX_SUCCESS;
(gdb) 
172     int retval = 0;
(gdb) 
173     sgx_enclave_id_t eid = 0;
(gdb) 
177     ret = sgx_create_enclave(ENCLAVE_NAME, SGX_DEBUG_FLAG, NULL, NULL, &eid, NULL);
(gdb) 
178     if(ret != SGX_SUCCESS)
(gdb) 
184     cout<<"Intel(R) Deep Neural Network Library (DNNL)" <<endl;
(gdb) 
Intel(R) Deep Neural Network Library (DNNL)
186     ret = getting_started(eid, &retval);
(gdb) s
getting_started (eid=93824992239136, retval=0x555555555e20 <__libc_csu_init>) at App/Enclave_u.c:263
263 {
(gdb) 
266     status = sgx_ecall(eid, 9, &ocall_table_Enclave, &ms);
(gdb) 
[New Thread 0x7ffff7a03700 (LWP 220829)]
[New Thread 0x7ffff7202700 (LWP 220830)]
[New Thread 0x7ffff6a01700 (LWP 220831)]
[New Thread 0x7ffff6200700 (LWP 220832)]
[New Thread 0x7ffff59ff700 (LWP 220833)]

Thread 1 "app" received signal SIGILL, Illegal instruction.
0x00007ffe0102543e in ?? ()
(gdb) 

SGX_SDK version: 2.15.101.1

lzha101 commented 2 years ago

It looks like the enclave runs out of TCS (thread control structures). Could you increase TCSNum in the Enclave/Enclave.config.xml file and run the sample again?
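
As a rough sketch of that change, the helper below rewrites the TCSNum element in place. The function name `set_tcs_num` and the value 16 in the usage note are illustrative assumptions, not values taken from the sample or from the fix; the sketch also assumes GNU sed and a config file that holds TCSNum as a plain decimal number on one line.

```shell
# Hypothetical helper (not part of the SGX SDK): set <TCSNum> in an
# enclave configuration file to a new decimal value.
# Assumes GNU sed (-i edits in place, -E enables extended regex).
set_tcs_num() {
    file="$1"
    value="$2"
    # Replace only the number between the TCSNum tags; other lines are untouched.
    sed -i -E "s|<TCSNum>[0-9]+</TCSNum>|<TCSNum>${value}</TCSNum>|" "$file"
}
```

For example, `set_tcs_num Enclave/Enclave.config.xml 16` run from the SampleDNNL directory, followed by a rebuild of the sample so the enclave is signed again with the new configuration.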

lzha101 commented 2 years ago

@Praneshss the issue should be fixed by PR #817. Please retry the sample. Thanks.

yu-zou commented 1 year ago

I think this problem is not fully solved. I can reproduce the same error.

lzha101 commented 1 year ago

@yu-zou did you reproduce it with the SampleDNNL project? Could you please paste your environment info here? I cannot reproduce it on my local machine.
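
One possible way to gather that information is sketched below. The function name `collect_env_info` is made up for illustration, and the list of CPU flags it filters for is an assumption about what might be relevant here (the `sgx` flags only appear in /proc/cpuinfo on kernels with in-tree SGX support). The SGX SDK version is not collected and would need to be added by hand.

```shell
# Hypothetical helper: print basic environment details for a bug report.
collect_env_info() {
    echo "kernel: $(uname -r)"
    if [ -r /etc/os-release ]; then
        # Source in a subshell so os-release variables do not leak out.
        ( . /etc/os-release; echo "distro: ${PRETTY_NAME:-unknown}" )
    fi
    if [ -r /proc/cpuinfo ]; then
        echo "cpu: $(grep -m1 'model name' /proc/cpuinfo | cut -d: -f2- | sed 's/^ *//')"
        # Keep only SGX- and AVX-related flags from the first CPU entry.
        echo "flags: $(grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' \
            | grep -E '^(sgx|sgx_lc|avx|avx2|avx512f)$' | sort -u | tr '\n' ' ')"
    fi
}
```

Calling `collect_env_info` and pasting its output alongside the SGX SDK version should cover the basics.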