mofanv / darknetz

runs several layers of a deep learning model in TrustZone
MIT License

Wrong prediction results #37

Closed intx4 closed 1 year ago

intx4 commented 1 year ago

Hello, when running the code as described in README.md, I obtain wrong prediction results, both with the pre-trained model and with a freshly trained one.

Setting:

Hardware

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          24
On-line CPU(s) list:             0-23
Thread(s) per core:              1
Core(s) per socket:              16
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           151
Model name:                      12th Gen Intel(R) Core(TM) i9-12900K
Stepping:                        2
CPU MHz:                         3200.000
CPU max MHz:                     5200.0000
CPU min MHz:                     800.0000
BogoMIPS:                        6374.40
Virtualization:                  VT-x
L1d cache:                       384 KiB
L1i cache:                       256 KiB
L2 cache:                        10 MiB
NUMA node0 CPU(s):               0-23

Software

Distributor ID: Ubuntu
Description:    Ubuntu 20.04.6 LTS
Release:        20.04
Codename:       focal
Kernel:         5.15.0-76-generic
QEMU=8.0.2
OP-TEE=3.22.0

Normal World Output:

darknetp classifier predict -pp_start 4 -pp_end 10 cfg/mnist.dataset cfg/mnist_lenet.cfg models/mnist/mnist_lenet.weights data/mnist/images/t_00007_c3.png
Prepare session with the TA
Begin darknet
layer     filters    size              input                output
    0 conv      6  5 x 5 / 1    28 x  28 x   3   ->    28 x  28 x   6  0.001 BFLOPs
    1 max          2 x 2 / 2    28 x  28 x   6   ->    14 x  14 x   6
    2 conv      6  5 x 5 / 1    14 x  14 x   6   ->    14 x  14 x   6  0.000 BFLOPs
    3 max          2 x 2 / 2    14 x  14 x   6   ->     7 x   7 x   6
    4 connected_TA                          294  ->   120
    5 dropout_TA    p = 0.80                120  ->   120
    6 connected_TA                          120  ->    84
    7 dropout_TA    p = 0.80                 84  ->    84
    8 connected_TA                           84  ->    10
    9 softmax_TA                                       10
   10 cost_TA                                          10
workspace_size=235200
Loading weights from models/mnist/mnist_lenet.weights...Done!
output file: /media/results/predict_mnist_lenet_pps4_ppe10.txt
data/mnist/images/t_00007_c3.png: Predicted in 0.008191 seconds.
-0.00%: 0
 0.00%: 1
 0.00%: 2
330.02%: 3
 0.00%: 4
user CPU start: 0.394826; end: 0.394910
kernel CPU start: 2.117686; end: 2.118134
Max: 2432  kilobytes
vmsize:281470681747200; vmrss:281470681745792; vmdata:281470681744252; vmstk:187647121162372; vmexe:281470681743768; vmlib:281470681745604
# darknetp classifier predict -pp_start 4 -pp_end 10 cfg/mnist.dataset cfg/mnist_lenet.cfg models/mnist/mnist_lenet.weights data/mnist/images/t_00007_c3.png
Prepare session with the TA
Begin darknet
layer     filters    size              input                output
    0 conv      6  5 x 5 / 1    28 x  28 x   3   ->    28 x  28 x   6  0.001 BFLOPs
    1 max          2 x 2 / 2    28 x  28 x   6   ->    14 x  14 x   6
    2 conv      6  5 x 5 / 1    14 x  14 x   6   ->    14 x  14 x   6  0.000 BFLOPs
    3 max          2 x 2 / 2    14 x  14 x   6   ->     7 x   7 x   6
    4 connected_TA                          294  ->   120
    5 dropout_TA    p = 0.80                120  ->   120
    6 connected_TA                          120  ->    84
    7 dropout_TA    p = 0.80                 84  ->    84
    8 connected_TA                           84  ->    10
    9 softmax_TA                                       10
   10 cost_TA                                          10
workspace_size=235200
Loading weights from models/mnist/mnist_lenet.weights...Done!
output file: /media/results/predict_mnist_lenet_pps4_ppe10.txt
data/mnist/images/t_00007_c3.png: Predicted in 0.008095 seconds.
-0.00%: 0
 0.00%: 1
-0.00%: 2
 0.00%: 3
 0.00%: 4
user CPU start: 0.313335; end: 0.313401
kernel CPU start: 2.177080; end: 2.177540
Max: 2560  kilobytes
vmsize:281470681747200; vmrss:281470681745920; vmdata:281470681744252; vmstk:187647121162372; vmexe:281470681743768; vmlib:281470681745604

Secure World Output:

D/TC:? 0 tee_ta_init_pseudo_ta_session:296 Lookup pseudo TA 7fc5c039-0542-4ee1-80af-b4eab2f1998d
D/TC:? 0 ldelf_load_ldelf:110 ldelf load address 0x40007000
D/LD:  ldelf:142 Loading TS 7fc5c039-0542-4ee1-80af-b4eab2f1998d
D/TC:? 0 ldelf_syscall_open_bin:142 Lookup user TA ELF 7fc5c039-0542-4ee1-80af-b4eab2f1998d (early TA)
D/TC:? 0 ldelf_syscall_open_bin:146 res=0xffff0008
D/TC:? 0 ldelf_syscall_open_bin:142 Lookup user TA ELF 7fc5c039-0542-4ee1-80af-b4eab2f1998d (Secure Storage TA)
D/TC:? 0 ldelf_syscall_open_bin:146 res=0xffff0008
D/TC:? 0 ldelf_syscall_open_bin:142 Lookup user TA ELF 7fc5c039-0542-4ee1-80af-b4eab2f1998d (REE)
D/TC:? 0 ldelf_syscall_open_bin:146 res=0
D/LD:  ldelf:176 ELF (7fc5c039-0542-4ee1-80af-b4eab2f1998d) at 0x40071000
D/TA:  TA_CreateEntryPoint:72 has been called
D/TA:  TA_OpenSessionEntryPoint:91 has been called
I/TA: secure world opened!
I/TA: aes_cbc_TA decrypt ing
I/TA: aes_cbc_TA decrypt ing
I/TA: aes_cbc_TA decrypt ing
I/TA: aes_cbc_TA decrypt ing
I/TA: aes_cbc_TA decrypt ing
I/TA: aes_cbc_TA decrypt ing
D/TC:? 0 tee_ta_close_session:529 csess 0xaeda7860 id 2
D/TC:? 0 tee_ta_close_session:548 Destroy session
I/TA: Goodbye!
D/TA:  TA_DestroyEntryPoint:79 has been called
D/TC:? 0 destroy_context:326 Destroy TA ctx (0xaeda7800)
intx4 commented 1 year ago

Follow-up: the problem seems to appear as soon as I include at least one layer (not necessarily the softmax) in the TEE. If I include only the cost layer (which I assume is not used in inference), the problem does not appear:

# time darknetp classifier predict -pp_start 4 -pp_end 4 cfg/mnist.dataset cfg/mnist_lenet.cfg models/mnist/mnist_lenet.weights data/mnist/images/t_00007_c3.png
Prepare session with the TA
Begin darknet
layer     filters    size              input                output
    0 conv      6  5 x 5 / 1    28 x  28 x   3   ->    28 x  28 x   6  0.001 BFLOPs
    1 max          2 x 2 / 2    28 x  28 x   6   ->    14 x  14 x   6
    2 conv      6  5 x 5 / 1    14 x  14 x   6   ->    14 x  14 x   6  0.000 BFLOPs
    3 max          2 x 2 / 2    14 x  14 x   6   ->     7 x   7 x   6
    4 connected_TA                          294  ->   120
    5 dropout       p = 0.80                120  ->   120
    6 connected                             120  ->    84
    7 dropout       p = 0.80                 84  ->    84
    8 connected                              84  ->    10
    9 softmax                                          10
   10 cost                                             10
workspace_size=235200
Loading weights from models/mnist/mnist_lenet.weights...Done!
output file: /media/results/predict_mnist_lenet_pps4_ppe4.txt
data/mnist/images/t_00007_c3.png: Predicted in 0.007926 seconds.
 -nan%: 0
 -nan%: 1
 -nan%: 2
 -nan%: 3
 -nan%: 4
user CPU start: 0.329176; end: 0.329176
kernel CPU start: 1.622402; end: 1.623005
Max: 2560  kilobytes
vmsize:3840; vmrss:2560; vmdata:892; vmstk:132; vmexe:408; vmlib:2244
real    0m 1.96s
user    0m 0.32s
sys 0m 1.62s
# time darknetp classifier predict -pp_start 4 -pp_end 8 cfg/mnist.dataset cfg/mnist_lenet.cfg models/mnist/mnist_lenet.weights data/mnist/images/t_00007_c3.png
Prepare session with the TA
Begin darknet
layer     filters    size              input                output
    0 conv      6  5 x 5 / 1    28 x  28 x   3   ->    28 x  28 x   6  0.001 BFLOPs
    1 max          2 x 2 / 2    28 x  28 x   6   ->    14 x  14 x   6
    2 conv      6  5 x 5 / 1    14 x  14 x   6   ->    14 x  14 x   6  0.000 BFLOPs
    3 max          2 x 2 / 2    14 x  14 x   6   ->     7 x   7 x   6
    4 connected_TA                          294  ->   120
    5 dropout_TA    p = 0.80                120  ->   120
    6 connected_TA                          120  ->    84
    7 dropout_TA    p = 0.80                 84  ->    84
    8 connected_TA                           84  ->    10
    9 softmax                                          10
   10 cost                                             10
workspace_size=235200
Loading weights from models/mnist/mnist_lenet.weights...Done!
output file: /media/results/predict_mnist_lenet_pps4_ppe8.txt
data/mnist/images/t_00007_c3.png: Predicted in 0.007697 seconds.
 -nan%: 0
  nan%: 1
  nan%: 2
  nan%: 3
  nan%: 4
user CPU start: 0.308717; end: 0.308792
kernel CPU start: 2.218778; end: 2.219315
Max: 2304  kilobytes
vmsize:3840; vmrss:2304; vmdata:892; vmstk:132; vmexe:408; vmlib:2244
real    0m 2.54s
user    0m 0.30s
sys 0m 2.22s
# time darknetp classifier predict -pp_start 0 -pp_end 0 cfg/mnist.dataset cfg/mnist_lenet.cfg models/mnist/mnist_lenet.weights data/mnist/images/t_00007_c3.png
Prepare session with the TA
Begin darknet
layer     filters    size              input                output
    0 conv_TA    6  5 x 5 / 1    28 x  28 x   3   ->    28 x  28 x   6  0.001 BFLOPs
    1 max          2 x 2 / 2    28 x  28 x   6   ->    14 x  14 x   6
    2 conv      6  5 x 5 / 1    14 x  14 x   6   ->    14 x  14 x   6  0.000 BFLOPs
    3 max          2 x 2 / 2    14 x  14 x   6   ->     7 x   7 x   6
    4 connected                             294  ->   120
    5 dropout       p = 0.80                120  ->   120
    6 connected                             120  ->    84
    7 dropout       p = 0.80                 84  ->    84
    8 connected                              84  ->    10
    9 softmax                                          10
   10 cost                                             10
workspace_size=235200
Loading weights from models/mnist/mnist_lenet.weights...Done!
output file: /media/results/predict_mnist_lenet_pps0_ppe0.txt
data/mnist/images/t_00007_c3.png: Predicted in 0.010004 seconds.
  nan%: 0
  nan%: 1
  nan%: 2
  nan%: 3
  nan%: 4
user CPU start: 0.035260; end: 0.035409
kernel CPU start: 0.095925; end: 0.096330
Max: 2432  kilobytes
vmsize:3844; vmrss:2432; vmdata:896; vmstk:132; vmexe:408; vmlib:2244
real    0m 0.14s
user    0m 0.03s
sys 0m 0.09s
# time darknetp classifier predict -pp_start 10 -pp_end 10 cfg/mnist.dataset cfg/mnist_lenet.cfg models/mnist/mnist_lenet.weights data/mnist/images/t_00007_c3.png
Prepare session with the TA
Begin darknet
layer     filters    size              input                output
    0 conv      6  5 x 5 / 1    28 x  28 x   3   ->    28 x  28 x   6  0.001 BFLOPs
    1 max          2 x 2 / 2    28 x  28 x   6   ->    14 x  14 x   6
    2 conv      6  5 x 5 / 1    14 x  14 x   6   ->    14 x  14 x   6  0.000 BFLOPs
    3 max          2 x 2 / 2    14 x  14 x   6   ->     7 x   7 x   6
    4 connected                             294  ->   120
    5 dropout       p = 0.80                120  ->   120
    6 connected                             120  ->    84
    7 dropout       p = 0.80                 84  ->    84
    8 connected                              84  ->    10
    9 softmax                                          10
   10 cost_TA                                          10
workspace_size=235200
Loading weights from models/mnist/mnist_lenet.weights...Done!
output file: /media/results/predict_mnist_lenet_pps10_ppe10.txt
data/mnist/images/t_00007_c3.png: Predicted in 0.006600 seconds.
78.48%: 3
21.52%: 5
 0.00%: 2
 0.00%: 7
 0.00%: 9
user CPU start: 0.022382; end: 0.022511
kernel CPU start: 0.068234; end: 0.068625
Max: 2560  kilobytes
vmsize:3836; vmrss:2560; vmdata:888; vmstk:132; vmexe:408; vmlib:2244
real    0m 0.10s
user    0m 0.02s
sys 0m 0.07s
mofanv commented 1 year ago

Hi, were you using the same configuration of partition points (layers) for both training and prediction? I think there was a mistake in the README file about en/decryption in the TEE being disabled. Apologies, the repo has not been maintained for a while. Try using the same configuration for partitioning, OR comment out the aes_cbc_TA() call in the save_weights_TA() function and the aes_cbc_TA() call in the load_weights_TA() function.
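
To illustrate why a mismatch between encryption at save time and decryption at load time would corrupt the weights, here is a minimal sketch. It uses a toy XOR transform as a stand-in for the cipher; `toy_xcrypt` and its key are hypothetical and are not the repo's AES code. Applying the transform once to plain weights scrambles their bytes, and only a matching second pass restores them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a cipher: XOR with a repeating key. This is NOT AES and
 * is not secure; it only mimics "encrypt then decrypt restores the input". */
static void toy_xcrypt(float *vec, int length)
{
    static const uint8_t key[4] = { 0x2b, 0x7e, 0x15, 0x16 };
    uint8_t bytes[sizeof(float)];

    assert(vec != NULL && length >= 0);
    for (int i = 0; i < length; ++i) {
        memcpy(bytes, &vec[i], sizeof bytes);   /* float -> raw bytes */
        for (size_t b = 0; b < sizeof bytes; ++b)
            bytes[b] ^= key[b % 4];
        memcpy(&vec[i], bytes, sizeof bytes);   /* raw bytes -> float */
    }
}
```

Calling `toy_xcrypt` once on weights that were never transformed leaves arbitrary bit patterns in the float array, which is the kind of input that can surface as nan or inf in the predictions.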

intx4 commented 1 year ago

Hi @mofanv, thanks for the reply. I understand that the repo is no longer maintained, so this is really appreciated. Yes, I have used the same partition in training and prediction (i.e., the same commands shown in the README). I will try commenting out the lines you suggested and see if that changes anything. I'll get back to the issue with updates.

intx4 commented 1 year ago

Hello, after using the same partitions for both training and inference and commenting out the lines invoking aes_cbc_TA in the definitions of save_weights_TA and load_weights_TA inside parser_ta.c, the issue is not solved. I will perhaps investigate a bit with gdb. Below are the logs from the Normal and Secure World consoles, after training a model and invoking prediction twice with the pre-trained model and once with the freshly trained one:

# NS
# darknetp classifier predict -pp_start 4 -pp_end 10 cfg/mnist.dataset cfg/mnist_lenet.cfg models/mnist/mnist_lenet.weights data/mnist/images/t_00007_c3.png
Prepare session with the TA
Begin darknet
layer     filters    size              input                output
    0 conv      6  5 x 5 / 1    28 x  28 x   3   ->    28 x  28 x   6  0.001 BFLOPs
    1 max          2 x 2 / 2    28 x  28 x   6   ->    14 x  14 x   6
    2 conv      6  5 x 5 / 1    14 x  14 x   6   ->    14 x  14 x   6  0.000 BFLOPs
    3 max          2 x 2 / 2    14 x  14 x   6   ->     7 x   7 x   6
    4 connected_TA                          294  ->   120
    5 dropout_TA    p = 0.80                120  ->   120
    6 connected_TA                          120  ->    84
    7 dropout_TA    p = 0.80                 84  ->    84
    8 connected_TA                           84  ->    10
    9 softmax_TA                                       10
   10 cost_TA                                          10
workspace_size=235200
Loading weights from models/mnist/mnist_lenet.weights...Done!
output file: /media/results/predict_mnist_lenet_pps4_ppe10.txt
data/mnist/images/t_00007_c3.png: Predicted in 0.008529 seconds.
-0.00%: 3
 0.00%: 1
 -inf%: 2
-0.00%: 0
 0.00%: 4
user CPU start: 0.559062; end: 0.559241
kernel CPU start: 1.968440; end: 1.969070
Max: 2304  kilobytes
vmsize:281470681747200; vmrss:281470681745664; vmdata:281470681744252; vmstk:187647121162372; vmexe:281470681743768; vmlib:281470681745604
# darknetp classifier predict -pp_start 4 -pp_end 10 cfg/mnist.dataset cfg/mnist_lenet.cfg models/mnist/mnist_lenet.weights data/mnist/images/t_00007_c3.png
Prepare session with the TA
Begin darknet
layer     filters    size              input                output
    0 conv      6  5 x 5 / 1    28 x  28 x   3   ->    28 x  28 x   6  0.001 BFLOPs
    1 max          2 x 2 / 2    28 x  28 x   6   ->    14 x  14 x   6
    2 conv      6  5 x 5 / 1    14 x  14 x   6   ->    14 x  14 x   6  0.000 BFLOPs
    3 max          2 x 2 / 2    14 x  14 x   6   ->     7 x   7 x   6
    4 connected_TA                          294  ->   120
    5 dropout_TA    p = 0.80                120  ->   120
    6 connected_TA                          120  ->    84
    7 dropout_TA    p = 0.80                 84  ->    84
    8 connected_TA                           84  ->    10
    9 softmax_TA                                       10
   10 cost_TA                                          10
workspace_size=235200
Loading weights from models/mnist/mnist_lenet.weights...Done!
output file: /media/results/predict_mnist_lenet_pps4_ppe10.txt
data/mnist/images/t_00007_c3.png: Predicted in 0.008185 seconds.
-0.00%: 3
 0.00%: 1
 0.00%: 2
-0.00%: 0
 0.00%: 4
user CPU start: 0.406411; end: 0.406500
kernel CPU start: 2.123940; end: 2.124405
Max: 2432  kilobytes
vmsize:281470681747200; vmrss:281470681745792; vmdata:281470681744252; vmstk:187647121162372; vmexe:281470681743768; vmlib:281470681745604
# darknetp classifier predict -pp_start 4 -pp_end 10 cfg/mnist.dataset tmp/backup/mnist_lenet.weights data/mnist/images/t_00007_c3.png
Prepare session with the TA
Begin darknet
Segmentation fault
# S
D/TC:? 0 tee_ta_init_pseudo_ta_session:296 Lookup pseudo TA 7fc5c039-0542-4ee1-80af-b4eab2f1998d
D/TC:? 0 ldelf_load_ldelf:110 ldelf load address 0x80007000
D/LD:  ldelf:142 Loading TS 7fc5c039-0542-4ee1-80af-b4eab2f1998d
D/TC:? 0 ldelf_syscall_open_bin:142 Lookup user TA ELF 7fc5c039-0542-4ee1-80af-b4eab2f1998d (early TA)
D/TC:? 0 ldelf_syscall_open_bin:146 res=0xffff0008
D/TC:? 0 ldelf_syscall_open_bin:142 Lookup user TA ELF 7fc5c039-0542-4ee1-80af-b4eab2f1998d (Secure Storage TA)
D/TC:? 0 ldelf_syscall_open_bin:146 res=0xffff0008
D/TC:? 0 ldelf_syscall_open_bin:142 Lookup user TA ELF 7fc5c039-0542-4ee1-80af-b4eab2f1998d (REE)
D/TC:? 0 ldelf_syscall_open_bin:146 res=0
D/LD:  ldelf:176 ELF (7fc5c039-0542-4ee1-80af-b4eab2f1998d) at 0x8007b000
D/TA:  TA_CreateEntryPoint:72 has been called
D/TA:  TA_OpenSessionEntryPoint:91 has been called
I/TA: secure world opened!
D/TC:? 0 tee_ta_close_session:529 csess 0x6dfa5860 id 2
D/TC:? 0 tee_ta_close_session:548 Destroy session
I/TA: Goodbye!
D/TA:  TA_DestroyEntryPoint:79 has been called
D/TC:? 0 destroy_context:326 Destroy TA ctx (0x6dfa5800)
D/TC:? 0 tee_ta_init_pseudo_ta_session:296 Lookup pseudo TA 7fc5c039-0542-4ee1-80af-b4eab2f1998d
D/TC:? 0 ldelf_load_ldelf:110 ldelf load address 0x80007000
D/LD:  ldelf:142 Loading TS 7fc5c039-0542-4ee1-80af-b4eab2f1998d
D/TC:? 0 ldelf_syscall_open_bin:142 Lookup user TA ELF 7fc5c039-0542-4ee1-80af-b4eab2f1998d (early TA)
D/TC:? 0 ldelf_syscall_open_bin:146 res=0xffff0008
D/TC:? 0 ldelf_syscall_open_bin:142 Lookup user TA ELF 7fc5c039-0542-4ee1-80af-b4eab2f1998d (Secure Storage TA)
D/TC:? 0 ldelf_syscall_open_bin:146 res=0xffff0008
D/TC:? 0 ldelf_syscall_open_bin:142 Lookup user TA ELF 7fc5c039-0542-4ee1-80af-b4eab2f1998d (REE)
D/TC:? 0 ldelf_syscall_open_bin:146 res=0
D/LD:  ldelf:176 ELF (7fc5c039-0542-4ee1-80af-b4eab2f1998d) at 0x80051000
D/TA:  TA_CreateEntryPoint:72 has been called
D/TA:  TA_OpenSessionEntryPoint:91 has been called
I/TA: secure world opened!
D/TC:? 0 tee_ta_close_session:529 csess 0x6dfa5860 id 2
D/TC:? 0 tee_ta_close_session:548 Destroy session
I/TA: Goodbye!
D/TA:  TA_DestroyEntryPoint:79 has been called
D/TC:? 0 destroy_context:326 Destroy TA ctx (0x6dfa5800)

parser_ta.c file:

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "darknetp_ta.h"
#include "network_TA.h"
#include <parser_TA.h>
#include <blas_TA.h>
#include "math_TA.h"
#include "aes_TA.h"

void aes_cbc_TA(char* xcrypt, float* gradient, int org_len)
{
    IMSG("aes_cbc_TA %s ing\n", xcrypt);
    //convert float array to uint_8 one by one
    uint8_t *byte;
    uint8_t array[org_len*4];
    for(int z = 0; z < org_len; z++){
        byte = (uint8_t*)(&gradient[z]);
        for(int y = 0; y < 4; y++){
            array[z*4 + y] = byte[y];
        }
    }

    //set ctx, iv, and key for aes
    int enc_len = (int)(org_len/4);
    struct AES_ctx ctx;
    uint8_t iv[] = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f };
    uint8_t key[16] = { (uint8_t)0x2b, (uint8_t)0x7e, (uint8_t)0x15, (uint8_t)0x16, (uint8_t)0x28, (uint8_t)0xae, (uint8_t)0xd2, (uint8_t)0xa6, (uint8_t)0xab, (uint8_t)0xf7, (uint8_t)0x15, (uint8_t)0x88, (uint8_t)0x09, (uint8_t)0xcf, (uint8_t)0x4f, (uint8_t)0x3c };

    //encryption
    AES_init_ctx_iv(&ctx, key, iv);
    for (int i = 0; i < enc_len; ++i)
    {
        if(strncmp(xcrypt, "encrypt", 2) == 0){
            AES_CBC_encrypt_buffer(&ctx, array + (i * 16), 16);
        }else if(strncmp(xcrypt, "decrypt", 2) == 0){
            AES_CBC_decrypt_buffer(&ctx, array + (i * 16), 16);
        }
    }

    //convert uint8_t to float one by one
    for(int z = 0; z < org_len; z++){
        gradient[z] = *(float*)(&array[z*4]);
    }
}

void transpose_matrix_TA(float *a, int rows, int cols)
{
    float *transpose = calloc(rows*cols, sizeof(float));
    int x, y;
    for(x = 0; x < rows; ++x){
        for(y = 0; y < cols; ++y){
            transpose[y*rows + x] = a[x*cols + y];
        }
    }
    memcpy(a, transpose, rows*cols*sizeof(float));
    free(transpose);
}

void load_weights_TA(float *vec, int length, int layer_i, char type, int transpose)
{
    // decrypt
    float *tempvec = malloc(length*sizeof(float));
    copy_cpu_TA(length, vec, 1, tempvec, 1);
    //aes_cbc_TA("decrypt", tempvec, length);

    // copy
    layer_TA l = netta.layers[layer_i];

    if(type == 'b'){
        copy_cpu_TA(length, tempvec, 1, l.biases, 1);
    }
    else if(type == 'w'){
        copy_cpu_TA(length, tempvec, 1, l.weights, 1);
    }
    else if(type == 's'){
        copy_cpu_TA(length, tempvec, 1, l.scales, 1);
    }
    else if(type == 'm'){
        copy_cpu_TA(length, tempvec, 1, l.rolling_mean, 1);
    }
    else if(type == 'v'){
        copy_cpu_TA(length, tempvec, 1, l.rolling_variance, 1);
    }

    if(l.type == CONVOLUTIONAL_TA || l.type == DECONVOLUTIONAL_TA){
        if(l.flipped && type == 'w'){
            transpose_matrix_TA(l.weights, l.c*l.size*l.size, l.n);
        }
    }
    else if(l.type == CONNECTED_TA){
        if(transpose && type == 'w'){
            transpose_matrix_TA(l.weights, l.inputs, l.outputs);
        }
    }

    free(tempvec);
}

void save_weights_TA(float *weights_encrypted, int length, int layer_i, char type)
{
    layer_TA l = netta.layers[layer_i];

    if(type == 'b'){
        copy_cpu_TA(length, l.biases, 1, weights_encrypted, 1);
    }
    else if(type == 'w'){
        copy_cpu_TA(length, l.weights, 1, weights_encrypted, 1);
    }
    else if(type == 's'){
        copy_cpu_TA(length, l.scales, 1, weights_encrypted, 1);
    }
    else if(type == 'm'){
        copy_cpu_TA(length, l.rolling_mean, 1, weights_encrypted, 1);
    }
    else if(type == 'v'){
        copy_cpu_TA(length, l.rolling_variance, 1, weights_encrypted, 1);
    }

    // remove the on-device encryption for FL
    //aes_cbc_TA("encrypt", weights_encrypted, length);
}
mofanv commented 1 year ago

Hi, did you finally solve this? I don't have access to the testbed anymore, so I cannot test. But I did see similar issues before, caused by the size_t data type differing across platforms. Maybe try this commit: https://github.com/mofanv/darknetz/commit/9772418262afc804178eaa050bdd84043173ed91 I think this commit only supports one split, with the later part inside the TEE, and does not support putting a middle block of layers inside the TEE. But it may be worth trying for debugging anyway.
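
One way a size_t mismatch can cause this kind of problem (an assumption for illustration; the thread does not pinpoint where it occurred) is a file or buffer field written with `sizeof(size_t)` on a 64-bit host and read on a 32-bit target, which shifts every float that follows. The sketch below uses a fixed-width `uint64_t` instead. The names `write_header` and `read_header` and the one-counter layout are hypothetical, and the sketch assumes both sides share the same byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: one 64-bit counter followed by float weights. */

/* Writes the counter with a fixed 8-byte width, whatever sizeof(size_t) is
 * on the writing platform. Returns the offset where the weights begin. */
static size_t write_header(unsigned char *buf, uint64_t seen)
{
    assert(buf != NULL);
    memcpy(buf, &seen, sizeof seen);
    return sizeof seen;
}

/* Reads the counter back with the same fixed width, so the weights start at
 * the same offset on 32-bit and 64-bit builds alike. */
static size_t read_header(const unsigned char *buf, uint64_t *seen)
{
    assert(buf != NULL && seen != NULL);
    memcpy(seen, buf, sizeof *seen);
    return sizeof *seen;
}
```

With a plain size_t field, the reader's offset would be 4 on a 32-bit build and 8 on a 64-bit one, so the two sides would disagree about where the first weight lives.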

intx4 commented 1 year ago

Hi @mofanv, I haven't solved the issue yet, I was a bit busy with other stuff. I'll give that commit a go.

intx4 commented 1 year ago

Hi @mofanv, I have solved the issue. The problem was a use-after-free (UAF) in predict_classifier in classifier.c. The bug manifested whenever the output of the network was computed and retrieved inside the TEE. The lines causing the problem are the following:

top_k(predictions, net->outputs, top, indexes);
free(net_output_back);

Whenever the end of the NN is in the TEE, predictions and net_output_back point to the same memory. In that case net_output_back is freed before the results are printed, so the results get corrupted by the free. The easy fix is to move the free to the end of the loop body:

void predict_classifier(char *datacfg, char *cfgfile, char *weightfile, char *filename, int top)
{
        network *net = load_network(cfgfile, weightfile, 0);
        set_batch_network(net, 1);

        srand(2222222);

        list *options = read_data_cfg(datacfg);

        char *name_list = option_find_str(options, "names", 0);
        if(!name_list) name_list = option_find_str(options, "labels", "data/labels.list");
        if(top == 0) top = option_find_int(options, "top", 1);

        int i = 0;
        char **names = get_labels(name_list);
        clock_t time;
        int *indexes = calloc(top, sizeof(int));
        char buff[256];
        char *input = buff;
        while(1) {
                if(filename) {
                        strncpy(input, filename, 256);
                }else{
                        printf("Enter Image Path: ");
                        fflush(stdout);
                        input = fgets(input, 256, stdin);
                        if(!input) return;
                        strtok(input, "\n");
                }
                image im = load_image_color(input, 0, 0);
                image r = letterbox_image(im, net->w, net->h);
                //image r = resize_min(im, 320);
                //printf("%d %d\n", r.w, r.h);
                //resize_network(net, r.w, r.h);
                //printf("%d %d\n", r.w, r.h);

                float *X = r.data;

                time=clock();
                float *predictions = network_predict(net, X);
                if(net->hierarchy) hierarchy_predictions(predictions, net->outputs, net->hierarchy, 1, 1);

                top_k(predictions, net->outputs, top, indexes);

                struct rusage usage;
                struct timeval startu, endu, starts, ends;

                getrusage(RUSAGE_SELF, &usage);
                startu = usage.ru_utime;
                starts = usage.ru_stime;

                // output file
                struct stat st = {0};
                if (stat("/media/results", &st) == -1) {
                        mkdir("/media/results", 0700);
                }

                char delim[] = "/.";
                char *ptr = strtok(cfgfile, delim);
                ptr = strtok(NULL, delim);

                char pp_str_start[5];
                sprintf(pp_str_start, "%d", partition_point1 + 1);
                char pp_str_end[5];
                sprintf(pp_str_end, "%d", partition_point2);

                char output_dir[80];
                strcpy(output_dir, "/media/results/predict_");
                strcat(output_dir, ptr);
                strcat(output_dir, "_pps");
                strcat(output_dir, pp_str_start);
                strcat(output_dir, "_ppe");
                strcat(output_dir, pp_str_end);
                strcat(output_dir, ".txt");

                printf("output file: %s\n", output_dir);
                FILE *output_file = fopen(output_dir, "a");

                fprintf(stderr, "%s: Predicted in %f seconds.\n", input, sec(clock()-time));
                fprintf(output_file, "%s: Predicted in %f seconds.\n", input, sec(clock()-time));

                for(i = 0; i < top; ++i) {
                        int index = indexes[i];
                        //if(net->hierarchy) printf("%d, %s: %f, parent: %s \n",index, names[index], predictions[index], (net->hierarchy->parent[index] >= 0) ? names[net->hierarchy->parent[index]] : "Root");
                        //else printf("%s: %f\n",names[index], predictions[index]);
                        printf("%5.2f%%: %s\n", predictions[index]*100, names[index]);
                }

                getrusage(RUSAGE_SELF, &usage);
                endu = usage.ru_utime;
                ends = usage.ru_stime;
                printf("user CPU start: %lu.%06u; end: %lu.%06u\n", startu.tv_sec, startu.tv_usec, endu.tv_sec, endu.tv_usec);
                printf("kernel CPU start: %lu.%06u; end: %lu.%06u\n", starts.tv_sec, starts.tv_usec, ends.tv_sec, ends.tv_usec);
                printf("Max: %ld  kilobytes\n", usage.ru_maxrss);
                fprintf(output_file, "user CPU start: %lu.%06u; end: %lu.%06u\n", startu.tv_sec, startu.tv_usec, endu.tv_sec, endu.tv_usec);
                fprintf(output_file, "kernel CPU start: %lu.%06u; end: %lu.%06u\n", starts.tv_sec, starts.tv_usec, ends.tv_sec, ends.tv_usec);
                fprintf(output_file, "Max: %ld  kilobytes\n", usage.ru_maxrss);
                getMemory(output_file);

                fclose(output_file);

                free(net_output_back);

                if(r.data != im.data) free_image(r);
                free_image(im);
                if (filename) break;
        }
}
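
The aliasing problem described above can be reduced to a minimal sketch. The helpers `fake_predict`, `argmax`, and `predict_and_report` are hypothetical stand-ins, not functions from the repo; only the name net_output_back is taken from the thread. The point is the ordering: every read through the aliasing pointer happens before the shared buffer is freed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the global buffer that holds the output coming from the TEE. */
static float *net_output_back = NULL;

/* Hypothetical stand-in for network_predict(): the returned pointer aliases
 * net_output_back, as it does when the last layer runs inside the TEE. */
static float *fake_predict(int n, int winner)
{
    assert(n > 0 && winner >= 0 && winner < n);
    net_output_back = calloc((size_t)n, sizeof(float));
    if (!net_output_back) return NULL;
    net_output_back[winner] = 1.0f;
    return net_output_back;
}

/* Index of the largest element: what top_k yields when top == 1. */
static int argmax(const float *v, int n)
{
    int best = 0;
    for (int i = 1; i < n; ++i)
        if (v[i] > v[best]) best = i;
    return best;
}

/* Formats the top prediction into out, then releases the shared buffer.
 * Returns the winning index, or -1 if the allocation failed. */
static int predict_and_report(char *out, size_t out_len, int n, int winner)
{
    float *predictions = fake_predict(n, winner);
    if (!predictions) return -1;

    int index = argmax(predictions, n);

    /* Every read through the alias happens BEFORE the free. */
    snprintf(out, out_len, "%5.2f%%: %d", predictions[index] * 100, index);

    free(net_output_back);   /* the last use of the buffer is above */
    net_output_back = NULL;  /* do not leave a dangling pointer behind */
    return index;
}
```

Swapping the `free` and `snprintf` lines would recreate the bug: the read of `predictions[index]` would then touch freed memory, and the printed value would be undefined.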

The output is as expected in this case:

# darknetp classifier predict -pp_start 4 -pp_end 10 cfg/mnist.dataset cfg/mnist_lenet.cfg models/mnist/mnist_lenet.weights data/mnist/images/t_00007_c3.png
Prepare session with the TA
Begin darknet
    Executing with NO DEBUG statements
layer     filters    size              input                output
    0 conv      6  5 x 5 / 1    28 x  28 x   3   ->    28 x  28 x   6  0.001 BFLOPs
    1 max          2 x 2 / 2    28 x  28 x   6   ->    14 x  14 x   6
    2 conv      6  5 x 5 / 1    14 x  14 x   6   ->    14 x  14 x   6  0.000 BFLOPs
    3 max          2 x 2 / 2    14 x  14 x   6   ->     7 x   7 x   6
    4 connected_TA                          294  ->   120
    5 dropout_TA    p = 0.80                120  ->   120
    6 connected_TA                          120  ->    84
    7 dropout_TA    p = 0.80                 84  ->    84
    8 connected_TA                           84  ->    10
    9 softmax_TA                                       10
   10 cost_TA                                          10
workspace_size=235200
Loading weights from models/mnist/mnist_lenet.weights...Done!
output file: /media/results/predict_mnist_lenet_pps4_ppe10.txt
data/mnist/images/t_00007_c3.png: Predicted in 0.040399 seconds.
100.00%: 3
 0.00%: 1
 0.00%: 2
 0.00%: 0
 0.00%: 4
user CPU start: 3.104287; end: 3.104287
kernel CPU start: 6.911425; end: 6.916884
Max: 2432  kilobytes
vmsize:281470681746792; vmrss:187647121164672; vmdata:281470681744252; vmstk:132; vmexe:408; vmlib:281470681745316

I am closing the issue.

mofanv commented 1 year ago

Thanks for your updates about the bug!

mackskaren commented 1 year ago

What is your memory configuration for QEMU and darknetz? I want to be able to run darknetz without having to decrease TA_DATA_SIZE in the corresponding header file. Did you by chance change anything in the /optee_os/core/arch/arm/plat-vexpress/conf.mk file?