Update Docker Images and readme provided as part of confidential-ml example

ajay-fuji commented 2 months ago

Hi,

The Docker image provided with the examples/confidential-ml/code_model.md example is outdated relative to the latest code (main branch).

As a result, the steps in code_model.md for running the device on top of the Arm FVP do not work.

Please provide updated images and update the link in the document as well.

Thanks!

ajay-fuji commented 2 months ago

In examples/confidential-ml/code_model.md, in the "How to test with Islet" section, it seems that launching the Arm FVP should come before running the three instances on the host PC.

Once the FVP is started with the tap network, a new network interface is created with the IP 193.168.10.15. Only after that can certifier-service, runtime, and model-provider be run with the given commands. The commands that follow for terminal 4 can be run after that.

The commands to open terminal 4 could also be added, to be used once the FVP is running -

Connect to FVP main terminal - telnet localhost 5000
Connect to FVP RMM terminal - telnet localhost 50003
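
The ordering described above (FVP first, host services only once the tap interface has its address) could be checked with a small helper. This is a minimal sketch, not part of the Islet scripts: has_addr and wait_for_addr are hypothetical names, and the only value taken from this thread is the address 193.168.10.15.

```sh
#!/bin/sh
# Sketch only: has_addr and wait_for_addr are hypothetical helpers.

# Succeeds if the `ip -o -4 addr show` output on stdin lists address $1.
# In that output, field 4 is the address in CIDR form (e.g. 193.168.10.15/24).
has_addr() {
    awk -v want="$1" '
        { split($4, a, "/"); if (a[1] == want) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Polls once per second until address $1 appears or $2 tries (default 30) run out.
wait_for_addr() {
    addr=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        if ip -o -4 addr show | has_addr "$addr"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Example use on the host, after launching the FVP:
#   wait_for_addr 193.168.10.15 60 && echo "tap interface is up"
```

Only when wait_for_addr succeeds would certifier-service, runtime, and model-provider be started, followed by the telnet sessions.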

Please suggest a possible correction.

jinbpark commented 2 months ago

First of all, thank you for trying out ISLET! You can find answers below.

Docker image provided with examples/confidential-ml/code_model.md example is outdated as per latest code

It's true and I'm aware of that. As you said, it needs to be updated accordingly. I'll check it out and update it. You can use How to test with simulated enclave (no actual hardware TEE) on x86_64 in the meantime (if this no-actual-TEE setup suffices for what you want to do).

Once FVP is started with tap network, then a new network interface is created with 193.168.10.15 IP.

It also seems to be related to the outdated example code and instructions. I'll check it out as well.

ajay-fuji commented 2 months ago

Thanks @jinbpark for checking this out. We want to run the use case with Islet on the FVP. We tried changing the scripts to match the latest code and ran them, but we are still getting some issues with the device running on the FVP.

Any temporary patch would be helpful until the documentation and Docker image are fixed on your end.

Thanks!

jinbpark commented 2 months ago

We tried changing scripts as per latest code and run but still getting some issues with device running on FVP.

Could you describe your changes in more detail? It might help.

ajay-fuji commented 2 months ago

Files changed -

certifier-service
     |- run.sh   # take HOST as an input argument instead of using 0.0.0.0
runtime
     |- init.sh  # updated to the latest code from the main branch
     |- run.sh   # updated to the latest code from the main branch
model-provider
     |- init.sh  # updated to the latest code from the main branch
     |- run.sh   # updated to the latest code from the main branch
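
The certifier-service change listed above (HOST as an argument instead of a hard-coded 0.0.0.0) might look roughly like the following. This is a sketch under assumptions: resolve_host is a hypothetical helper, and how the real run.sh passes the address on to the service is not shown in this thread.

```sh
#!/bin/sh
# Sketch only: resolve_host is a hypothetical helper, not taken from the Islet scripts.

# Prints the bind address: the first argument if given, otherwise 0.0.0.0.
resolve_host() {
    printf '%s\n' "${1:-0.0.0.0}"
}

# In a run.sh-style script this could be used as:
#   HOST=$(resolve_host "$@")
# with "$HOST" then passed wherever 0.0.0.0 was previously hard-coded.
```

Keeping 0.0.0.0 as the default means existing invocations without an argument behave as before.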

Steps to run the example -

Start the FVP with Linux
Start the certifier service
Start the runtime and model-provider terminals
Log in to the FVP using telnet localhost 5000
Execute the steps given in the FVP step -
- Update set-realm-ip.sh with the IP range of the virtual network interface created by the FVP. This can be checked using the ip addr command.
- After this, executing init_aarch64.sh and run_aarch64.sh produces the error shared above regarding the glibc library.
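
To find the IP range for set-realm-ip.sh without reading the ip addr output by eye, something like the following could be used on the host. It is a sketch: iface_cidr is a hypothetical helper, and the interface name tap0 in the usage line is an assumption, since the thread does not name the interface the FVP creates.

```sh
#!/bin/sh
# Sketch only: iface_cidr is a hypothetical helper.

# Prints the first IPv4 address (CIDR form) of interface $1,
# reading `ip -o -4 addr show` output from stdin.
iface_cidr() {
    awk -v ifc="$1" '$2 == ifc && $3 == "inet" { print $4; exit }'
}

# Example use (tap0 is an assumed interface name):
#   ip -o -4 addr show | iface_cidr tap0
```

The printed value, for example an address in the 193.168.10.x range, indicates which range set-realm-ip.sh needs to match.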

ajay-fuji commented 2 months ago

Do we have any idea how to resolve the GLIBC issue?

ajay-fuji commented 2 months ago

We could not reproduce this exact error, but a few things are worth noting:

eth0 is not the default network interface on cloud instances, so we were not able to connect to the certifier service from the device (realm on FVP) using the ./init_aarch64.sh 193.168.10.15 script.
When we created a network interface using the ip link add eth0 type veth command, we were able to connect to the certifier service.
We still see a certification failure, but this seems to be caused by a realm authenticity issue, not a network issue between the realm and the certifier service.

PS: When we start the FVP, the machine's internet connection goes down. Has anyone experienced this before? Any idea how to fix it?
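
Because the default interface name differs between machines (eth0 on some, something else on cloud instances), it may help to look it up rather than assume it. A minimal sketch, where default_iface is a hypothetical helper that parses the output of ip route show default:

```sh
#!/bin/sh
# Sketch only: default_iface is a hypothetical helper.

# Prints the interface named after "dev" on the first default route,
# reading `ip route show default` output from stdin.
default_iface() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++)
            if ($i == "dev") { print $(i + 1); exit }
    }'
}

# Example use on the host:
#   ip route show default | default_iface
```

If the printed name is not eth0, scripts that hard-code eth0 would need either that name substituted or a workaround such as the veth interface described above.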

jinbpark commented 1 month ago

PS: When we start FVP, machine internet goes down. Anyone has experienced this earlier? Any idea how to fix this?

Could you try commenting out lines 41 and 42 of configure_tap.sh?
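
One way to try this without editing the file by hand is sketched below. comment_out_lines is a hypothetical helper, and it assumes configure_tap.sh is a shell script in which a leading # comments a line out. What lines 41 and 42 actually contain is not stated in this thread, so check them before running it.

```sh
#!/bin/sh
# Sketch only: comment_out_lines is a hypothetical helper.

# Prefixes lines $2 through $3 of file $1 with "# ",
# keeping the original as $1.bak so the change can be reverted.
comment_out_lines() {
    cp "$1" "$1.bak" && sed "${2},${3}s/^/# /" "$1.bak" > "$1"
}

# Example use:
#   comment_out_lines configure_tap.sh 41 42
# To revert:
#   mv configure_tap.sh.bak configure_tap.sh
```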

Do we have any idea, how to resolve the GLIBC issue?

Sorry about the late response. I won't have time to dig into this issue until the start of September. I'll look at it after that (maybe in the middle of September?). In the meantime, you can build the TensorFlow Lite library on your own if you really need the TensorFlow capability.

ajay-fuji commented 1 month ago

Hi @jinbpark,

Thanks for suggesting the solution, although we had already been able to find it ourselves. Also, since the glibc issue is not reproducible, you can skip that part for now.

Currently we are not able to run ./init_aarch64.sh and ./run_aarch64.sh; they fail with the error below -

Error for run_aarch64.sh -

# ./run_aarch64.sh 193.168.10.15 8125 code 0 -1 -1 193.168.20.10
Mon Sep 18 00:00:00 UTC 2023
ln: /lib/libtensorflowlite.so: File exists
ln: /lib/libtensorflowlite_flex.so: File exists
running as client
load_client_certs_and_key: can't translate der to X509
init_client_ssl: load_client_certs_and_key failed
Can't init client app

Here too, the same time-violation error logs can be seen in the certifier-service terminal.
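
One possible cause of a time-violation error, not confirmed in this thread, is that the realm's clock (the log above shows Mon Sep 18 00:00:00 UTC 2023) differs from the host's clock. A minimal sketch for measuring the difference; abs_skew is a hypothetical helper and the epoch values are obtained by running date -u +%s on each side:

```sh
#!/bin/sh
# Sketch only: abs_skew is a hypothetical helper.

# Prints the absolute difference in seconds between two epoch timestamps.
abs_skew() {
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then
        d=$(( 0 - d ))
    fi
    printf '%s\n' "$d"
}

# Example use:
#   host_epoch=$(date -u +%s)      # run on the host
#   realm_epoch=...                # value printed by `date -u +%s` inside the realm
#   abs_skew "$host_epoch" "$realm_epoch"
```

A large value would suggest checking the realm's clock before looking further at the certificates themselves.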

We would be grateful if you could help us navigate through this. Thanks!

islet-project / islet

Update Docker Images and readme provided as part of confidential-ml example #356