Open L0czek opened 1 month ago
Before reproducing the problem with the provisioning example, I checked whether large files still cause problems in a simplified setup using virtio-9p in a Linux realm.
File sizes:
-rw-r--r-- 1 16M linux.realm
-rw-r--r-- 1 16M linux.realm2 (copy of linux.realm)
-rwxr-xr-x 1 4.5M sdk-example
Execution in a Linux realm:
# diff linux.realm linux.realm2 // read large size files and compared the content successfully
# ./sdk-example // executed a large size execution binary successfully
Getting an attestation report on aarch64.
Failed to get an attestation report. ENOENT
Attestation result Err(Report)
Sealing result Ok(())
Most memory operations take place during realm launch and virtio usage, both of which worked here. So I don't suspect a memory issue, but I will check by reproducing the problem with the app provisioning example.
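The diff check earlier in this comment only reports whether two files differ. If corruption does show up, it helps to know where in the file it occurred. The following is a minimal sketch, not part of the original test: the function name and the 64 KiB block size are assumptions chosen for illustration.

```python
CHUNK = 64 * 1024  # compare in 64 KiB blocks; the block size is an arbitrary choice


def mismatched_blocks(path_a, path_b, chunk=CHUNK):
    """Return the byte offsets of all blocks that differ between two files.

    A trailing block that exists in only one file (size mismatch)
    is reported as differing too.
    """
    bad = []
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk)
            b = fb.read(chunk)
            if not a and not b:  # both files exhausted
                break
            if a != b:
                bad.append(offset)
            offset += chunk
    return bad
```

For example, `mismatched_blocks("linux.realm", "linux.realm2")` would return an empty list for the successful comparison reported above. A non-empty list would show whether the damage is clustered or scattered across the file.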
@bokdeuk-jeong, I recently came across a similar issue where it works with tf-rmm but not with islet. Here is how to reproduce it (I believe this is an easier way to reproduce a similar issue):
cp -f /shared/debian12.img /
-d /debian12.img
with -d /shared/debian12.img (this forces debian12.img to be loaded from 9p).
root@islet:~#
@jinbpark
Unfortunately(?), islet and tf-rmm show the same results with debian12.img. They both print these messages:
[ 53.506242] I/O error, dev vda, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 2
[ 53.578298] Buffer I/O error on dev vda, logical block 0, lost sync page write
[ 53.900748] EXT4-fs (vda): I/O error while writing superblock
[ 53.952137] systemd[1]: proc-sys-fs-binfmt_misc.mount: Failed with result 'exit-code'.
and
You are in emergency mode. After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to reboot, "systemctl default" or "exit"
to boot into default mode.
Press Enter for maintenance
(or press Control-D to continue):
root@islet: // Bokdeuk: login prompts here
While running the provisioning setup, we started encountering truly random errors from various layers of the stack that could not be reproduced in tf-rmm. In detail, we encountered the following errors:
Since we couldn't reproduce them while running tf-rmm, there might be a memory issue introduced by islet rmm. We first encountered this issue when we tried to test application installation using a statically linked Rust "hello world" application. It weighs about 1MB due to static linking and a debug build. Every time we tried to install it while running islet, the installation process failed with a TLS decrypt error. We then switched to tf-rmm to check whether the issue persisted. To our surprise, it did not. Next, we created a lighter "hello world" application that weighs only 1KB and managed to get it installed under Islet. This led us to conclude that Islet introduces some memory instability that leads to arbitrary corruption when large amounts of data are processed.
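One way to narrow down the size dependence described above is to push deterministic payloads of increasing size through the suspect path and check each one by hash. The sketch below is only an illustration under stated assumptions: `transport` is a hypothetical callable standing in for whatever path is under test (for instance, a download inside the realm), and neither function is part of Islet or the provisioning tooling.

```python
import hashlib


def pattern(size, seed=0):
    """Build a deterministic, position-dependent payload of exactly `size` bytes."""
    out = bytearray()
    counter = 0
    while len(out) < size:
        # Each 32-byte block depends on its index, so swapped or repeated
        # blocks are detected as well as flipped bits.
        out += hashlib.sha256(f"{seed}:{counter}".encode()).digest()
        counter += 1
    return bytes(out[:size])


def first_failing_size(transport, sizes):
    """Return the smallest size whose payload comes back altered, or None."""
    for size in sorted(sizes):
        sent = pattern(size)
        received = transport(sent)  # hypothetical: the data path under test
        if hashlib.sha256(received).digest() != hashlib.sha256(sent).digest():
            return size
    return None
```

Calling `first_failing_size(transport, [1024, 13 * 1024, 1024 * 1024])` with sizes close to the 1KB and 1MB applications mentioned here would show whether a threshold exists. Because the reported failures are intermittent, a single clean pass would not rule the problem out.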
Testcases
The big application
example_app
This application consists of a simple loop that prints "I'm alive" with a number and then increments it.
After compiling and packaging, the OCI container with this application weighs about 1.2MB.
Using:
create-application -n example_app -v latest -i image-registry.net:1337 -o 32 -d 32 -r 5156ae05-1da0-4e7b-a168-ec8d1869890e
Islet rmm
On islet this application fails every time, mostly with a TLS decryption error. Sometimes, however, we observed a hash mismatch error, suggesting that the TLS layer worked fine but the corruption happened during OCI container image validation.
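To tell these two failure modes apart, the downloaded blob can be hashed independently and compared with the digest it is supposed to have. OCI digests are written as `sha256:<hex>`. The sketch below shows a streaming check of that kind. It is an illustration, not the validation code used by the provisioning tooling, and the function name is an assumption.

```python
import hashlib


def verify_digest(stream, expected, chunk=1 << 20):
    """Check a binary stream against a digest string of the form 'sha256:<hex>'.

    Returns (matches, bytes_read) so a short read can be told apart
    from corrupted content.
    """
    algo, _, want = expected.partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algo}")
    h = hashlib.sha256()
    total = 0
    while True:
        block = stream.read(chunk)
        if not block:
            break
        h.update(block)
        total += len(block)
    return h.hexdigest() == want.lower(), total
```

Running this on the same blob inside the realm and on the host would indicate whether the bytes were already wrong when they arrived or were altered afterwards.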
tf-rmm
On tf-rmm this application works every time.
The light application
light_app
This application acts similarly but was made to be as small as possible. For this reason, it is written in C and utilizes syscalls directly.
After compiling and packaging, the OCI container with this application weighs about 13KB.
Using:
create-application -n light_app -v latest -i image-registry.net:1337 -o 32 -d 32 -r 5156ae05-1da0-4e7b-a168-ec8d1869890e
Islet rmm
Thanks to the reduced application size, we were able to launch light_app:
Unfortunately, reducing the container size only made the errors less frequent. We still see errors similar to this:
tf-rmm
On tf-rmm everything works just like with the bigger application.
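Since the light application fails only some of the time on islet, comparing the two RMMs by failure count over many runs is more informative than a single run. The helper below is a rough sketch. It assumes the installation step can be wrapped in a command that exits with a non-zero status on failure, which this issue does not confirm, and the function name is hypothetical.

```python
import subprocess


def failure_rate(cmd, runs):
    """Run `cmd` (an argument list) `runs` times and return (failures, runs).

    Assumption: the command signals failure through a non-zero exit status.
    """
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures, runs
```

Running it with the same command and run count under both RMMs would give directly comparable failure counts.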
Reproduction
To reproduce this issue, follow the application provisioning instructions.
Naturally, the realm image was compiled only once to ensure that the build process doesn't interfere with the results. I only switched between islet and tf-rmm using the command line:
./scripts/fvp-cca --normal-world=linux-net --realm=linux --rmm=tf-rmm --hes --rmm-log-level info
for tf-rmm, and
./scripts/fvp-cca --normal-world=linux-net --realm=linux --rmm=islet --hes --rmm-log-level info
for islet.
The rest is exactly as specified in the instructions. Building the test applications used in this issue is also explained in the instructions.