ssrg-vt / hermitux

A binary-compatible unikernel
https://ssrg-vt.github.io/hermitux/

Some Bots benchmarks are not working #28

Open p-jacquot opened 3 years ago

p-jacquot commented 3 years ago

I noticed that some BOTS benchmarks do not work with Hermitux. The programs below were run on a nova node of Grid'5000 (g5k); I have no difficulty running other benchmarks on this node. Here they are:

strassen.omp-tasks

I think the latest bug fixes in Hermitux may have introduced a regression for this benchmark. Before those fixes, I was able to run strassen with the n parameter set to 1024, but now it fails even with this value. Here is the error shown by the program:

0x000000000020d803
/root/hermitux/hermitux-kernel/libkern/string.c:37 (discriminator 3)

Here are the last lines of the kernel logs:

[0.070][0:1][ERROR] Page Fault Exception (14) on core 0 at cs:ip = 0x8:0x20d803, fs = 0x4d6508, gs = 0, rflags 0x11086, task = 1, addr = 0xffffff9f92746000, error = 0x2 [ supervisor data write not present ]
[0.070][0:1][ERROR] rax 0xffffff9f92746000, rbx 0x8000000000, rcx 0xffffff9f92746000, rdx 0xffffff9f92747000, rbp 0xffffffffcfc93a30, rsp 0xa2e108 rdi 0xffffff9f92746000, rsi 0, r8 0x1, r9 0x3f24e8c5a001, r10 0x22, r11 0x1246, r12 0xffffff800fc93a30, r13 0x1, r14 0x8, r15 0x1f92747
[0.070][0:1][ERROR] Heap 0x4db000 - 0x4de000

alignment.for-omp-tasks

This one never worked with Hermitux. Here is the error shown:

Sequence format is Pearson
Multiple Pairwise Alignment (20 sequences)
0x0000000000485901
??:?

Here are the last lines of the kernel logs:

[0.000][0:1][ERROR] Page Fault Exception (14) on core 0 at cs:ip = 0x8:0x485901, fs = 0x4dadc8, gs = 0, rflags 0x11206, task = 1, addr = 0x9b4b68, error = 0x2 [ supervisor data write not present ]
[0.000][0:1][ERROR] rax 0xa16abc, rbx 0xa1b910, rcx 0x4, rdx 0x4, rbp 0xa16af0, rsp 0x9b4b60 rdi 0x15, rsi 0x28, r8 0, r9 0x956, r10 0x2, r11 0x9, r12 0x1, r13 0x4, r14 0xa, r15 0x672
[0.000][0:1][ERROR] Heap 0x4e0000 - 0x4e2000

uts.omp-tasks

I started running this benchmark only a few days ago, and it doesn't work either. It is known for creating a very large number of tasks; I don't know whether that helps, but it may matter, because the crash seems to occur inside OpenMP runtime functions.

The following output was obtained with OMP_NUM_THREADS=1 and HERMIT_CPUS=1:

uts :

Root branching factor                = 2000.000000
Root seed (0 <= 2^31)                = 23
Probability of non-leaf node         = 0.333344
Number of children for non-leaf node = 3
E(n)                                 = 1.000032
E(s)                                 = -31250.000000
Compute granularity                  = 1
Random number generator              = SHA-1 (state size = 20B)
Root node at 0xa33710
GUEST PAGE FAULT @0x9b3ff8 (RIP @0x4781de)
0x00000000004781de
kmp_tasking.cpp:?

Here are the last lines of the kernel logs:

[0.060][0:1][ERROR] Page Fault Exception (14) on core 0 at cs:ip = 0x8:0x4783c9, fs = 0x4d6628, gs = 0, rflags 0x11246, task = 1, addr = 0x9affd8, error = 0x2 [ supervisor data write not present ]
[0.060][0:1][ERROR] rax 0x485cc0, rbx 0x75d693d0a4c0, rcx 0x21a, rdx 0, rbp 0x9b00b0, rsp 0x9affe0 rdi 0, rsi 0x75d693d0a4c0, r8 0x188, r9 0xa32e0, r10 0x75d693d0a300, r11 0xc50ff8ec, r12 0x4d1a80, r13 0x75d693d0a380, r14 0, r15 0x9b0220
[0.060][0:1][ERROR] Heap 0x4db000 - 0x4de000

Providing the benchmarks

I'd like to give you the executables that are crashing so that you can reproduce the errors. How should I proceed?

olivierpierre commented 3 years ago

Thanks for reporting these bugs. Could you create folders with self-contained versions of the benchmarks, following the models present here, so that we have an easy way of reproducing the issues?

olivierpierre commented 3 years ago

Could you double-check alignment_for? It's working fine on my computer; maybe it has been solved by one of the latest fixes?

p-jacquot commented 3 years ago

Strange, I checked on my personal laptop just before making the pull request and it wasn't working. I'll dig a bit to see whether it was an error on my side.

olivierpierre commented 3 years ago

After testing on a Grid5000 machine, I can indeed see the issue:

polivier@nova-18:~/hermitux/apps/bots/alignment/alignment_for$ make test
OMP_NUM_THREADS=4 \
HERMIT_VERBOSE=0 HERMIT_ISLE=uhyve HERMIT_TUX=1 \
HERMIT_DEBUG=0 HERMIT_SECCOMP=0 HERMIT_MEM=4G \
HERMIT_CPUS=4 /home/polivier/hermitux/hermitux-kernel/prefix/bin/proxy /home/polivier/hermitux/hermitux-kernel/prefix/x86_64-hermit/extra/tests/hermitux \
prog  -f ../prot.20.aa
Sequence format is Pearson
Multiple Pairwise Alignment (20 sequences)
GUEST PAGE FAULT @0x99fc00 (RIP @0x4024e1)
0x00000000004024e1
/home/polivier/hermitux/apps/bots/alignment/alignment_for/alignment.c:273