bxatnarf closed this issue 5 years ago.
There seem to be two bugs that cause this behavior:

1. When `pthread_join` is invoked, it eventually calls `munmap`. This causes Popcorn to send a `PCN_KMSG_TYPE_VMA_OP_REQUEST` message to the other host and block until it receives a response.
2. When it finally times out, `pcn_kmsg_done` is called with the return value instead of a message pointer. `pcn_kmsg_done` assumes its argument is a `struct pcn_kmsg_hdr` pointer and subsequently tries to `kfree()` it, resulting in a crash.
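To make the second failure mode concrete, here is a minimal user-space sketch, assuming the timed-out wait hands back a kernel-style error pointer (`ERR_PTR(-ETIMEDOUT)`) rather than a message. `ERR_PTR`, `IS_ERR` and `PTR_ERR` are re-implemented here to mimic the kernel helpers; `struct msg_hdr`, `msg_done()`, `send_and_wait()` and `do_request()` are hypothetical stand-ins for `struct pcn_kmsg_hdr`, `pcn_kmsg_done()` and the VMA op request path, not Popcorn's actual code. The point it illustrates is the guard: only a real message should ever reach the free routine.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* User-space imitations of the kernel's error-pointer helpers: a small
 * negative errno is encoded in the top MAX_ERRNO addresses, which no
 * valid allocation occupies. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical message header (the real struct pcn_kmsg_hdr differs). */
struct msg_hdr {
	int type;
	size_t size;
};

static int freed_count; /* counts real frees, for illustration only */

/* Hypothetical analogue of pcn_kmsg_done(): it assumes msg is a real
 * heap-allocated message and frees it. Passing an ERR_PTR() here is the
 * bug described above, so the assertion catches it. */
static void msg_done(struct msg_hdr *msg)
{
	assert(!IS_ERR(msg));
	free(msg);
	freed_count++;
}

/* Hypothetical request path: returns the response message, or an
 * ERR_PTR() when the (simulated) wait times out. */
static struct msg_hdr *send_and_wait(bool simulate_timeout)
{
	struct msg_hdr *res;

	if (simulate_timeout)
		return ERR_PTR(-ETIMEDOUT);
	res = malloc(sizeof(*res));
	if (!res)
		return ERR_PTR(-ENOMEM);
	res->type = 1;
	res->size = sizeof(*res);
	return res;
}

/* Correct caller: check IS_ERR() before handing the result to msg_done(). */
static long do_request(bool simulate_timeout)
{
	struct msg_hdr *res = send_and_wait(simulate_timeout);
	long ret;

	if (IS_ERR(res))
		return PTR_ERR(res); /* error value, nothing to free */
	ret = res->type;
	msg_done(res);
	return ret;
}
```

Calling `do_request(true)` returns `-ETIMEDOUT` without freeing anything, while `do_request(false)` frees the response exactly once. Without the `IS_ERR()` check, the timeout path would pass the encoded error value straight to the free routine, the user-space counterpart of the `kfree()` crash.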
There are other instances of improper use of `pcn_kmsg_done`; we should hunt these all down and fix them (see issue https://github.com/ssrg-vt/popcorn-kernel/issues/66).
`mt` runs for a couple of minutes and then fails (gets killed) when executing on x86.
The space below contains notes I've taken to help pinpoint the issue.
The source code for `mt` can be found here; it looks like:
This is the log on the origin. It is possible that it doesn't capture the earliest failure.
It is tricky to determine which parts of the target host's log correspond to the origin's, because the target's log doesn't indicate any errors. Here is an example of the target's log from about the same time until the end of execution.
`mt` spawns a number of threads and migrates each one back and forth 100 times. However, the bug seems to appear consistently after the last of the 100 migrations (i.e., at the end of `mt`'s execution). The bug appears even when the program loops only once, although the kernel output differs in that only the NULL pointer dereference error appears. Here is the log on the origin with `mt` looping only once:
With `LOOP=1`, `main()` returns ... `wait_station` until it times out, and `-ETIMEDOUT` is returned.
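The timeout path can be sketched in user space as well. The following is a hypothetical analogue of a wait station, loosely modeled on the `wait_station` mentioned above and not taken from the Popcorn source: a waiter blocks on a condition variable with `pthread_cond_timedwait()`, and if no response arrives before the deadline it returns `ERR_PTR(-ETIMEDOUT)` instead of a message. `struct station`, `station_init()`, `station_complete()` and `station_wait()` are illustrative names, and the error-pointer helpers only mimic the kernel's.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Imitations of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long e) { return (void *)(intptr_t)e; }
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static inline bool IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical wait station: one waiter blocks until a responder stores
 * a message or the deadline passes. */
struct station {
	pthread_mutex_t lock;
	pthread_cond_t done;
	void *msg; /* NULL while the response is still pending */
};

static void station_init(struct station *ws)
{
	pthread_mutex_init(&ws->lock, NULL);
	pthread_cond_init(&ws->done, NULL);
	ws->msg = NULL;
}

/* Called by the responder to deliver a real (non-error) message. */
static void station_complete(struct station *ws, void *msg)
{
	assert(msg != NULL && !IS_ERR(msg));
	pthread_mutex_lock(&ws->lock);
	ws->msg = msg;
	pthread_cond_signal(&ws->done);
	pthread_mutex_unlock(&ws->lock);
}

/* Returns the response, or ERR_PTR(-errno) (normally -ETIMEDOUT) if none
 * arrives within timeout_ms. Callers must check IS_ERR() before freeing. */
static void *station_wait(struct station *ws, long timeout_ms)
{
	struct timespec deadline;
	void *ret;
	int rc = 0;

	/* pthread_cond_timedwait() uses CLOCK_REALTIME by default. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&ws->lock);
	/* Loop to absorb spurious wakeups; stop on any nonzero return. */
	while (ws->msg == NULL && rc == 0)
		rc = pthread_cond_timedwait(&ws->done, &ws->lock, &deadline);
	ret = ws->msg ? ws->msg : ERR_PTR(-rc);
	pthread_mutex_unlock(&ws->lock);
	return ret;
}
```

With no responder, `station_wait(&ws, 50)` returns an error pointer whose `PTR_ERR()` is `-ETIMEDOUT` after roughly 50 ms; once `station_complete()` has stored a message, the same call returns that message. Any caller that later frees the result has to check `IS_ERR()` first.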