Variable Time to Destroy an MO

The Issue

While running the PD cleanup benchmark tests in the no-reboot configuration (tests run sequentially in the same boot cycle), we noticed two distinct classes of timings:

The effect is most pronounced in the configuration with PD and Resource Space deletion depths of 0, since that operation has the shortest overall completion time. However, we also see distinct classes in other configurations, for example with a PD deletion depth of 1 and a Resource Space deletion depth of 2:
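One simple way to separate such timing classes, given a list of per-run measurements, is to sort the samples and cut at the largest gap between neighbours. The sketch below is only an illustration of that idea and is not part of the benchmark code; the function names `cmp_u64` and `split_at_largest_gap` are hypothetical, and it assumes the classes are well separated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Comparison callback for qsort over uint64_t timing samples. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/*
 * Hypothetical helper: sorts `samples` in place and returns the index of the
 * first sample in the upper class, i.e. the position just after the largest
 * gap between neighbouring values.
 *
 * Returns 0 when there are fewer than two samples. If all samples are equal,
 * every gap is zero and the function returns 1, which is not a meaningful
 * split; the caller should check the gap size before trusting the result.
 */
static size_t split_at_largest_gap(uint64_t *samples, size_t n)
{
    assert(samples != NULL || n == 0);
    if (n < 2) {
        return 0;
    }

    qsort(samples, n, sizeof(samples[0]), cmp_u64);

    size_t split = 1;
    uint64_t best_gap = 0;
    for (size_t i = 1; i < n; i++) {
        uint64_t gap = samples[i] - samples[i - 1];
        if (gap > best_gap) {
            best_gap = gap;
            split = i;
        }
    }
    return split;
}
```

After the call, `samples[0 .. split-1]` holds the faster class and `samples[split .. n-1]` the slower one, so the size of each class and the distance between them can be reported directly.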

Investigation

To determine where this variability was coming from, I benchmarked sub-operations in the root task during the full PD cleanup. Over a few iterations, I narrowed down the source to successively smaller operations:

The deletion of the hold registry.
Deletion of the PD's ADS from its hold registry.
Unattaching the PD's ELF_CODE section.
Deletion of the frames in the MO for the PD's ELF_CODE section.
- (After unattaching the ELF_CODE, the refcount of the MO is reduced, deleting the MO.)

The exact location is this loop: https://github.com/sid-agrawal/sel4-gpi/blob/5f956027c395e1d56660e08a648c57773fc502af/libsel4gpi/src/mo_obj.c#L169-L183

The results from a nanobenchmark around this loop, taken while deleting the ELF_CODE MO, entirely account for the variation in the total PD deletion time. (The number of pages deleted is the same each time.)
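The shape of such a nanobenchmark can be sketched as follows. This is a host-side illustration, not the code used in the root task: it uses POSIX `clock_gettime` as a stand-in for whatever counter the real benchmark reads, and `page_op_fn`, `now_ns`, `time_page_loop`, and `count_page` are hypothetical names. The per-page callback stands in for the work done on each frame inside the loop.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the per-page work done inside the timed loop. */
typedef void (*page_op_fn)(size_t page_index, void *ctx);

/* Monotonic timestamp in nanoseconds (assumption: a host-side stand-in). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Runs `op` once per page and returns the elapsed time for the whole loop.
 * Only the loop itself sits between the two timestamps, so the result can be
 * compared directly across runs that process the same number of pages.
 */
static uint64_t time_page_loop(size_t n_pages, page_op_fn op, void *ctx)
{
    assert(op != NULL);

    uint64_t start = now_ns();
    for (size_t i = 0; i < n_pages; i++) {
        op(i, ctx);
    }
    return now_ns() - start;
}

/* Dummy per-page operation: counts how many pages were visited. */
static void count_page(size_t page_index, void *ctx)
{
    (void)page_index;
    *(size_t *)ctx += 1;
}
```

Calling `time_page_loop(n_pages, count_page, &count)` once per test run and logging the result alongside the total deletion time makes it possible to check whether the two move together, which is the comparison described above.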

Conclusion

The variation occurs only when the test is run repeatedly within a single boot, and it arises inside vka_free_object. I suspect that some condition causes the untyped split allocator to rearrange its splits, or that some memory clearing is triggered, but neither has been confirmed.

Todo

Decide whether or not this is an issue to fix.
If it should be fixed, determine what is causing this behaviour.
