The Issue
While running the PD cleanup benchmark tests in the no-reboot configuration (tests run sequentially in the same boot cycle), we noticed that the timings fall into two distinct classes.
The effect is most pronounced in the configuration where the PD and Resource Space deletion depths are both 0, since that operation has the shortest total completion time and the variation is therefore proportionally largest. However, distinct classes also appear in other configurations, for example with a PD deletion depth of 1 and a Resource Space deletion depth of 2.
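One simple way to check whether a set of timings really falls into two classes is to sort the samples and split them at the largest gap between neighbours. The following is a minimal sketch of that idea, not the method used for the benchmark results; the function name `split_at_largest_gap` is hypothetical, and the samples are assumed to be raw cycle or nanosecond counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for 64-bit unsigned samples. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/*
 * Hypothetical helper: sorts the samples in place and returns the index of
 * the first sample in the upper (slow) class, taken as the position just
 * after the largest gap between neighbouring sorted values.
 * Returns 0 when there are fewer than two samples.
 */
static size_t split_at_largest_gap(uint64_t *samples, size_t n)
{
    if (n < 2) {
        return 0;
    }
    assert(samples != NULL);

    qsort(samples, n, sizeof(samples[0]), cmp_u64);

    size_t split = 1;
    uint64_t best_gap = 0;
    for (size_t i = 1; i < n; i++) {
        uint64_t gap = samples[i] - samples[i - 1];
        if (gap > best_gap) {
            best_gap = gap;
            split = i;
        }
    }
    return split;
}
```

After the call, `samples[0 .. split-1]` holds the fast class and `samples[split .. n-1]` the slow class. A split whose gap is small relative to the spread inside each class would suggest that the timings are not really bimodal.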
Investigation
To determine where this variability was coming from, I benchmarked sub-operations in the root task while the overall PD cleanup was executing. Over a few iterations, I narrowed the source down as follows:
1. Deletion of the hold registry.
2. Within that, deletion of the PD's ADS from its hold registry.
3. Within that, detaching the PD's ELF_CODE section.
4. Within that, deletion of the frames in the MO for the PD's ELF_CODE section. (Detaching ELF_CODE reduces the MO's refcount, which deletes the MO.)
The results from a nanobenchmark around the frame-deletion loop, taken while deleting the ELF_CODE MO, entirely account for the variation in the total PD deletion time. (The number of pages deleted is the same each time.)
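The kind of nanobenchmark described above can be sketched roughly as follows: take a timestamp before and after the loop, and also around each iteration, so that a single slow iteration can be told apart from a uniformly slow run. This is an illustrative host-side sketch, not the instrumentation actually used. `now_ns`, `page_op_fn`, `loop_timing_t`, `time_page_loop`, and `touch_page` are all hypothetical names, and C11 `timespec_get` stands in for whatever cycle counter the root task would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define EXAMPLE_PAGE_SIZE 4096u /* assumed page size, for the stand-in op only */

/* Wall-clock timestamp in nanoseconds; a stand-in for a cycle counter. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical per-page operation; in the real loop this is the frame free. */
typedef void (*page_op_fn)(size_t page_index, void *ctx);

/* Timing summary for one run of the loop. */
typedef struct {
    uint64_t total_ns;       /* time for the whole loop */
    uint64_t max_iter_ns;    /* slowest single iteration */
    size_t   max_iter_index; /* which iteration was slowest */
} loop_timing_t;

/* Runs op once per page and records total and worst-case iteration time. */
static loop_timing_t time_page_loop(size_t n_pages, page_op_fn op, void *ctx)
{
    assert(op != NULL);
    loop_timing_t t = {0, 0, 0};

    uint64_t loop_start = now_ns();
    for (size_t i = 0; i < n_pages; i++) {
        uint64_t start = now_ns();
        op(i, ctx);
        uint64_t elapsed = now_ns() - start;
        if (elapsed > t.max_iter_ns) {
            t.max_iter_ns = elapsed;
            t.max_iter_index = i;
        }
    }
    t.total_ns = now_ns() - loop_start;
    return t;
}

/* Stand-in operation: writes one byte to the start of each page in ctx. */
static void touch_page(size_t page_index, void *ctx)
{
    unsigned char *base = ctx;
    base[page_index * EXAMPLE_PAGE_SIZE] = 1;
}
```

Comparing `total_ns` across runs shows whether the loop accounts for the variation, while `max_iter_ns` and `max_iter_index` indicate whether the extra time is concentrated in one iteration or spread across all of them. The per-iteration timestamps add overhead of their own, so the totals are best compared only between runs instrumented the same way.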
Conclusion
The variation occurs only when the test is run repeatedly within a single boot, and it occurs inside vka_free_object. I suspect that some condition causes the untyped split allocator to rearrange its splits, or that some memory clearing is triggered.
Todo
Decide whether or not this is an issue to fix.
If it should be fixed, determine what is causing this behaviour.
The exact location referred to in the Investigation is this loop: https://github.com/sid-agrawal/sel4-gpi/blob/5f956027c395e1d56660e08a648c57773fc502af/libsel4gpi/src/mo_obj.c#L169-L183