sid-agrawal / OSmosis


Variable Time to Destroy an MO #65

Open astevins opened 1 month ago

astevins commented 1 month ago

The Issue

While running the PD cleanup benchmark tests in the no-reboot configuration (tests run sequentially in the same boot cycle), we noticed two distinct classes of timings:

Image

The effect is most pronounced in the configuration with PD and Resource Space deletion depths of 0, because the total operation time is shortest there, so the variation makes up a larger share of it. However, we see the same distinct classes in other configurations, for example with a PD deletion depth of 1 and a Resource Space deletion depth of 2:

Image
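One simple way to separate two timing classes like these in the raw samples is to sort the timings and split them at the largest gap between neighbouring values. The sketch below is only an illustration of that idea, not the analysis used for the plots; the function name `split_at_largest_gap` and the nanosecond sample type are assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Ascending comparison for qsort over uint64_t timings. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/*
 * Hypothetical helper: sorts the samples in place and returns the index of
 * the first sample in the slower class, i.e. the position just after the
 * largest gap between neighbouring sorted values.
 * Returns 0 when there are fewer than two samples. If all samples are
 * equal there is no real gap, and the function returns 1.
 */
size_t split_at_largest_gap(uint64_t *samples, size_t n)
{
    if (n < 2) {
        return 0;
    }

    qsort(samples, n, sizeof samples[0], cmp_u64);

    size_t split = 1;
    uint64_t best_gap = 0;
    for (size_t i = 1; i < n; i++) {
        uint64_t gap = samples[i] - samples[i - 1];
        if (gap > best_gap) {
            best_gap = gap;
            split = i;
        }
    }
    return split;
}
```

Samples before the returned index would belong to the fast class and the rest to the slow class. This only makes sense when the two classes are well separated, as they appear to be here.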

Investigation

To determine where this variability was coming from, I benchmarked sub-operations in the root task while it executed the overall PD cleanup. Over a few iterations, I narrowed the source down step by step:

  1. The deletion of the hold registry.
  2. Deletion of the PD's ADS from its hold registry.
  3. Unattaching the PD's ELF_CODE section.
  4. Deletion of the frames in the MO for the PD's ELF_CODE section.
    • (After the ELF_CODE section is unattached, the MO's refcount is decremented, which deletes the MO.)

The exact location is this loop: https://github.com/sid-agrawal/sel4-gpi/blob/5f956027c395e1d56660e08a648c57773fc502af/libsel4gpi/src/mo_obj.c#L169-L183

The results from a nanobenchmark around this loop, taken while deleting the ELF_CODE MO, account entirely for the variation in the total PD deletion time. (The number of pages deleted is the same each time.)
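For readers unfamiliar with the approach, a nanobenchmark of this kind amounts to reading a timestamp immediately before and after the loop and nothing else. The following is a minimal, portable sketch of that pattern, not the code in `mo_obj.c`: the names `time_free_loop_ns`, `free_fn`, and `count_free` are hypothetical, the callback stands in for the per-frame `vka_free_object` call, and C11 `timespec_get` stands in for whatever timer or cycle counter the root task actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the per-frame free call inside the loop. */
typedef void (*free_fn)(void *ctx, size_t index);

/* Wall-clock time in nanoseconds (C11 timespec_get, not monotonic).
 * A monotonic clock or cycle counter would be preferable on a real target. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Times only the free loop, so setup and teardown around it are excluded
 * from the measurement. Returns the elapsed time in nanoseconds. */
uint64_t time_free_loop_ns(free_fn free_one, void *ctx, size_t n_frames)
{
    assert(free_one != NULL);

    uint64_t start = now_ns();
    for (size_t i = 0; i < n_frames; i++) {
        free_one(ctx, i);
    }
    return now_ns() - start;
}

/* Example callback: counts how many frames were "freed". */
void count_free(void *ctx, size_t index)
{
    (void)index;
    (*(size_t *)ctx)++;
}
```

Calling `time_free_loop_ns(count_free, &freed, n)` once per test run and recording the result alongside the total PD deletion time is enough to check whether the loop accounts for the variation.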

Image

Conclusion

The variation occurs only when the test is run repeatedly within a single boot, and it arises during vka_free_object. I suspect that some condition is triggered that causes the untyped split allocator to rearrange its splits, or that triggers memory clearing, or something similar.

Todo

  1. Decide whether or not this is an issue to fix.
  2. If it should be fixed, determine what is causing this behaviour.

sid-agrawal commented 1 month ago

I think this level of investigation is fine for now. Thanks for digging in.