mlswg / mls-implementations

Coordination of implementation and interop specific details
110 stars 14 forks source link

Add a Free RPC and clean up after a script run #163

Closed bifurcation closed 1 year ago

bifurcation commented 1 year ago

Currently, the script runner tells clients under test to allocate resources (CreateKeyPackage, CreateGroup, etc.), but never tells them when those resources are no longer needed. This PR adds a Free() RPC to let the client know that the resources for the test are no longer needed, and calls that RPC at the end of a script run. Only the final state of each client is cleaned up, under the theory that (a) clients can clean up prior states as they change epochs and (b) join resources can be cleaned up once the member joins the group.

bifurcation commented 1 year ago

I have implemented this in MLSpp (https://github.com/cisco/mlspp/pull/343) and it appears to at least not break anything. I'm not seeing memory usage as flat as I would expect, but I expect that's just that the MLSpp interop harness isn't GC'ing as aggressively as it should.

mulmarta commented 1 year ago
  1. Some RPCs like CreateKeyPackage produce a transaction_id, not state_id. Are transaction_id's associated with some data that's not cleared?
  2. Would it be simpler to have a Wipe RPC that essentially "reboots" the server? May be easier to implement without leaks?
bifurcation commented 1 year ago
  1. The transaction_id RPCs are all "prepare for join" things, which should be followed by JoinGroup. So I was thinking that the client could free any of those resources when it gets the JoinGroup.

  2. Wipe interacts poorly with concurrency. Currently, you could have multiple test runners talking to the same client at the same time, and modulo collisions of uint32s, they wouldn't conflict with each other. Wipe would violate that.

mulmarta commented 1 year ago
  1. Totally makes sense. 1. This means that branch should delete artifacts related to the old state id?
bifurcation commented 1 year ago

Good question! Right now, the test runner only keeps a single state ID per client, so once it has branched, it can't refer to the old state. For now, a client would be safe to delete artifacts associated to the old state ID.

But at some point it would be nice to validate that both branches work. At that point, the test runner would have to Free both branches, and a client that proactively deleted artifacts on branch would break.

I'll update the PR so that ActionBranch calls Free() on the state IDs for the old branch. That cleans up the state, without the need for clients to do it proactively.