[RFC] runtime environment lineage metadata

At SPI, when spawn launches an application and creates an spfs runtime, it tries to find the least-nested spawn process and fork the new process from it. In effect, it is trying to clone from a process that has not been changed by any spfs activation scripts that may have run in the current runtime. However, it is fairly common for a spawn-launched application to exec, so the spawn process is gone and can't be used to fork additional processes later. In that case, the nested spfs runtime inherits the environment of the current process rather than that of some parent.

As we are planning to add spawn-like functionality to spfs/spk, we'll need something in spfs to approximate the behavior of spawn, and there's an opportunity to redesign the mechanism to address the shortcomings of how it works in spawn.

The first part of the proposal is to capture the environment in an spfs runtime when the runtime is created. This would likely use the new layer annotation mechanism to store the data. The goal is to save the state of all the environment variables at the point when the spfs runtime is created, before any changes are applied to the environment.
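As a rough illustration of the capture step, here is a minimal Python sketch. It is not spfs code: `EnvSnapshot`, `capture_environment`, and `to_annotation` are hypothetical names, and the JSON payload only stands in for whatever the layer annotation mechanism would actually store.

```python
import json
import os
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class EnvSnapshot:
    """Hypothetical record of the environment at runtime creation."""

    runtime_id: str
    variables: Dict[str, str]


def capture_environment(
    runtime_id: str, environ: Optional[Mapping[str, str]] = None
) -> EnvSnapshot:
    """Copy the environment before any activation scripts modify it."""
    source = os.environ if environ is None else environ
    # dict() takes a copy, so later changes to the source are not reflected.
    return EnvSnapshot(runtime_id=runtime_id, variables=dict(source))


def to_annotation(snapshot: EnvSnapshot) -> str:
    """Serialize the snapshot as JSON (assumed annotation payload format)."""
    return json.dumps(
        {"runtime_id": snapshot.runtime_id, "variables": snapshot.variables},
        sort_keys=True,
    )
```

The important property is that the copy is taken once, up front, so nothing that runs later in the runtime can alter what was recorded.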

The second part is tracking the lineage of nested runtimes, e.g., using spfs run ... from inside an spfs runtime. In addition to capturing the environment, a reference to the parent runtime's environment will also be captured.

The net effect is that, given any runtime id, it would be possible to look up the environment that created that runtime, or to find its parent environment, recursively, all the way up to the original "pristine" environment. This mimics how spawn walks the process tree to find the outermost parent spawn process, but it doesn't depend on those processes still existing.
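The lookup described above could be sketched as follows. This is only an assumption about the shape of the data: `RuntimeRecord`, `lineage`, and `pristine_environment` are hypothetical names, and an in-memory dict stands in for wherever spfs would store the records.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RuntimeRecord:
    """Hypothetical per-runtime metadata: captured env plus parent link."""

    runtime_id: str
    parent_id: Optional[str]  # None marks the outermost runtime
    environment: Dict[str, str]


def lineage(records: Dict[str, RuntimeRecord], runtime_id: str) -> List[RuntimeRecord]:
    """Return the chain from the given runtime up to the outermost one."""
    chain: List[RuntimeRecord] = []
    seen = set()
    current: Optional[str] = runtime_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in runtime lineage at {current!r}")
        seen.add(current)
        record = records[current]  # KeyError if the record is missing
        chain.append(record)
        current = record.parent_id
    return chain


def pristine_environment(
    records: Dict[str, RuntimeRecord], runtime_id: str
) -> Dict[str, str]:
    """Environment captured by the outermost ancestor runtime."""
    return lineage(records, runtime_id)[-1].environment
```

Because the walk only follows stored parent references, it works even when none of the processes that created those runtimes still exist.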

When launching a child runtime, it would be possible to inherit the environment variables from the "pristine" environment, whether or not the process that had that environment still exists. It would also be possible to make this behavior optional. There are some situations where it is desirable to run a spawn-wrapped command but still retain the environment variables set in the current process. Currently, with spawn, it is unpredictable whether this will happen.
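To make the optional behavior concrete, here is a small sketch of how a launcher might choose between the two environments. The `inherit_current` flag and both function names are hypothetical; nothing here reflects an existing spfs or spawn option.

```python
import subprocess
from typing import Dict, List, Mapping


def child_environment(
    pristine: Mapping[str, str],
    current: Mapping[str, str],
    inherit_current: bool = False,
) -> Dict[str, str]:
    """Pick the starting environment for a child runtime.

    By default the child starts from the pristine environment; with
    inherit_current=True it keeps the calling process's variables.
    """
    return dict(current if inherit_current else pristine)


def launch(command: List[str], env: Mapping[str, str]) -> int:
    """Run the command with exactly the given environment."""
    return subprocess.run(command, env=dict(env), check=False).returncode
```

Making the choice an explicit argument is what removes the unpredictability: the caller decides which environment the child gets, rather than it depending on which processes happen to be alive.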

From the meeting today:

Are there security concerns with making all runtime environments accessible via spfs commands?
- Could we encrypt this data and restrict access to child environments only? Is that secure enough?
- We should probably maintain the same restrictions as /proc/<pid>/environ.
- It is possibly easier to let spfs set and respect file permissions in the repo, enabling this kind of ACL, where access to the data is limited to its related runtime processes.
We would also need to use a platform+layer approach to create a tagged chain of these if we want to maintain traceability to parent environments.
Alternatively, the monitor could hold this information in memory and provide it to processes within the appropriate namespace, via an appropriately accessible socket or something like that.
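As one illustration of the file-permission option, the sketch below writes a captured environment to a file that only its owner can read or write. It assumes a POSIX filesystem. The function names and the JSON format are hypothetical, and this says nothing about how spfs repositories handle permissions today.

```python
import json
import os
import stat
from typing import Mapping


def write_private_snapshot(path: str, variables: Mapping[str, str]) -> None:
    """Create the file with mode 0o600 so group and others get no access."""
    # O_EXCL refuses to reuse an existing file that may have looser permissions.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(dict(variables), handle, sort_keys=True)


def is_owner_only(path: str) -> bool:
    """True when no group or other permission bits are set (POSIX only)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

This only limits access by user, not by runtime or namespace, so it would cover the "same user" case but not the finer-grained restriction that the monitor and socket idea aims at.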


[RFC] runtime environment lineage metadata #1116