Open askfongjojo opened 8 months ago
I've been thinking about this a bit in the background, and wanted to collect those ideas somewhere, even if half-baked.
Creating a zone bundle requires collecting two kinds of information: the output of a few commands, and log files. The former requires the `RunningZone` object used throughout the sled-agent to represent a live zone. That's so that (1) we can run commands inside that zone and (2) we ensure that the zone isn't removed while we're bundling. The log files don't strictly require the zone. We need a few pieces of information from it to find the files, but then we create ZFS snapshots of the filesystems containing the log files and copy the files from there. Once we've created those snapshots, the `RunningZone` is no longer required.
That's important, because gathering the log files takes the vast majority of the time needed to make the bundle. While I was developing this, I generally found that the commands would all complete within a few seconds. The log-file collection processes a few (or a few tens of) files per second, and so would often take much longer than running the commands. It's also less bounded: the duration depends on the uptime of the zone, because more logs are produced and more files rotated the longer the zone is alive.
All this is to say that we could decouple running the commands from collecting the log files. We might hold a reference to the `RunningZone` for the first part, and then drop it after we create the snapshot, but before starting to actually copy the files.
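A rough sketch of that split, in case it helps make the idea concrete. Everything here is hypothetical: `RunningZone` is a stand-in struct rather than the real sled-agent type, `LogSnapshot`, `Bundle`, `copy_logs`, and `create_bundle` are made-up names, and the log paths are placeholders. The only point is that the copy phase takes the snapshot alone, so the zone reference can be dropped before the slow part starts.

```rust
use std::sync::Arc;

// Hypothetical stand-in for sled-agent's RunningZone; the real type differs.
struct RunningZone {
    name: String,
}

// Hypothetical handle to a snapshot of the filesystems holding the logs.
// It records everything needed to find the files, so it outlives the zone.
struct LogSnapshot {
    files: Vec<String>,
}

impl RunningZone {
    // Placeholder for running a command inside the live zone.
    fn run_command(&self, cmd: &str) -> String {
        format!("[{}] output of `{}`", self.name, cmd)
    }

    // Placeholder for creating the snapshot; this is the last step that
    // needs the live zone.
    fn snapshot_logs(&self) -> LogSnapshot {
        LogSnapshot {
            files: vec![
                format!("{}/current.log", self.name),
                format!("{}/rotated.log.0", self.name),
            ],
        }
    }
}

struct Bundle {
    command_output: Vec<String>,
    log_files: Vec<String>,
}

// The slow phase: it only sees the snapshot, never the zone.
fn copy_logs(snapshot: &LogSnapshot) -> Vec<String> {
    snapshot
        .files
        .iter()
        .map(|f| format!("copied {}", f))
        .collect()
}

fn create_bundle(zone: Arc<RunningZone>, commands: &[&str]) -> Bundle {
    // Phase 1: fast and bounded, requires the live zone.
    let command_output = commands.iter().map(|c| zone.run_command(c)).collect();
    let snapshot = zone.snapshot_logs();

    // Release our reference before the long-running copy.
    drop(zone);

    // Phase 2: slow, duration grows with zone uptime, no zone needed.
    let log_files = copy_logs(&snapshot);

    Bundle {
        command_output,
        log_files,
    }
}
```

Having `copy_logs` accept only a `&LogSnapshot` means the compiler enforces the decoupling: nothing in the slow phase can accidentally keep the zone alive.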
5235 uncovered a potential need to improve zone-bundle processing performance. The failure mode may be more common than we think: when a user spins up a large number of long-running worker instances and spins them down en masse, the result is concurrent zone-bundle requests on propolis zones that all have a large number of propolis log files to be tarred up.
Here are the relevant comments from the customer ticket that provide more context to the possible solutions to this issue:
@gjcolombo
@bnaecker