sandia-minimega / minimega

[minimega] Retry (with backoff) finding disk partitions for injects #1490

Closed activeshadow closed 1 year ago

activeshadow commented 1 year ago

Lately we've been seeing a lot of "desired partition <n> not found" errors when injecting files into VM disk images. Adding this retry with backoff has greatly reduced how often we hit the issue (in practice, we no longer see it at all).
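For readers unfamiliar with the pattern, here is a minimal sketch of retry with exponential backoff in Go. It is not the code from this PR: `retryWithBackoff`, `errPartitionNotFound`, the attempt count, and the delays are all hypothetical names and values chosen for illustration, and the lookup is simulated rather than touching a real disk image.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errPartitionNotFound is a hypothetical stand-in for the
// "desired partition <n> not found" error discussed above.
var errPartitionNotFound = errors.New("desired partition not found")

// retryWithBackoff calls fn up to attempts times. After each failed call
// except the last, it sleeps and then doubles the delay. It returns nil on
// the first success, or the last error wrapped with the attempt count.
func retryWithBackoff(attempts int, initial time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	delay := initial
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// Simulated lookup (an assumption for this sketch): the partition
	// only becomes visible on the third call.
	find := func() error {
		calls++
		if calls < 3 {
			return errPartitionNotFound
		}
		return nil
	}
	err := retryWithBackoff(5, 10*time.Millisecond, find)
	fmt.Println(calls, err) // prints "3 <nil>"
}
```

Doubling the delay gives a slow device progressively more time to settle, while the attempt cap keeps a genuinely missing partition from blocking forever.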

jacdavi commented 1 year ago

This seems to work as expected, but I've only been able to test the path where the error doesn't occur. Is there a way to trigger the "desired partition <n> not found" error?

activeshadow commented 1 year ago

@jacdavi you can force the error by specifying a non-existent partition when injecting files, but that won't really test the retry. We were seeing this error a lot, even when the partition existed, while deploying a very large number of VMs. It was as though the code was connecting and mounting a large number of disk images in quick succession, and the disconnect of one image wasn't finishing before the connect of the next was attempted.

I get why you're trying to test, but I can confirm this code still works... we've had no trouble with it since we started using it in late January of this year.
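One way to exercise the retry path without deploying a large number of VMs would be a test double whose partition only appears after a delay. The sketch below is purely illustrative and is not taken from the minimega code base: `fakeDisk`, `findPartition`, and `waitForPartition` are hypothetical names, and the timings are arbitrary. It contrasts the transient case (the device is slow to settle) with the permanent case (a non-existent partition), which fails no matter how often it is retried.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// fakeDisk is a hypothetical test double. Its partition only becomes
// visible once readyAt has passed, mimicking a device that is still
// settling after a connect.
type fakeDisk struct {
	readyAt time.Time
	lookups int // number of times findPartition has been called
}

// findPartition fails until the current time reaches readyAt.
func (d *fakeDisk) findPartition() error {
	d.lookups++
	if time.Now().Before(d.readyAt) {
		return errors.New("desired partition not found")
	}
	return nil
}

// waitForPartition looks once, then retries up to retries more times,
// sleeping before each retry and doubling the delay after each sleep.
func waitForPartition(d *fakeDisk, retries int, delay time.Duration) error {
	err := d.findPartition()
	for i := 0; i < retries && err != nil; i++ {
		time.Sleep(delay)
		delay *= 2
		err = d.findPartition()
	}
	return err
}

func main() {
	// Transient case: the partition shows up after 30ms, so a retry succeeds.
	slow := &fakeDisk{readyAt: time.Now().Add(30 * time.Millisecond)}
	err := waitForPartition(slow, 5, 10*time.Millisecond)
	fmt.Println("transient:", err, "lookups:", slow.lookups)

	// Permanent case: the partition never appears, so every retry fails.
	missing := &fakeDisk{readyAt: time.Now().Add(time.Hour)}
	err = waitForPartition(missing, 3, time.Millisecond)
	fmt.Println("permanent:", err, "lookups:", missing.lookups)
}
```

A double like this only checks the retry logic itself; it does not reproduce whatever was causing the real devices to lag under load.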

jacdavi commented 1 year ago

@activeshadow thanks for the additional context. The code itself looks good to me, and since you've had multiple people using it without issue, I'm good to merge.