shapehq / tartelet

⚙️💻 A macOS app that makes it a breeze to manage multiple GitHub Actions runners in ephemeral virtual machines on a single host machine. The benefits are that runners can run in parallel, and each job runs in an isolated environment.
MIT License

Random Runner Crashes After Upgrading to 0.8 #76

Open stmitt opened 2 months ago

stmitt commented 2 months ago

We recently upgraded to Tartelet 0.8, and since then, we've been experiencing random crashes of our runners. This issue is impacting our CI/CD workflow, and we're trying to identify the cause.

We are now using Tartelet 0.8.3 and Tart 2.9.0 with the Xcode 15.3 image from Cirrus Labs. The host is running macOS Sonoma 14.4.1.

This is how the VMs look when connecting to the host:

[Screenshot 2024-04-17 at 12 57 20]

We are unsure whether this problem is related to Tartelet itself, the underlying Tart tool, the VM image, or our GitHub Actions job. Has anyone else experienced similar issues after upgrading? Any insights or suggestions would be greatly appreciated.

Thank you for your assistance!

simonbs commented 2 months ago

Thanks for opening the issue.

Unfortunately, I do not have much to contribute except mentioning that we have three Apple M1 Mac minis running macOS Sonoma 14.3.1, and each of these machines is running two virtual machines with macOS Sonoma 14.1.2, and we have not seen these crashes.

If anyone is seeing the same issue and can provide some more information, then I'm happy to look into whether there's something that can be improved in Tartelet to address it.

greg-cook commented 1 month ago

We're also experiencing this, although we have only recently switched from Orchard to Tartelet, now that the Cirrus images are working.

Next time it happens I will collect and post the crash report.

Our workaround for now is going to be automating the termination of the orphaned VMs and restarting tartelet.

kondratk commented 1 month ago

@simonbs I've experienced the crash mentioned above and am attaching a crash log below. @greg-cook could you please share how you are automating the restart of Tartelet and the VMs?

crashlog.txt

stmitt commented 1 month ago

Today I was able to extract the kernel panic log. Unfortunately, I cannot tell from it what caused the issue. Here is the log in case anybody else can.

https://pastebin.com/Af4i7mcV

@greg-cook How do you detect if a VM is orphaned?

eigenraven commented 10 hours ago

We've been seeing very similar crashes that require a manual restart on our runners. Right now they're all on macOS 14.5 (host and guest), using the Cirrus Xcode 15.4 images as a base. Some of the crashes seem to be kernel panics and some are loginwindow crashes. Both leave the VM running in a zombie state without Tartelet's terminal appearing, and Tartelet never attempts to restart them. Closing the VM windows lets Tartelet continue as usual.
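
For anyone wanting to automate the cleanup until the root cause is found, here is a rough watchdog sketch in Python. Nothing in this thread confirms how to reliably detect an orphaned VM, so the age-based heuristic below is purely an assumption: it treats any `tart run <name>` process older than a hypothetical `MAX_AGE_SECONDS` as stuck and calls `tart stop` on it. It also assumes that stopping the VM this way has the same effect as closing its window, and that the VM name is the last plain argument after any `--flag` options without values; neither has been verified against Tartelet.

```python
import re
import subprocess

# Hypothetical threshold: treat a VM as stuck once its `tart run` process
# has been alive longer than any CI job is expected to take.
MAX_AGE_SECONDS = 2 * 60 * 60


def parse_etime(etime: str) -> int:
    """Convert a ps elapsed-time value ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:  # pad missing hours with zero
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_stale_vms(ps_output: str, max_age: int = MAX_AGE_SECONDS) -> list[str]:
    """Return names of VMs whose `tart run <name>` process is older than max_age."""
    stale = []
    for line in ps_output.splitlines():
        fields = line.split(None, 1)  # elapsed time, then the full command
        if len(fields) != 2:
            continue
        etime, command = fields
        # Assumption: value-less flags such as --no-graphics may precede the name.
        match = re.search(r"\btart run\s+(?:--\S+\s+)*(\S+)", command)
        if match and parse_etime(etime) > max_age:
            stale.append(match.group(1))
    return stale


def stop_stale_vms() -> None:
    """List processes with ps and ask Tart to stop every VM that looks stale."""
    ps_output = subprocess.run(
        ["ps", "-ax", "-o", "etime=", "-o", "command="],
        capture_output=True, text=True, check=True,
    ).stdout
    for name in find_stale_vms(ps_output):
        # Assumption: Tartelet carries on once the VM has been stopped.
        subprocess.run(["tart", "stop", name], check=False)


if __name__ == "__main__":
    stop_stale_vms()
```

Something like this could be run periodically (for example from a launchd job) on the host. The threshold needs to be longer than your longest legitimate job, otherwise it will kill healthy runners.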