Open stolowski opened 4 years ago
@stolowski I'll be looking to reproduce this behavior today and hopefully identify and resolve the issue fairly quickly.
@stolowski I'll be looking to reproduce this behavior today and hopefully identify and resolve the issue fairly quickly.
Thank you! To be honest, I'm a little skeptical when it comes to reproducing it in a short time; it just happened here again after a few days, and after a few hours of working today without experiencing the problem. I'm attaching vagrant debug in case it shows anything that may give a clue...
If you have any suggestions where to look for more logs/debug, please let me know.
This just happened again, so 5 days from previous occurence. After restarting the system I took a look at the processes with pstree and under normal conditions I see this:
|-+= 00318 root /opt/vagrant-vmware-desktop/bin/vagrant-vmware-utility api -port=9922 -config-file=/opt/vagrant-vmware-desktop/config/service.hcl
| \--- 02980 root /Applications/VMware Fusion.app/Contents/Library/vmrest
which means vagrant-vmware-utility is in control of vmrest. Is there a service related to vagrant-vmware-utility that I could restart as a workaround when this happens?
@chrisroberts I had a hunch and checked Mac's Console app and found two crash reports for vagrant-vmware-utility that match the times where vmrest wouldn't work anymore; attached below.
This just happened again. I was hoping sudo launchctl start com.vagrant.vagrant-vmware-utility
would work around the problem, but it didn't help (the service would start, but vmrest does not respond). @chrisroberts Is there any workaround you can think of that I could try next time? Any logs or other information I should get when it happens again? This is a very frustrating problem...
I know recent update to vagrant/vagrant-vmware-utility was aimed at fixing the issue of zombie vmrest process and I think it indeed fixed it, but I just experienced the problem again - this time there there was no vmrest zombie process, vagrant-vmware-utility was running, but it didn't not (re)start vmrest, so I was getting the usual "Get http://localhost:49743/api/vmnet: GET http://localhost:49743/api/vmnet giving up after 5 attempts" error from vagrant commands.
This time however I was able to recover with:
$ sudo launchctl stop com.vagrant.vagrant-vmware-utility
$ sudo launchctl start com.vagrant.vagrant-vmware-utility
which also started vmrest.
So I think this is going in the right direction but it would be great if com.vagrant.vagrant-vmware-utility service automatically restarted vmrest if needed.
@stolowski had you put the machine to sleep between the time it was functioning correctly, and when it began to error with the vmrest process not being started?
Also, if you could toss in the logs from /Library/Application Support/vagrant-vmware-utility/service.log
around when you experience this, that would be great. I'll work on trying to repro this again with the latest utility.
Thanks!
@chrisroberts No, I didn't put the machine to sleep around the time the issue occurred. I woke it in the morning around 9:00, started a vm with vagrant up
followed by vagrant ssh
, then worked on it over ssh for a couple of hours (without issuing any vagrant commands for around 4 hours which seems to be confirmed by the log). The problem occurred around 13:09 when I tried to issue a vagrant command.
Attaching the log for that day, I think you should be looking for 2020/11/13 13:09. service.log.gz
(there are a couple of vagrant-vmware-utility restarts after that time which are my attempts to recover and some playing with automating the recovery).
Ah, I've been running into this for quite some time now, so it's helpful to see what the cause is. I'd gotten so far as figuring out that the vmrest was in zombie state (a couple months ago), but assumed it was a VMware bug and didn't have any more time to spend tracking it down.
This time, when it failed, I was able to restore vagrant vmware rest api service by restarting com.vagrant.vagrant-vmware-utility
with launchctl
-- thanks for that!
If there's any additional debug info you might need, I could poke around as well.
FWTW I haven't experienced this problem with vagrant 2.2.14 for good 3 weeks, but I've also upgraded to Big Sur on the day it was officially released (on a side note, I don't recommend switching to Big Sur without doing some research/reading first, i.e. there are serious issues with nested virtualization with Fusion 12 on it). All my previous problems related to that issue were from Catalina.
UPDATE: oh well, I spoke too soon, I just hit it again shortly after waking my MBP in the morning.... The workaround with restarting the service worked.
This has been happening to me and those on my team for years, at least as far back as this issue report. This happens so often that we have a shell script that runs:
sudo launchctl unload -w /Library/LaunchDaemons/com.vagrant.vagrant-vmware-utility.plist
sudo launchctl load -w /Library/LaunchDaemons/com.vagrant.vagrant-vmware-utility.plist
This has been happening to me and those on my team for years, at least as far back as this issue report. This happens so often that we have a shell script that runs:
Yep, me too. It's just the normal part of the routine for me. Eventually, routine will transition into tradition. It'll feel good and right and true, but we'll have long since forgotten why we're doing it!! 😁
Vagrant version
2.2.10
VMware Fusion 12
Host operating system
Mac OS 10.15.7 (19H2)
Guest operating system
Linux, Windows 10.
Vagrantfile
N/A
Debug output
N/A
Expected behavior
vagrant commands (up, ssh ...) work all the time.
Actual behavior
Every few days vagrant commands stop working and error out accessing vmware REST API. I can confirm vmrest API is not available by connecting with a web browser at the given http port (when it works, then I can browse these APIs with a web browser).
When this happens, vmrest process appears to be a zombie:
Normally (when it works, it looks like this:
Steps to reproduce
No clear steps. Happens randomly every few days, I keep my Mac running for several days and only close the lid overnight. I don't think it is strictly related to waking it from sleep in the morning, it also happened close to end of my work day.
References
I've created a VMware support ticket as well, but it is at early stages yet.
I suspect the problem might be with VMware/vmrest; on the other hand it seems that vmrest is started on demand (triggered / spawned by vagrant commands), and if vmrest becomes a zombie, it may suggest it's parent process quit abruptly. I'd appreciate any thoughts or ideas on debugging this, or any suggestions about how to recover (e.g. if restarting any daemons would help), as at the moment, in order to recover and have vagrant operational again I have to restart my system whenever this happens. I think this issue started happening after upgrading to Fusion 12 (and upgrading vagrant for it).