mutagen-io / mutagen

Fast file synchronization and network forwarding for remote development
https://mutagen.io

Add daemon watchdog to log daemon output (and potentially restart on failure) #70

Open IngCr3at1on opened 5 years ago

IngCr3at1on commented 5 years ago

I'm trying to debug a crash on a Debian Squeeze VM, but I can't find the daemon logs. Any advice would be awesome 😄

xenoscopic commented 5 years ago

@IngCr3at1on If it's a crash in the daemon itself, then try running the daemon interactively via the hidden command mutagen daemon run (you'll need to stop any existing backgrounded instance with mutagen daemon stop). Then when the daemon crashes, it should dump an error or stack trace. If you do manage to get a stack trace of the crash, please forward it over and I'll have a look. Thanks!

IngCr3at1on commented 5 years ago

I left this running and was able to see the complete crash dump (in all its massive glory), but thanks to my infinite wisdom I had it logging to the GCP terminal for the node rather than a standard terminal, and that terminal is apparently stupid about copy/paste... I will leave it running again with the output piped to a file.

Unrelated: the Travis-relative file paths in the stack traces are funny :joy:

IngCr3at1on commented 5 years ago

@havoc-io so it looks like the node it's running on is actually running out of memory :joy: iirc I shouldn't be that surprised (though I am a little bit)...

We can basically close this issue, as it's apparently caused by me attempting something I shouldn't be doing anyway :joy: (I was hoping to be super lazy about keeping my Go source current across several machines, but I can't do it with this node, that's for sure lol).

It might be worth writing some data to a log file somewhere automatically to make this type of issue more immediately discoverable?

xenoscopic commented 5 years ago

Interesting... Just out of curiosity, what sort of resources does the machine have? It'd be interesting to know the lower bound for system requirements. The Go 1.12 runtime may substantially lower Mutagen's memory usage. I'll move Mutagen over to Go 1.12 in v0.9.0.

As far as logging... that might be tough. Out-of-memory termination is usually performed via an uncatchable SIGKILL signal. I'm thinking maybe this could be done via a watchdog parent process for the daemon that catches its output on termination and writes it to a log (and maybe restarts it). That wouldn't be too hard to implement. I'll add this to the v0.9.0 milestone, but I may punt if it's more difficult than expected.

Thanks for your persistence in debugging this!

IngCr3at1on commented 5 years ago

@havoc-io yeah I know it would be particularly funky to attempt to selectively log that, was just a thought...

Funnily enough, it's a GCP f1-micro (1 vCPU, 0.6 GB memory) lol... It worked fine for syncing my dotfiles, so I was like "meh, I'll give it a shot". Interestingly, though, my comment above might not be 100% accurate anyway...

I noticed that even after it crashed there was still a large amount of memory consumption. It turns out there was a mutagen process still running even after I thought everything was stopped. I killed this process, left the daemon in run mode again, and have top running in a separate terminal. At the moment swap is doing its job properly... Will update eventually :smile:

IngCr3at1on commented 5 years ago

So yeah... I left it running after a clean restart and it's definitely running out of memory. Here is the full stack trace if you are curious anyway... https://pastebin.com/4W59fFYg

xenoscopic commented 5 years ago

@IngCr3at1on Thanks for the stack trace. I will definitely look into adding the watchdog process to catch these sorts of issues.