Can't start daemon when launched automatically, but can when launched manually

cumber commented 6 years ago

I'm running Xmonad on Nixos, so my system's a little weird. This is probably something with my environment, but I can't figure out what.

I have syncthing-gtk started from my xmonad startup hook. It starts up, and the UI remains responsive, but it just fills the log up with repeated messages like:

E Daemon        g-io-error-quark: Could not connect to 127.0.0.1: Connection refused (39)
Traceback (most recent call last):
  File "/nix/store/76iwxcvwn9vz192nlkla411c4cyavajn-syncthing-gtk-0.9.3.1/lib/python2.7/site-packages/syncthing_gtk/daemon.py", line 1051, in _connected
    self._connection = self.connect_to_service_finish(results)
Error: g-io-error-quark: Could not connect to 127.0.0.1: Connection refused (39)
E Daemon        g-io-error-quark: Could not connect to 127.0.0.1: Connection refused (39)
Traceback (most recent call last):
  File "/nix/store/76iwxcvwn9vz192nlkla411c4cyavajn-syncthing-gtk-0.9.3.1/lib/python2.7/site-packages/syncthing_gtk/daemon.py", line 1051, in _connected
    self._connection = self.connect_to_service_finish(results)
Error: g-io-error-quark: Could not connect to 127.0.0.1: Connection refused (39)
...

Adding a new repetition every second or so. Nothing changes after waiting even several minutes.

If I quit the syncthing-gtk that Xmonad launched and start it manually, everything works fine; it connects to the daemon within a few attempts.

If I quit my desktop session, the daemon process doesn't get killed; on logging back in the syncthing-gtk started by Xmonad does connect to the existing daemon process and work. However if I kill the syncthing daemon the Xmonad-started syncthing-gtk fails to relaunch the daemon, and emits those "Connection refused" error messages into the log every second or so. (Killing the daemon out from under a syncthing-gtk I've started manually results in it starting another daemon process and all is well)

I thought it might just be a timing problem, where some session initialization hasn't happened yet by the time Xmonad is automatically launching it, so I put in delays, but even delays of 30s doesn't make a difference (and I'm able to launch it manually within 30 seconds of startup and it consistently works).

I thought there might be a difference in environment variables the two launch methods inherit, but saving and diffing the /proc/<pid>/environ between the working and non-working cases only revealed that when I launched it manually there were 3 additional variables in the working case:

+GIO_LAUNCHED_DESKTOP_FILE=/home/cumber/.nix-profile/share/applications/syncthing-gtk.desktop
+GIO_LAUNCHED_DESKTOP_FILE_PID=12190
+IBUS_NO_SNOOPER_APPS=synapse

Otherwise the two environment files were identical.

Running with --debug and --dump, I get this on stderr when it's not working:

D UIBuilder     Enabled: header_bar
D UIBuilder     Enabled: icons_in_menu
D UIBuilder     Loading glade file /nix/store/76iwxcvwn9vz192nlkla411c4cyavajn-syncthing-gtk-0.9.3.1/share/syncthing-gtk/app.glade
D UIBuilder     Enabled conditions: set([u'icons_in_menu', u'header_bar'])
D UIBuilder     Removed 'property' by attribute: is_windows
D UIBuilder     Removed 'property' by attribute: is_windows
D UIBuilder     Allowed node icons_in_menu | old_gtk
D UIBuilder     Removed 'property' by attribute: is_windows
D UIBuilder     Allowed node !is_gnome | old_gtk
D UIBuilder     Allowed node icons_in_menu | old_gtk
D UIBuilder     Removed 'property' by attribute: old_gtk
D UIBuilder     Allowed node icons_in_menu | old_gtk
D UIBuilder     Allowed node icons_in_menu | old_gtk
D UIBuilder     Removed 'child' by attribute: traditional_header
D UIBuilder     Removed 'child' by attribute: is_windows
D UIBuilder     Allowed node icons_in_menu | old_gtk
D UIBuilder     Removed 'property' by attribute: is_windows
D UIBuilder     Removed node traditional_header
D UIBuilder     Allowed node traditional_header | icons_in_menu
I StatusIcon    Using backend StatusIconGTK3 (primary)
D Daemon        Reading syncthing config /home/cumber/.config/syncthing/config.xml
V Notifications Error notifications enabled
V Notifications File update notifications enabled
I App           
I App           Syncthing-GTK started and running in notification area
V Daemon        Set refresh interval to 5
W StatusIcon    StatusIcon: Using backend StatusIconAppIndicator (fallback)
E Daemon        g-io-error-quark: Could not connect to 127.0.0.1: Connection refused (39)
Traceback (most recent call last):
  File "/nix/store/76iwxcvwn9vz192nlkla411c4cyavajn-syncthing-gtk-0.9.3.1/lib/python2.7/site-packages/syncthing_gtk/daemon.py", line 1051, in _connected
    self._connection = self.connect_to_service_finish(results)
Error: g-io-error-quark: Could not connect to 127.0.0.1: Connection refused (39)
D App           Creating connect_dialog
V App           Settinig connect_dialog label Connecting to S
E Daemon        g-io-error-quark: Could not connect to 127.0.0.1: Connection refused (39)
Traceback (most recent call last):
  File "/nix/store/76iwxcvwn9vz192nlkla411c4cyavajn-syncthing-gtk-0.9.3.1/lib/python2.7/site-packages/syncthing_gtk/daemon.py", line 1051, in _connected
    self._connection = self.connect_to_service_finish(results)
Error: g-io-error-quark: Could not connect to 127.0.0.1: Connection refused (39)
E Daemon        g-io-error-quark: Could not connect to 127.0.0.1: Connection refused (39)
Traceback (most recent call last):
  File "/nix/store/76iwxcvwn9vz192nlkla411c4cyavajn-syncthing-gtk-0.9.3.1/lib/python2.7/site-packages/syncthing_gtk/daemon.py", line 1051, in _connected
    self._connection = self.connect_to_service_finish(results)
Error: g-io-error-quark: Could not connect to 127.0.0.1: Connection refused (39)
...

And stdout is compeltely empty.

Syncthing-gtk version 0.9.3.1, installed from nixos unstable channel.

kozec commented 6 years ago

I don't know much about Nix. Is there any chance that only one of those two ways of start gives access to syncthing daemon binary?

Can you check if there is one running while those unsucesful attempts at connecting to it are happening?

cumber commented 6 years ago

No, there's definitely no daemon running when it's unsuccessfully attempting to connect. And in fact if I start the daemon binary from the command line syncthing-gtk does pick it up and connect.

There shouldn't be any difference between those launch methods having access. They're run with the same PATH setting (since it didn't show up in the environment diff). And it looks like the way it finds the path to the daemon binary on nix systems is here: https://github.com/NixOS/nixpkgs/blob/master/pkgs/applications/networking/syncthing-gtk/paths.patch. That @syncthing@is expanded to point to the full path in the nix store of the syncthing binary, and indeed I can see that the configuration.py in my syncthing-gtk install does have a hardcoded path to syncthing (which is how I got the path to start the daemon manually from the command line). Since it's using a hardcoded path, it's hard to see how the external environment could affect whether it can see the daemon binary (short of removing the daemon package from the nix store).

Is there some way to get more information about what happens when it tries to launch the daemon?

0.9.4 just hit nixpkgs unstable, so I'll install that and see if it makes a difference.

cumber commented 6 years ago

No, 0.9.4 hasn't changed anything. :/

kozec commented 6 years ago

Frankly, Nix is still kind of black magic for me, but maybe @jtojnar would have idea what could be causing it?

jtojnar commented 6 years ago

@cumber Could you share an excerpt of your system configuration (xmonad set-up and syncthing-gtk startup)?

cumber commented 6 years ago

This is really weird.

With a bit of hacking to add extra debug messages, I can see that in the failing case (started by xmonad) check_daemon_running (from syncthing_gtk/tools.py) returns True first time (which seems to cause the connect_dialog to be set, which causes it not to check if the daemon is running anymore, it just keeps retrying to connect). It does so because the killall command is returning with an exit status of 0, even though when I replaced the -q with -v and captured the output, killall is still printing syncthing: no process found to its stderr. syncthing-gtk never actually tries to start the daemon.

When I run the exact same hacked version manually from inside my desktop, the killall command returns 1, so check_daemon_running returns False.

So it seems like it is nothing at all to do with syncthing-gtk directly...?

kozec commented 6 years ago

When I run the exact same hacked version manually from inside my desktop, the killall command returns 1, so check_daemon_running returns False.

Check if killall is same thing on both machines. Something like alias or script can probably obscure return value... Or there is simply different killall implementation.

cumber commented 6 years ago

Pretty sure it's the same both times. Nix being nix the killall call in syncthing-gtk is patched to be an absolute path, and its the same machine running the same syncthing-gtk install.

jtojnar commented 6 years ago

You could also try to replace killall with pkill, like some other packages do.

kozec commented 6 years ago

If killall always returns zero, I see no reason for expecting pkill to work differently...

jtojnar commented 6 years ago

When killall is printing syncthing: no process found and exiting with 0, that sounds like a bug. pkill might not contain this bug.

cumber commented 6 years ago

Got it! This was my fault, nothing to do with syncthing-gtk.

What was going on is that Xmonad ignores SIGCHLD signals, so that child processes that exit are immediately garbage collected instead of hanging around as zombies. Xmonad provides a spawn function for running commands which resets the signal handlers, but I rolled my own in order to log the output of my startup commands into their own logs, and I didn't know about the signal handlers.

Without child processes becoming zombies, you can't get their exit status after they've terminated. At Python level, os.system("true") and os.system("false") both return -1 instead of the actual exit status. subprocess.call("true") and subprocess.call("false") however both return 0 when SIGCHLD is ignored.

Presumably other methods of using subprocess.Popen behave the same; so the lack of an exit status from the killall command was being interpreted the same as killall finding a syncthing process.

Xmonad's commands for starting terminals and invoking a launcher were using its built in code rather than the hand-rolled function I used to log my startup commands, so anything I did within my desktop session was getting SIGCHLD reset, and so it worked then.

So if there's any bug at all other than my own xmoand config, it's Python's subprocess code providing a "success" exit status when it couldn't get an exit status, rather than giving any indication of failure.

Thanks very much for the help, and sorry for the false bug report!

kozec commented 6 years ago

Well, if nothing else, it was interesting problem to observe :)

kozec / syncthing-gtk

Can't start daemon when launched automatically, but can when launched manually #460