Closed cumber closed 6 years ago
I don't know much about Nix. Is there any chance that only one of those two ways of starting it gives access to the syncthing daemon binary?
Can you check if there is one running while those unsuccessful attempts at connecting to it are happening?
No, there's definitely no daemon running when it's unsuccessfully attempting to connect. And in fact if I start the daemon binary from the command line syncthing-gtk does pick it up and connect.
There shouldn't be any difference between those launch methods having access. They're run with the same PATH setting (since it didn't show up in the environment diff). And it looks like the way it finds the path to the daemon binary on nix systems is here: https://github.com/NixOS/nixpkgs/blob/master/pkgs/applications/networking/syncthing-gtk/paths.patch. That @syncthing@ is expanded to point to the full path in the nix store of the syncthing binary, and indeed I can see that the configuration.py in my syncthing-gtk install does have a hardcoded path to syncthing (which is how I got the path to start the daemon manually from the command line). Since it's using a hardcoded path, it's hard to see how the external environment could affect whether it can see the daemon binary (short of removing the daemon package from the nix store).
Is there some way to get more information about what happens when it tries to launch the daemon?
0.9.4 just hit nixpkgs unstable, so I'll install that and see if it makes a difference.
No, 0.9.4 hasn't changed anything. :/
Frankly, Nix is still kind of black magic to me, but maybe @jtojnar would have an idea what could be causing it?
@cumber Could you share an excerpt of your system configuration (xmonad set-up and syncthing-gtk startup)?
This is really weird.
With a bit of hacking to add extra debug messages, I can see that in the failing case (started by xmonad) check_daemon_running (from syncthing_gtk/tools.py) returns True the first time (which seems to cause the connect_dialog to be set, which stops it from checking whether the daemon is running anymore; it just keeps retrying the connection). It does so because the killall command is returning with an exit status of 0, even though when I replaced the -q with -v and captured the output, killall is still printing "syncthing: no process found" to its stderr. syncthing-gtk never actually tries to start the daemon.
When I run the exact same hacked version manually from inside my desktop, the killall command returns 1, so check_daemon_running returns False.
So it seems like it is nothing at all to do with syncthing-gtk directly...?
When I run the exact same hacked version manually from inside my desktop, the killall command returns 1, so check_daemon_running returns False.
Check if killall is the same thing on both machines. Something like an alias or a script can probably obscure the return value... Or there is simply a different killall implementation.
Pretty sure it's the same both times. Nix being nix, the killall call in syncthing-gtk is patched to be an absolute path, and it's the same machine running the same syncthing-gtk install.
You could also try to replace killall with pkill, like some other packages do.
If killall always returns zero, I see no reason to expect pkill to work differently...
When killall is printing "syncthing: no process found" and exiting with 0, that sounds like a bug. pkill might not contain this bug.
Got it! This was my fault, nothing to do with syncthing-gtk.
What was going on is that Xmonad ignores SIGCHLD signals, so that child processes that exit are immediately garbage collected instead of hanging around as zombies. Xmonad provides a spawn function for running commands which resets the signal handlers, but I rolled my own in order to log the output of my startup commands into their own logs, and I didn't know about the signal handlers.
Without child processes becoming zombies, you can't get their exit status after they've terminated. At the Python level, os.system("true") and os.system("false") both return -1 instead of the actual exit status. subprocess.call("true") and subprocess.call("false"), however, both return 0 when SIGCHLD is ignored.
Presumably other methods of using subprocess.Popen behave the same, so the lack of an exit status from the killall command was being interpreted the same as killall finding a syncthing process.
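The effect described above can be reproduced with a small sketch. It assumes Linux semantics for an ignored SIGCHLD and CPython's subprocess module, and the helper name exit_code_seen_by_parent is made up for illustration. It runs a child that exits with a known status and returns what subprocess.call reports to the parent:

```python
import signal
import subprocess
import sys


def exit_code_seen_by_parent(child_exit_code, ignore_sigchld):
    """Run a child that exits with child_exit_code and return what
    subprocess.call reports. Hypothetical helper, for illustration only."""
    disposition = signal.SIG_IGN if ignore_sigchld else signal.SIG_DFL
    # Temporarily change how this process treats SIGCHLD (main thread only).
    previous = signal.signal(signal.SIGCHLD, disposition)
    try:
        argv = [sys.executable, "-c", "import sys; sys.exit(%d)" % child_exit_code]
        return subprocess.call(argv)
    finally:
        # Put the original disposition back so later code is unaffected.
        signal.signal(signal.SIGCHLD, previous)


if __name__ == "__main__":
    print("default SIGCHLD:", exit_code_seen_by_parent(1, ignore_sigchld=False))
    print("ignored SIGCHLD:", exit_code_seen_by_parent(1, ignore_sigchld=True))
```

With the default disposition the child's real status comes back. With SIGCHLD ignored, the child is reaped before its status can be collected, and subprocess falls back to reporting 0.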
Xmonad's commands for starting terminals and invoking a launcher were using its built in code rather than the hand-rolled function I used to log my startup commands, so anything I did within my desktop session was getting SIGCHLD reset, and so it worked then.
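The difference between the two launch paths can be sketched in Python rather than in the Haskell of an actual xmonad config. This is only an illustration of the idea, not xmonad's spawn and not the hand-rolled function from my config. The name spawn_logged and its parameters are hypothetical. It starts a command with its output appended to a log file and optionally resets SIGCHLD to the default in the child before exec:

```python
import signal
import subprocess


def spawn_logged(argv, log_path, reset_sigchld=True):
    """Start argv with stdout and stderr appended to log_path.

    Hypothetical launcher for illustration. Returns the Popen object
    without waiting for the command to finish.
    """
    def child_setup():
        # Runs in the child between fork and exec. An ignored SIGCHLD
        # would otherwise be inherited by the launched program.
        if reset_sigchld:
            signal.signal(signal.SIGCHLD, signal.SIG_DFL)

    with open(log_path, "ab") as log:
        # The child keeps its own copy of the log file descriptor after
        # the parent's copy is closed at the end of this block.
        return subprocess.Popen(
            argv,
            stdout=log,
            stderr=subprocess.STDOUT,
            preexec_fn=child_setup,
        )
```

A launcher that skips the reset (reset_sigchld=False here) hands the ignored SIGCHLD down to whatever it starts, which is the situation the Xmonad-started syncthing-gtk was in.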
So if there's any bug at all other than my own xmonad config, it's Python's subprocess code providing a "success" exit status when it couldn't get an exit status, rather than giving any indication of failure.
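A program that depends on child exit statuses could guard against an inherited disposition at startup. The following is a minimal sketch of that idea, not something syncthing-gtk is known to do, and ensure_sigchld_default is a made-up name:

```python
import signal


def ensure_sigchld_default():
    """Reset SIGCHLD to the default if it was inherited as ignored.

    Hypothetical startup guard. Returns True if a reset was needed.
    Must be called from the main thread.
    """
    if signal.getsignal(signal.SIGCHLD) == signal.SIG_IGN:
        signal.signal(signal.SIGCHLD, signal.SIG_DFL)
        return True
    return False
```

Called once before any subprocess is started, this would make later exit-status checks independent of how the program was launched.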
Thanks very much for the help, and sorry for the false bug report!
Well, if nothing else, it was an interesting problem to observe :)
I'm running Xmonad on NixOS, so my system's a little weird. This is probably something with my environment, but I can't figure out what.
I have syncthing-gtk started from my xmonad startup hook. It starts up, and the UI remains responsive, but it just fills the log up with repeated messages like:
Adding a new repetition every second or so. Nothing changes after waiting even several minutes.
If I quit the syncthing-gtk that Xmonad launched and start it manually, everything works fine; it connects to the daemon within a few attempts.
If I quit my desktop session, the daemon process doesn't get killed; on logging back in the syncthing-gtk started by Xmonad does connect to the existing daemon process and work. However if I kill the syncthing daemon the Xmonad-started syncthing-gtk fails to relaunch the daemon, and emits those "Connection refused" error messages into the log every second or so. (Killing the daemon out from under a syncthing-gtk I've started manually results in it starting another daemon process and all is well)
I thought it might just be a timing problem, where some session initialization hasn't happened yet by the time Xmonad is automatically launching it, so I put in delays, but even delays of 30s don't make a difference (and I'm able to launch it manually within 30 seconds of startup and it consistently works).
I thought there might be a difference in environment variables the two launch methods inherit, but saving and diffing the /proc/<pid>/environ between the working and non-working cases only revealed that when I launched it manually there were 3 additional variables in the working case:
Otherwise the two environment files were identical.
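For reference, the comparison can be done along these lines. This is a rough sketch assuming Linux's /proc layout, where the environ file holds NUL-separated NAME=value entries. The function names are invented for the example:

```python
def parse_environ(raw):
    """Split the NUL-separated bytes of /proc/<pid>/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # skip the empty piece after the trailing NUL
        name, _, value = entry.partition(b"=")
        env[name.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_environ(pid):
    """Read and parse the environment of a running process (Linux only)."""
    with open("/proc/%d/environ" % pid, "rb") as f:
        return parse_environ(f.read())


def diff_environ(working, failing):
    """Return (only_in_working, only_in_failing, changed) for two env dicts."""
    only_working = {k: working[k] for k in working.keys() - failing.keys()}
    only_failing = {k: failing[k] for k in failing.keys() - working.keys()}
    changed = {
        k: (working[k], failing[k])
        for k in working.keys() & failing.keys()
        if working[k] != failing[k]
    }
    return only_working, only_failing, changed
```

Usage would be diff_environ(read_environ(pid_a), read_environ(pid_b)) with the PIDs of the two syncthing-gtk instances. Note that this only compares environment variables. Inherited signal dispositions do not show up in environ.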
Running with --debug and --dump, I get this on stderr when it's not working:
And stdout is completely empty.
Syncthing-gtk version 0.9.3.1, installed from nixos unstable channel.