Open rptb1 opened 6 years ago
This is something we've never seen. Someone can hopefully take a look this sprint. It's likely something particular to your configuration.
Thanks. I'm been able to log in and provision, but the command line and app don't agree about whether I'm logged in.
Likely the electron app is looking for your socket in the wrong place.
This might be relevant. My home directory is not in the standard place in /Users, because I'm managing files between an SSD and HDD. I have this link just in case software makes a poor assumption:
rb@silverbird$ ls -l /Users/rb
lrwxr-xr-x 1 root admin 37 21 Dec 2017 /Users/rb -> /Volumes/Silverbird-HD/Local/Users/rb
rb@silverbird$ echo $HOME
/Volumes/Silverbird-HD/Local/Users/rb
I'm a new user and not in urgent need of Keybase, so there's no need to treat my request as a huge priority. However, my (small) company is evaluating it for internal comms.
Almost certainly that explains the issue. I doubt we've tested this.
Did this issue just start with the latest release btw?
I only started trying Keybase last week. I noticed one update. However, this issue was present before the update.
rb@silverbird$ keybase --version
keybase version 2.6.0-20180911163448+cbaeeb8001
Ok, got it. Maybe try this? Quit out of keybase, and then do:
launchctl setenv HOME /Volumes/Silverbird-HD/Local/Users/rb
And then try launching again?
No effect.
rb@silverbird$ rm *.sock
rb@silverbird$ launchctl setenv HOME "$HOME"
rb@silverbird$ launchctl getenv HOME
/Volumes/Silverbird-HD/Local/Users/rb
New log with this change dfaa9d0ee33f806bd4ff701c. But for now:
rb@silverbird$ launchctl unsetenv HOME
Note, HOME is already set correctly by launchctl. See below.
rb@silverbird$ launchctl submit -l dump-env -o /tmp/1 -e /tmp/2 -- env
rb@silverbird$ cat /tmp/1
TMPDIR=/var/folders/18/c1s1h6z90b9579wdl6d8zwg80000gn/T/
SHELL=/bin/bash
HOME=/Volumes/Silverbird-HD/Local/Users/rb
Apple_PubSub_Socket_Render=/private/tmp/com.apple.launchd.WmXKTaVCFZ/Render
SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.A3EFVpWnbq/Listeners
LOGNAME=rb
DISPLAY=/private/tmp/com.apple.launchd.T0Y7AqLGoq/org.macosforge.xquartz:0
PATH=/Volumes/Silverbird-HD/Local/Users/rb/bin:/Volumes/Silverbird-HD/Local/Users/rb/opt/ec2-api-tools/bin:/opt/local/bin:/usr/local/bin:/usr/X11R6/bin:/usr/bin:/bin
XPC_SERVICE_NAME=dump-env
USER=rb
XPC_FLAGS=0x0
P4CONFIG=.p4config
ok. Cc @keybase/react-hackers on finding where your home dir is.
The whole home directory thing might be a red herring -- but it's a red herring with a smoking gun! :)
So if you kill those two symlinks the command line app doesn't work? What happens if you type echo $HOME
on the command line? Thanks!
See https://github.com/keybase/keybase-issues/issues/3277#issuecomment-420638436. HOME matches what it is set to in the System Preferences (and in the system directories). Note that I've run with my home directory set this way for about six years without problems.
The command line app can't find the socket. Some things seem to work with --standalone
. For example:
rb@silverbird$ keybase status
Keybase isn't running. To start, you can run:
open /Applications/Keybase.app
▶ ERROR There were multiple errors: dial unix /Volumes/Silverbird-HD/Local/Users/rb/Library/Group Containers/keybase/Library/Caches/Keybase/keybased.sock: connect: invalid argument; dial unix /Volumes/Silverbird-HD/Local/Users/rb/Library/Caches/Keybase/keybased.sock: connect: no such file or directory
rb@silverbird$ keybase --standalone status
Username: rptb1
Logged in: yes
Device:
name: Silverbird
ID: fdb6ba2c004aed933cd9cc5b70771018
status: active
...
It appears to be a confusion between ~/Library/Caches/Keybase
and ~/Library/Group Containers/keybase/Library/Keybase
. I've skimmed the source code looking for the origin of these paths. I'm trying to track down the function of the "Group Containers" directory to see if it comes from the system library somewhere.
It appears to be to do with sandboxing. See containerURL(forSecurityApplicationGroupIdentifier:) . Perhaps some parts of the system are being sandboxed (such as the application) and some parts are not.
So I was able to repro this by using an external disk, so something about hat is causing the problem. Just setting the home directory works fine.
Actually nvm, it did work, but took forever to start for some reason.
For what it's worth, /Volumes/Silverbird-HD is a partition on the second internal disk (on the SATA bus), but it's not the system volume. This probably doesn't make a difference, but sometimes the Mac makes a distinction between "internal" and "external". The system volume (mounted at "/") is an SSD. It might be that OS X is doing some extra sandboxing because code is not on the system disk -- just speculating.
A fresh account on the same machine with HOME in /Users can successfully run Keybase. See log ca7f51e14bb218003d5ccf1c.
Moving that account's home directory to /Volumes/Silverbird-HD/Local/Users produces a different error!
Logged out, then
rb@silverbird$ sudo rsync -avi /Users/keybase-test/ /Volumes/Silverbird-HD/Local/Users/keybase-test/
Then update the account HOME in the Users and Groups control panel. Logged back in, Keybase app failed to get past "Loading...". Command line did this:
silverbird:keybase-test keybase-test$ echo $HOME
/Volumes/Silverbird-HD/Local/Users/keybase-test
silverbird:keybase-test keybase-test$ keybase id rptb1
▶ WARNING | Changing current working directory because path for dialing is too long
▶ WARNING | Changing current working directory because path for dialing is too long
▶ INFO Identifying rptb1
✔ "rptb1" on github failed: API network error: Get https://gist.githubusercontent.com/rptb1/e67c8b07c74b34cd11330f733f347a78/raw/f5eb6055f3cef1dc69200301ec1fbed0c4e11653/keybase.md: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) (code=170)
✔ "rptb1" on twitter failed: API network error: Get https://mobile.twitter.com/rptb1/status/1039608884128821249: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) (code=170)
✔ "richard.brooksby" on facebook failed: API network error: Get https://www.facebook.com/richard.brooksby/posts/10102993592483830: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) (code=170)
✔ "rptb1" on reddit failed: API network error: Get https://www.reddit.com/r/KeybaseProofs/comments/9f10l8/my_keybase_proof_redditrptb1_keybaserptb1/.json: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) (code=170)
This "path too long" error might be a clue.
The log id for this is 66b68da2ba6673557b9e621c
Moving the account's home directory to a long path on the system volume (under "/") also breaks things in the same way. This suggests that it's not to do with crossing filesystems or the disk hardware.
silverbird:keybase-test keybase-test$ echo $HOME
/This/path/is/too/long/for/your/feeble/software
silverbird:keybase-test keybase-test$ keybase id rptb1
▶ WARNING | Changing current working directory because path for dialing is too long
▶ WARNING | Changing current working directory because path for dialing is too long
▶ INFO Identifying rptb1
✔ "rptb1" on reddit failed: API network error: Get https://www.reddit.com/r/KeybaseProofs/comments/9f10l8/my_keybase_proof_redditrptb1_keybaserptb1/.json: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) (code=170)
✔ "rptb1" on twitter failed: API network error: Get https://mobile.twitter.com/rptb1/status/1039608884128821249: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) (code=170)
✔ "rptb1" on github failed: API network error: Get https://gist.githubusercontent.com/rptb1/e67c8b07c74b34cd11330f733f347a78/raw/f5eb6055f3cef1dc69200301ec1fbed0c4e11653/keybase.md: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) (code=170)
✔ "richard.brooksby" on facebook failed: API network error: Get https://www.facebook.com/richard.brooksby/posts/10102993592483830: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) (code=170)
The log id for this is d3b7aec44f735fb9377e5f1c.
Please excuse my sense of humour. Something is too feeble to cope with a long path, but it's probably not the Keybase code.
Possible duplicate of issue https://github.com/keybase/client/issues/9039#issuecomment-340296285
So yes, something is too feeble. It's net.Dial https://golang.org/pkg/net/#Dial
OK, this has come back to me from about 25 years ago. I believe this is an underlying Unix restriction to do with packing all the various socket address families into a single fixed-length structure. In particular, the AF_UNIX
address family structure has a fixed length path for the sun_path field. See man 7 unix. Go's runtime library is just passing on that restriction. I wrote software loooong ago that tried to create Unix sockets in home directories and had exactly the same problem.
You can't do it reliably.
The usual workaround is to create sockets somewhere with a guaranteed short path, such as /tmp/PID/foo.sock
though there may be better solutions out there.
The reason we have inconsistent results in this thread (and that making symlinks kinda sorta works) is that my home directory + "Library/Keybase" is just under the limit.
I think you will come across this bug on every Posix system sooner or later.
We have code that changes directory to avoid this, I wonder if it is not working in this case for some reason. I'll take a look.
I'm not sure about this (this is old memory), but I don't think you can reliably use a relative path, so chdir might not help.
So this comment (https://github.com/keybase/keybase-issues/issues/3277#issuecomment-421082632) is a fundamentally different error than what you saw before. The command line program is talking to the service, but those errors indicate the service could not connect to any external service. This suggests to me it is actually working, the warnings should not be fatal.
It's also possible that Electron is not doing this chdir thing when it tries to connect to the socket, I think only Go is doing it currently.
This makes the menu bar widget recognise me as logged in:
rb@silverbird$ ln -s $HOME /tmp/foo
rb@silverbird$ HOME=/tmp/foo /Applications/Keybase.app/Contents/MacOS/Keybase
It also makes the "main app" get further.
I'm coming around to the idea that the bug we need to fix here is that Electron cannot connect to the socket when it is in one of these long paths. For background, the reason we put the socket there is to support a Finder extension for KBFS, so moving it might not be in the cards, although I will need to double check that.
You might not need to move it, but instead access it via a symlink in /tmp. Perhaps the "dial" (or equivalent) function could spot the long path and create a symlink in /tmp in that case. Or always use one (better regression testing that way). For what it's worth, I think sometimes using chdir in untested corner cases is somewhat unreliable coding.
Ok we have a ticket internally for this now, will be fixed in the next release (a couple weeks). Thanks for all the great information!
It's been quite fun.
Note, relaunching the service does not help. So I did some poking around then tried symbolic links from to the Group Containers directory from the other Caches directory, like this:
This allowed id to complete.
However, it did not help with https://github.com/keybase/client/issues/5399#issuecomment-420280044
There seems to be some directory confusion going on. Has someone assumed something about paths to home directories?
The log id after the symlink hack above is acb314d287d41384527ce11c
I'm running OS X 10.11.6.