trapexit / mergerfs

a featureful union filesystem
http://spawn.link

Would a connect() sa_family=AF_UNIX on mergerfs give an ECONNREFUSED? #1278

Open jamesread opened 11 months ago

jamesread commented 11 months ago

Hey, I'm trying to use Git Annex on top of mergerfs. Git Annex is having issues when it tries to sync (which is essentially just a git pull / git push), because the control sockets it uses are on mergerfs.

I've just strace'd it, using strace -f git annex sync, and it looks like the problem is here;

[pid 2547901] connect(4, {sa_family=AF_UNIX, sun_path=".git/annex/ssh/xconspirisist@mindstorm"}, 110 <unfinished ...>
[pid 2547904] close(5 <unfinished ...>
[pid 2547703] <... read resumed>"\1\0\0\0\0\0\0\0", 8) = 8
[pid 2547904] <... close resumed>)      = 0
[pid 2547901] <... connect resumed>)    = -1 ECONNREFUSED (Connection refused)

I'm just checking whether that is expected, as I cannot find it in the mergerfs docs. It seems I can create unix sockets and FIFOs on mergerfs, but can you confirm (or not) that it is mergerfs (or FUSE) that is responsible for the ECONNREFUSED on the connect() call? Thanks!

trapexit commented 11 months ago

connect isn't a call that exists in FUSE, so I'm not sure. I've never really messed with UDS or FIFOs on FUSE. An strace of mergerfs would help.

jamesread commented 11 months ago

Hey @trapexit, I did try to strace --attach to all the mergerfs pids, but I cannot understand why strace is listing lots of newfstatat() calls for irrelevant paths... Maybe something is running and indexing in the background, and strace is picking that up.

connect() is a syscall, and probably not one that would show up in FUSE, I'd bet. I have actually managed to test using nc, and things seem to work fine with a unix domain socket on mergerfs:

shell1:

user@host: cd /mnt/myMergerfs/
user@host: nc -lU foo.sock

shell2:

user@host: cd /mnt/myMergerfs/
user@host: date | nc -U foo.sock

So, this might be something else wrong with git-annex.
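
For anyone who wants to repeat the nc test above without two shells, here is a minimal sketch in Python that does the same bind / listen / connect round trip in one process. The function name and the default socket name are just illustrative; pass the directory you want to test (for example a mergerfs mount) as the first argument, otherwise it falls back to a temporary directory.

```python
import os
import socket
import sys
import tempfile


def unix_socket_roundtrip(directory: str, name: str = "foo.sock") -> bytes:
    """Bind, listen, connect and send over a Unix domain socket in `directory`."""
    path = os.path.join(directory, name)
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        server.bind(path)       # creates the socket file on the filesystem
        server.listen(1)
        # An ECONNREFUSED from connect() would surface here
        # as ConnectionRefusedError.
        client.connect(path)
        conn, _ = server.accept()
        with conn:
            client.sendall(b"hello")
            return conn.recv(16)
    finally:
        client.close()
        server.close()
        if os.path.exists(path):
            os.unlink(path)     # remove the socket file again


if __name__ == "__main__":
    # Directory to test; the temporary directory is only a fallback.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.mkdtemp()
    print(unix_socket_roundtrip(target))
```

If this succeeds on the mergerfs mount, it matches the nc result: a plain bind followed by connect on the same path works.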

jchnkl commented 1 month ago

Not a solution, but a workaround:

git config annex.sshcaching false

Disabling ssh caching (and therefore the control socket) avoids the error, which suggests that this is a FUSE / mergerfs problem.

I've set my regular ssh control socket to a mergerfs mount (via ControlPath in ~/.ssh/config) and got a similar error with a plain ssh connection (without git-annex):

debug1: Authentication succeeded (publickey).
Authenticated to monolith ([2001:9e8:ab8e:2000:1e1b:dff:fee0:94]:22).
debug1: setting up multiplex master socket
debug3: muxserver_listen: temporary control path /data/jchnkl/ssh-mux/jchnkl@monolith:22.jdUHU6eMwZJgoK7S
debug2: fd 4 setting O_NONBLOCK
debug3: fd 4 is O_NONBLOCK
debug3: fd 4 is O_NONBLOCK
debug1: channel 0: new [/data/jchnkl/ssh-mux/jchnkl@monolith:22]
debug3: muxserver_listen: mux listener channel 0 fd 4
debug2: fd 3 setting TCP_NODELAY
debug3: ssh_packet_set_tos: set IPV6_TCLASS 0x08
debug1: control_persist_detach: backgrounding master process
debug2: control_persist_detach: background process is 774087
Control socket connect(/data/jchnkl/ssh-mux/jchnkl@monolith:22): Connection refused
Failed to connect to new control master

trapexit commented 1 month ago

It would help if someone provided explicit details on how things were set up, or provided an strace of mergerfs. That said, FUSE doesn't have functions for connect, so if it is related to that then there is likely nothing I can do.

jchnkl commented 1 month ago

Hi, in my case the setup is fairly easy:

/data -fstype=fuse,cache.files=partial,dropcacheonclose=true,category.create=mfs,noforget,inodecalc=path-hash,allow_other :mergerfs\#/.data/local:/.data/nfs=RO

The git repo goes into /.data/local, where it works flawlessly. The muxing error only shows up when running through the /data directory.

Thanks!
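
One difference between the nc test and the ssh debug log is that ssh listens on a temporary control path before the final one is used. Assuming ssh then publishes the listener by hard-linking the temporary socket to the final name and removing the temporary one (that is my understanding of how it works, not something confirmed in this thread), a rough sketch of a test for that sequence could look like the following. The function and file names are hypothetical.

```python
import os
import socket
import sys
import tempfile


def connect_via_hard_link(directory: str) -> bool:
    """Bind at a temporary name, hard-link it to the final name, then
    connect through the final name.

    Returns True if connect() succeeds, False on ECONNREFUSED.
    """
    tmp_path = os.path.join(directory, "mux.sock.tmp")  # hypothetical names
    final_path = os.path.join(directory, "mux.sock")
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        server.bind(tmp_path)            # listener is bound to the temporary name
        server.listen(1)
        os.link(tmp_path, final_path)    # assumed step: publish via hard link
        os.unlink(tmp_path)              # assumed step: drop the temporary name
        try:
            client.connect(final_path)
        except ConnectionRefusedError:
            return False
        return True
    finally:
        client.close()
        server.close()
        for path in (tmp_path, final_path):
            try:
                os.unlink(path)
            except FileNotFoundError:
                pass


if __name__ == "__main__":
    # Directory to test; the temporary directory is only a fallback.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.mkdtemp()
    print(connect_via_hard_link(target))
```

Running it once against /.data/local and once against /data would show whether the hard link step behaves differently through mergerfs. A False result only on /data would narrow things down, though it would not by itself prove this is what ssh or git-annex is hitting.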