Open greensea opened 5 days ago
JuiceFS connects to etcd using the address resolved by the local system DNS. Maybe you can debug this function to see what really happens.
My local system DNS resolves ss.ts.bbxy.net to 100.80.x.x. There are also no entries for it in /etc/hosts.
I am not familiar with etcd. Is it possible that some etcd API returns the nodes' IP addresses and JuiceFS just picks one of them to connect to?
The weird thing is, if it were a DNS issue, JuiceFS should not have been able to format a file system at all.
I will try debugging the function later.
I tried formatting and mounting the file system by IP address instead of domain name, and JuiceFS still tries to connect to a LAN address.
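A quick way to double-check what the system resolver returns for the metadata host is sketched below. This goes through the C library resolver from Python, which is an assumption on my part about being close enough: the Go resolver that JuiceFS uses may behave differently in edge cases. The helper name `resolve_all` is made up for illustration.

```python
import socket


def resolve_all(host, port=2379):
    """Return the sorted, de-duplicated IPs the system resolver gives for host."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_all("ss.ts.bbxy.net")` on the client machine should list only 100.80.x.x addresses if DNS is not the culprit.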
{"level":"warn","ts":"2024-06-25T17:17:14.245594+0800","logger":"etcd-client","caller":"v3@v3.5.9/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc001712380/100.80.11.44:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 192.168.0.44:2379: i/o timeout\""}
I formatted and mounted the file system by specifying 100.80.11.44, but JuiceFS still got 192.168.0.44. This is weird: only the etcd node knows that 192.168.0.44 exists. Maybe etcd transmits this 192.168.x.x IP address in some way and JuiceFS just picks it up?
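To make the mismatch in the log line easier to see, here is a small sketch that pulls out the endpoint the client was configured with (the `target` field) and the address it actually dialed (from the `error` field). The sample is the log line above with the `ts`, `logger`, `caller`, and `msg` fields trimmed; the function name `endpoint_mismatch` is hypothetical.

```python
import json
import re

# Trimmed copy of the log line above (only the fields used here are kept).
SAMPLE = r'{"level":"warn","target":"etcd-endpoints://0xc001712380/100.80.11.44:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 192.168.0.44:2379: i/o timeout\""}'


def endpoint_mismatch(log_line):
    """Return (configured endpoint, dialed address) from an etcd-client log line."""
    entry = json.loads(log_line)
    # The target looks like "etcd-endpoints://<id>/<host:port>".
    configured = entry["target"].rsplit("/", 1)[-1]
    # The dialed address follows "dial tcp " and ends before the next ": ".
    match = re.search(r"dial tcp (\S+?): ", entry["error"])
    dialed = match.group(1) if match else None
    return configured, dialed


print(endpoint_mismatch(SAMPLE))
```

The configured endpoint is the VPN address, while the dialed address is the LAN one, so the LAN address has to come from somewhere other than the command line or DNS.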
What happened:
I have an etcd cluster with 3 nodes. The nodes are in the same VPN (100.80.x.x), and each node also has a LAN address (192.168.0.x). The nodes are located in different LANs, so they can't communicate with each other directly and have to communicate via the VPN (100.80.x.x). I also built the cluster within the VPN.
I created a JuiceFS storage:
Note: ss.ts.bbxy.net is resolved to 100.80.x.x
Then mount it:
Now copy a large file into mnt, and the command gets stuck. JuiceFS printed some error logs:
The logs show that JuiceFS is trying to connect to 192.168.0.44 (the LAN address of ss.ts.bbxy.net) and 192.168.0.33 (another etcd node). These are LAN addresses, and I can't connect to them because they are on another LAN. I think this is why the file copy gets stuck.
The weird thing is, I configured the etcd cluster and JuiceFS within the VPN (100.80.x.x), so JuiceFS should not know about any LAN address (192.168.0.x) and should not try to connect to one.
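One possible explanation, which is an assumption and not confirmed here: etcd's member list API returns each member's client URLs (the values set with `--advertise-client-urls`), and a client that syncs its endpoints from that list would start dialing whatever is advertised. If so, the advertised URLs can be checked against the VPN range with a sketch like the one below. The function name `unreachable_urls` and the sample URLs are made up for illustration; 100.64.0.0/10 is the range Tailscale assigns addresses from, which contains 100.80.x.x.

```python
import ipaddress
from urllib.parse import urlsplit

# Range Tailscale assigns node addresses from; 100.80.x.x falls inside it.
VPN_NET = ipaddress.ip_network("100.64.0.0/10")


def unreachable_urls(client_urls, vpn_net=VPN_NET):
    """Return the client URLs whose IP address lies outside the VPN range."""
    outside = []
    for url in client_urls:
        host = urlsplit(url).hostname
        if host is None:
            continue  # not a parseable URL, skip it
        try:
            addr = ipaddress.ip_address(host)
        except ValueError:
            continue  # a hostname rather than an IP literal, skip it
        if addr not in vpn_net:
            outside.append(url)
    return outside


# Hypothetical advertised URLs, using addresses that appear in this report.
print(unreachable_urls(["http://100.80.11.44:2379", "http://192.168.0.44:2379"]))
```

Any URL this returns would be unreachable from a client on a different LAN, which would match the timeouts in the logs.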
What you expected to happen:
cp does not get stuck, and JuiceFS does not try to connect to a LAN address (192.168.0.x).
How to reproduce it (as minimally and precisely as possible):
Already described above.
Anything else we need to know?
No
Environment:
JuiceFS version (use juicefs --version) or Hadoop Java SDK version: juicefs version 1.2.0+2024-06-18.873c47b
Cloud provider or hardware configuration running JuiceFS: self maintained
OS (e.g. cat /etc/os-release): Debian trixie/sid
Kernel (e.g. uname -a): Linux 6.8.12-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.8.12-1 (2024-05-31) x86_64 GNU/Linux
Object storage (cloud provider and region, or self maintained): self maintained
Metadata engine info (version, cloud provider managed or self maintained): etcd Version: 3.4.33
Network connectivity (JuiceFS to metadata engine, JuiceFS to object storage): tailscaled VPN
Others: