Open LoveEachDay opened 1 year ago
/assign @jiaoew1991
/assign @smellthemoon can you help on it?
@LoveEachDay will "0.0.0.0" do for the docker swarm case?
@congqixia 0.0.0.0 may not work.
Each Milvus component stores its serving ip address in etcd so that other components can reach it. If a component is listening on all network interfaces, which ip address should we store in the session?
We have two problems here:
- which ip address to listen on,
- the ip address should be routable.
Binding to 0.0.0.0 may solve case (1), but does not solve case (2).
working on it
@LoveEachDay : I believe in other distributed systems there is an --advertise-address option where you can set what address to advertise to clients.
--advertise-address string | The IP address on which to advertise the apiserver to members of the cluster. This address must be reachable by the rest of the cluster. If blank, the --bind-address will be used. If --bind-address is unspecified, the host's default interface will be used.
In the case of docker and kubernetes, IP addresses are dynamically allocated. It might make sense to be able to specify a CIDR network range to filter the IP, or a network interface name as well?
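The advertise/bind fallback described by the kube-apiserver flag quoted above (advertise address, else bind address, else the host's default interface) could be sketched roughly like this; `resolveAdvertiseAddr` is a hypothetical helper for illustration, not a Milvus or Kubernetes API:

```go
package main

import "fmt"

// resolveAdvertiseAddr picks the address to publish to peers, following the
// precedence the kube-apiserver docs describe: --advertise-address first,
// then --bind-address, then a host default. Names are illustrative only.
func resolveAdvertiseAddr(advertiseAddr, bindAddr, hostDefault string) string {
	if advertiseAddr != "" {
		return advertiseAddr
	}
	// A bind address of 0.0.0.0 listens everywhere but is not routable,
	// so it cannot be advertised to peers.
	if bindAddr != "" && bindAddr != "0.0.0.0" {
		return bindAddr
	}
	return hostDefault
}

func main() {
	fmt.Println(resolveAdvertiseAddr("", "0.0.0.0", "10.0.1.186"))      // falls through to host default
	fmt.Println(resolveAdvertiseAddr("", "192.168.1.5", "10.0.1.186"))  // concrete bind address wins
	fmt.Println(resolveAdvertiseAddr("203.0.113.7", "0.0.0.0", "10.0.1.186")) // explicit advertise wins
}
```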
@LoveEachDay : I believe in other distributed systems there is an --advertise-address option where you can set what address to advertise to clients.
- https://docs.docker.com/engine/reference/commandline/swarm_init/#--advertise-addr
- https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/
--advertise-address string | The IP address on which to advertise the apiserver to members of the cluster. This address must be reachable by the rest of the cluster. If blank, the --bind-address will be used. If --bind-address is unspecified, the host's default interface will be used.
@ieugen We use this approach to set the advertise addr. In Kubernetes, only one pod ip address will be assigned, so you can leave the address field empty, which will use the default ip address.
Passing a cidr may not be feasible. It's the user's responsibility to make sure the ip address is routable and can be statically assigned.
@LoveEachDay
I think some filtering could be implemented so that the code would work like this. See the original code below my pseudocode attempt (I don't know go).
// IPOptions is a placeholder type holding the filtering options (e.g. a CIDR string).
func GetIP(ip string, options IPOptions) string {
	if len(ip) == 0 {
		return GetLocalIP(options)
	}
	return ip
}

// GetLocalIP returns the first local ip address matching the filtering options
func GetLocalIP(options IPOptions) string {
	addrs, err := net.InterfaceAddrs()
	if err == nil {
		for _, addr := range addrs {
			ipaddr, ok := addr.(*net.IPNet)
			if ok && matchIpOptions(ipaddr, options) {
				return ipaddr.IP.String()
			}
		}
	}
	return "127.0.0.1"
}

func matchIpOptions(ipaddr *net.IPNet, options IPOptions) bool {
	// Keep the existing checks:
	//   ipaddr.IP.IsGlobalUnicast() && ipaddr.IP.To4() != nil
	// and additionally check whether ipaddr matches the filtering options in "options":
	// - parse a CIDR received in options with https://pkg.go.dev/net#ParseCIDR
	// - check if the IP is inside that network with https://pkg.go.dev/net#IPNet.Contains
}
Original code here https://github.com/milvus-io/milvus/blob/eca79d149cc2d5522f43a1f2c5f2a5bfcc71c8ca/pkg/util/funcutil/func.go#L62C13-L62C13
func GetIP(ip string) string {
	if len(ip) == 0 {
		return GetLocalIP()
	}
	return ip
}

// GetLocalIP return the local ip address
func GetLocalIP() string {
	addrs, err := net.InterfaceAddrs()
	if err == nil {
		for _, addr := range addrs {
			ipaddr, ok := addr.(*net.IPNet)
			if ok && ipaddr.IP.IsGlobalUnicast() && ipaddr.IP.To4() != nil {
				return ipaddr.IP.String()
			}
		}
	}
	return "127.0.0.1"
}
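To see why this first-match loop is order dependent, the sketch below replays it over a fixed candidate list instead of `net.InterfaceAddrs()` (`firstGlobalUnicastV4` is an illustrative helper, not Milvus code). With two routable addresses of the kind a swarm container has, whichever address the OS enumerates first wins:

```go
package main

import (
	"fmt"
	"net"
)

// firstGlobalUnicastV4 replays the GetLocalIP loop over a fixed candidate
// list, returning the first IPv4 global-unicast address it encounters.
func firstGlobalUnicastV4(candidates []string) string {
	for _, c := range candidates {
		ip := net.ParseIP(c)
		if ip != nil && ip.IsGlobalUnicast() && ip.To4() != nil {
			return ip.String()
		}
	}
	return "127.0.0.1"
}

func main() {
	// Same two addresses, different enumeration order, different result.
	fmt.Println(firstGlobalUnicastV4([]string{"10.0.1.186", "172.18.0.3"})) // 10.0.1.186
	fmt.Println(firstGlobalUnicastV4([]string{"172.18.0.3", "10.0.1.186"})) // 172.18.0.3
}
```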
@LoveEachDay : I have added more information connected to this issue https://github.com/milvus-io/milvus/issues/25032#issuecomment-1780720924 .
In docker swarm, adding healthchecks to milvus makes the service run eventually. It restarts until it binds to the proper interface / IP address. I guess this is due to how the IP is selected.
Perhaps different components each go through the IP selection process and reach different results => failures?
this should be supported on milvus 2.3.2
thank you @xiaofan-luan . I am curious which parts. For docker swarm, since we can't specify the container IP, we would need a CIDR-based filter (we specify the CIDR the app should filter interfaces by; when it finds an IP in that CIDR range, it should bind to it).
There is an old issue with swarm and IP addresses for containers: https://github.com/moby/moby/issues/24170 .
To give an idea, this is what ip addr shows inside a container.
Container IP addresses are randomly assigned: one is the IP inside the docker_gwbridge network and the other is the container IP for its associated network.
root@dev_db-1-dev1:/# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
678: eth0@if679: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8950 qdisc noqueue state UP group default
    link/ether REDACTED brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.1.186/24 brd 10.0.1.255 scope global eth0
       valid_lft forever preferred_lft forever
680: eth1@if681: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether REDACTED brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet 172.18.0.3/16 brd 172.18.255.255 scope global eth1
       valid_lft forever preferred_lft forever
On a server with an IPv6 address, the service will restart continuously. So my workaround for the IP binding bug is not working. I have to set up a special IPv4-only server for milvus. If it were up to me I would ban milvus, but I only manage the apps so I have to put up with it.
How do you deploy any database if you can't specify which port to use? For example, pg will use 5432.
@congqixia 0.0.0.0 may not work. Each Milvus component stores its serving ip address in etcd so that other components can reach it. If a component is listening on all network interfaces, which ip address should we store in the session?
We have two problems here:
- which ip address to listen on,
- the ip address should be routable.
Binding to 0.0.0.0 may solve case (1), but does not solve case (2).
The internal design of milvus works fine. It stores both the endpoint and the port.
I think the problem is how the SDK connects to milvus.
None of us are docker swarm experts, and milvus is primarily designed for K8s. Feel free to directly open a PR on how to deploy milvus on docker swarm; this could definitely help more users.
hi @xiaofan-luan ,
Thanks for replying. It's not docker swarm specific. The issue is how milvus selects the interface to bind to / connect to other services. Right now it's completely non-deterministic and non-configurable, so on any system with more than one network interface it is likely to fail.
It's more prevalent in Docker Swarm because swarm has 1 network interface per network (2 + localhost minimum) and I can't do anything about that, unfortunately (because of Swarm limitations, not milvus). This is unfortunate for me because I have to deal with it.
Yesterday I was very helpless when I had to "fix" this after a 12h weekend upgrade of our servers. I was tired and everything was working ok - except milvus. It managed to stir up a lot of frustration.
The logic for selecting the IP is shown above. As you can see, it's a for loop that takes the first matching network interface it finds. If the OS list is random, it might work or it might not. If the OS list is sorted and the first valid interface is not the right one, it will always fail.
https://github.com/milvus-io/milvus/issues/26538#issuecomment-1752440264
I considered fixing this myself, but I don't know go and I am not working with milvus directly. I just deploy and maintain it - and have to deal with the outages, etc.
For some reason it started working after a few hours and a lot of config tweaking. I am afraid to restart it or migrate it, but I do have to do server maintenance, so there is no avoiding that. The only solution IMO would be a fix where users can select the network interface either by name or, better IMO, by matching the interface IP to a CIDR subnet supplied by the user.
The user supplies a 10.11.0.0/24 CIDR (or an IPv6 one?) and milvus checks whether a network interface has an IP in that range.
If no such interface is found, an error is thrown (since the user requested a specific network interface).
The code IMO should look something like this:
network := "192.168.5.0/24"
clientips := []string{
	"192.168.5.1",
	"192.168.6.0",
}
_, subnet, _ := net.ParseCIDR(network)
for _, clientip := range clientips {
	ip := net.ParseIP(clientip)
	if subnet.Contains(ip) {
		fmt.Println("IP in subnet", clientip)
	}
}
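Extending that snippet into the proposed behavior (pick the first local IP inside a user-supplied CIDR, error out if none matches) could look roughly like this. `pickIPInSubnet` is a hypothetical name; it takes the candidate list as a parameter so the selection logic is testable, whereas real code would feed it addresses from `net.InterfaceAddrs()`:

```go
package main

import (
	"fmt"
	"net"
)

// pickIPInSubnet scans candidate addresses and returns the first IPv4
// global-unicast address inside the user-supplied CIDR, or an error if
// none matches, as proposed above.
func pickIPInSubnet(cidr string, candidates []string) (string, error) {
	_, subnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return "", fmt.Errorf("invalid cidr %q: %w", cidr, err)
	}
	for _, c := range candidates {
		ip := net.ParseIP(c)
		if ip != nil && ip.IsGlobalUnicast() && ip.To4() != nil && subnet.Contains(ip) {
			return ip.String(), nil
		}
	}
	return "", fmt.Errorf("no interface address found in %s", cidr)
}

func main() {
	// The two addresses a swarm container typically has (overlay + gwbridge).
	ip, err := pickIPInSubnet("10.0.1.0/24", []string{"172.18.0.3", "10.0.1.186"})
	fmt.Println(ip, err) // picks the overlay address, not the first in the list

	_, err = pickIPInSubnet("192.168.0.0/16", []string{"172.18.0.3", "10.0.1.186"})
	fmt.Println(err != nil) // true: an error instead of a silently wrong bind
}
```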
This could be an option: add a config useCIDR and let the admin manage this. It seems extremely difficult for a distributed database to work with docker swarm. Sorry about the trouble. Maybe using standalone can solve the problem somehow.
@LoveEachDay what do you think about adding this cidr check and also checking IsGlobalUnicast?
@xiaofan-luan @ieugen Adding a cidr check would be a good option to cover the scenario where a milvus component has multiple ips. I would suggest we add a config option in the milvus.yaml like this:
# configs/milvus.yaml
rootCoord:
  ip: ""
  cidr: ""
  port: 53100
And in pkg/util/paramtable/grpc_prams.go, we update the config initialization logic here:
func (p *grpcConfig) init(domain string, base *BaseTable) {
	p.Domain = domain
	ipItem := ParamItem{
		Key:          p.Domain + ".ip",
		Version:      "2.3.3",
		DefaultValue: "",
		Export:       true,
	}
	ipItem.Init(base.mgr)
	// here add cidr initialization
	p.IP = funcutil.GetIP(ipItem.GetValue()) // override the GetIP util helper to take cidr into account.
	...
}
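Putting the pieces together, the overridden helper hinted at in the comment above might look roughly like this. The two-argument `GetIP` signature and both lowercase helpers are hypothetical sketches, not the current funcutil API:

```go
package main

import (
	"fmt"
	"net"
)

// GetIP sketch with a cidr parameter: an explicitly configured ip always
// wins; otherwise a configured cidr filters the local addresses; otherwise
// fall back to the existing first-match behavior.
func GetIP(ip string, cidr string) string {
	if len(ip) != 0 {
		return ip
	}
	if len(cidr) != 0 {
		if picked, ok := localIPInCIDR(cidr); ok {
			return picked
		}
	}
	return getLocalIP()
}

// localIPInCIDR returns the first local IPv4 address inside cidr, if any.
func localIPInCIDR(cidr string) (string, bool) {
	_, subnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return "", false
	}
	addrs, err := net.InterfaceAddrs()
	if err != nil {
		return "", false
	}
	for _, addr := range addrs {
		if ipaddr, ok := addr.(*net.IPNet); ok && ipaddr.IP.To4() != nil && subnet.Contains(ipaddr.IP) {
			return ipaddr.IP.String(), true
		}
	}
	return "", false
}

// getLocalIP mirrors the existing first-match behavior as a fallback.
func getLocalIP() string {
	addrs, err := net.InterfaceAddrs()
	if err == nil {
		for _, addr := range addrs {
			if ipaddr, ok := addr.(*net.IPNet); ok && ipaddr.IP.IsGlobalUnicast() && ipaddr.IP.To4() != nil {
				return ipaddr.IP.String()
			}
		}
	}
	return "127.0.0.1"
}

func main() {
	fmt.Println(GetIP("192.168.5.1", "")) // explicit ip wins
	fmt.Println(GetIP("", "127.0.0.0/8")) // matches loopback on most hosts
}
```

Whether a non-matching cidr should fall back silently (as here) or fail hard, as proposed earlier in the thread, is a design decision the maintainers would need to make.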
Any plans to get this feature into a release?
Is there an existing issue for this?
Is your feature request related to a problem? Please describe.
By default, milvus components will use funcutil.GetLocalIP() to initialize the grpc listening ip address, which may not work when a pod/container has multiple network interfaces, such as in docker swarm; see also: https://github.com/milvus-io/milvus/issues/25032
Describe the solution you'd like.
In the milvus.yaml we can add an address config for each component which allows the user to specify a routable address like this:
Describe an alternate solution.
No response
Anything else? (Additional Context)
No response