milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: milvus-querycoord and milvus-proxy milvus-datanode are not able to connect to rootcoord it is going to 127.0.0.1:53100 instead of [milvus-rootcoord]:53100 #34260

Open milvus-user opened 1 week ago

milvus-user commented 1 week ago

Is there an existing issue for this?

  • [x] I have searched the existing issues

Environment

- Milvus version: v2.4.4
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):    kafka
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 4 cpu and 8Gi memory
- GPU: 
- Others:

Current Behavior

milvus-querycoord, milvus-proxy, and milvus-datanode connect to 127.0.0.1:53100 instead of milvus-rootcoord:53100.

Expected Behavior

milvus-querycoord, milvus-proxy, and milvus-datanode are connecting to 127.0.0.1:53100. They should connect to [milvus-rootcoord]:53100 instead.

When milvus-querycoord, milvus-proxy, or milvus-datanode tries to reach rootcoord through localhost inside its own pod, the connection fails with the error shown in the log.

Steps To Reproduce

Deploy Milvus v2.4.4 with the default Helm chart, passing the required values in values.yaml. The error appears right after deployment.

Milvus Log

[2024/06/28 05:27:26.834 +00:00] [INFO] [etcd/etcd_util.go:49] ["create etcd client"] [useEmbedEtcd=false] [useSSL=false] [endpoints="[milvus-etcd:2379]"] [minVersion=1.3]
[2024/06/28 05:27:26.837 +00:00] [DEBUG] [querycoord/service.go:218] [network] [port=19531]
[2024/06/28 05:27:26.938 +00:00] [INFO] [etcd/etcd_util.go:49] ["create etcd client"] [useEmbedEtcd=false] [useSSL=false] [endpoints="[milvus-etcd:2379]"] [minVersion=1.3]
[2024/06/28 05:27:26.941 +00:00] [DEBUG] [sessionutil/session_util.go:257] ["Session try to connect to etcd"]
[2024/06/28 05:27:26.942 +00:00] [DEBUG] [sessionutil/session_util.go:272] ["Session connect to etcd success"]
[2024/06/28 05:27:26.943 +00:00] [DEBUG] [querycoord/service.go:168] ["QueryCoord try to wait for RootCoord ready"]
[2024/06/28 05:27:26.944 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]
[2024/06/28 05:27:26.944 +00:00] [DEBUG] [client/client.go:93] ["RootCoordClient GetSessions success"] [address=127.0.0.1:53100] [serverID=1146]
[2024/06/28 05:27:26.945 +00:00] [WARN] [grpcclient/client.go:554] ["fail to get grpc client"] [client_role=rootcoord] [error="failed to connect 127.0.0.1:53100, reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
[2024/06/28 05:27:26.945 +00:00] [WARN] [retry/retry.go:104] ["grpc client is nil, maybe fail to get client in the retry state"] [client_role=rootcoord] [error="empty grpc client: failed to connect 127.0.0.1:53100, reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
[2024/06/28 05:27:26.946 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]
[2024/06/28 05:27:26.946 +00:00] [DEBUG] [client/client.go:93] ["RootCoordClient GetSessions success"] [address=127.0.0.1:53100] [serverID=1146]
[2024/06/28 05:27:26.947 +00:00] [WARN] [grpcclient/client.go:476] ["fail to get grpc client in the retry state"] [client_role=rootcoord] [error="failed to connect 127.0.0.1:53100, reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
[2024/06/28 05:27:26.947 +00:00] [WARN] [grpcclient/client.go:467] ["retry func failed"] [retried=0] [error="empty grpc client: failed to connect 127.0.0.1:53100, reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
[2024/06/28 05:27:27.038 +00:00] [DEBUG] [querycoordv2/server.go:584] ["QueryCoord current state"] [StateCode=Abnormal]
[2024/06/28 05:27:27.148 +00:00] [WARN] [retry/retry.go:104] ["grpc client is nil, maybe fail to get client in the retry state"] [client_role=rootcoord] [error="empty grpc client: failed to connect 127.0.0.1:53100, reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
[2024/06/28 05:27:27.149 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]
[2024/06/28 05:27:27.149 +00:00] [DEBUG] [client/client.go:93] ["RootCoordClient GetSessions success"] [address=127.0.0.1:53100] [serverID=1146]
[2024/06/28 05:27:27.150 +00:00] [WARN] [grpcclient/client.go:476] ["fail to get grpc client in the retry state"] [client_role=rootcoord] [error="failed to connect 127.0.0.1:53100, reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
[2024/06/28 05:27:27.551 +00:00] [WARN] [retry/retry.go:104] ["grpc client is nil, maybe fail to get client in the retry state"] [client_role=rootcoord] [error="empty grpc client: failed to connect 127.0.0.1:53100, reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
[2024/06/28 05:27:27.552 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]
[2024/06/28 05:27:27.552 +00:00] [DEBUG] [client/client.go:93] ["RootCoordClient GetSessions success"] [address=127.0.0.1:53100] [serverID=1146]
[2024/06/28 05:27:27.553 +00:00] [WARN] [grpcclient/client.go:476] ["fail to get grpc client in the retry state"] [client_role=rootcoord] [error="failed to connect 127.0.0.1:53100, reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
[2024/06/28 05:27:28.354 +00:00] [WARN] [retry/retry.go:104] ["grpc client is nil, maybe fail to get client in the retry state"] [client_role=rootcoord] [error="empty grpc client: failed to connect 127.0.0.1:53100, reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
[2024/06/28 05:27:28.355 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]
[2024/06/28 05:27:28.355 +00:00] [DEBUG] [client/client.go:93] ["RootCoordClient GetSessions success"] [address=127.0.0.1:53100] [serverID=1146]
[2024/06/28 05:27:28.356 +00:00] [WARN] [grpcclient/client.go:476] ["fail to get grpc client in the retry state"] [client_role=rootcoord] [error="failed to connect 127.0.0.1:53100, reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
[2024/06/28 05:27:29.957 +00:00] [WARN] [retry/retry.go:104] ["grpc client is nil, maybe fail to get client in the retry state"] [client_role=rootcoord] [error="empty grpc client: failed to connect 127.0.0.1:53100, reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
[2024/06/28 05:27:29.958 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]
[2024/06/28 05:27:29.958 +00:00] [DEBUG] [client/client.go:93] ["RootCoordClient GetSessions success"] [address=127.0.0.1:53100] [serverID=1146]
[2024/06/28 05:27:29.959 +00:00] [WARN] [grpcclient/client.go:476] ["fail to get grpc client in the retry state"] [client_role=rootcoord] [error="failed to connect 127.0.0.1:53100, reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
:

##################

[2024/06/28 05:27:24.391 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]
[2024/06/28 05:27:24.391 +00:00] [DEBUG] [client/client.go:93] ["RootCoordClient GetSessions success"] [address=127.0.0.1:53100] [serverID=1146]
[2024/06/28 05:27:24.392 +00:00] [WARN] [grpcclient/client.go:554] ["fail to get grpc client"] [client_role=rootcoord] [error="failed to connect 127.0.0.1:53100, reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
[2024/06/28 05:27:24.392 +00:00] [WARN] [retry/retry.go:104] ["grpc client is nil, maybe fail to get client in the retry state"] [client_role=rootcoord] [error="empty grpc client: failed to connect 127.0.0.1:53100, reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
[2024/06/28 05:27:24.393 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]
[2024/06/28 05:27:24.393 +00:00] [DEBUG] [client/client.go:93] ["RootCoordClient GetSessions success"] [address=127.0.0.1:53100] [serverID=1146]
[2024/06/28 05:27:24.394 +00:00] [WARN] [grpcclient/client.go:476] ["fail to get grpc client in the retry state"] [client_role=rootcoord] [error="failed to connect 127.0.0.1:53100, reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
[2024/06/28 05:27:24.394 +00:00] [WARN] [grpcclient/client.go:467] ["retry func failed"] [retried=0] [error="empty grpc client: failed to connect 127.0.0.1:53100, reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
[2024/06/28 05:27:24.595 +00:00] [WARN] [retry/retry.go:104] ["grpc client is nil, maybe fail to get client in the retry state"] [client_role=rootcoord] [error="empty grpc client: failed to connect 127.0.0.1:53100, reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
[2024/06/28 05:27:24.596 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]
[2024/06/28 05:27:24.596 +00:00] [DEBUG] [client/client.go:93] ["RootCoordClient GetSessions success"] [address=127.0.0.1:53100] [serverID=1146]
[2024/06/28 05:27:24.597 +00:00] [WARN] [grpcclient/client.go:476] ["fail to get grpc client in the retry state"] [client_role=rootcoord] [error="failed to connect 127.0.0.1:53100, reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
[2024/06/28 05:27:24.998 +00:00] [WARN] [retry/retry.go:104] ["grpc client is nil, maybe fail to get client in the retry state"] [client_role=rootcoord] [error="empty grpc client: failed to connect 127.0.0.1:53100, reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
[2024/06/28 05:27:25.000 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]

Anything else?

It works with version v2.3.

############## Config.tpl generates the expected result, so there is no issue in the Helm chart.

kafka:
  brokerList: milvus-kafka:9092

rootCoord:
  address: milvus-rootcoord
  port: 53100
  enableActiveStandby: false  # Enable rootcoord active-standby

proxy:
  port: 19530
  internalPort: 19529

queryCoord:
  address: milvus-querycoord
  port: 19531
  enableActiveStandby: false  # Enable querycoord active-standby

queryNode:
  port: 21123
  enableDisk: true # Enable querynode load disk index, and search on disk index

indexCoord:
  address: milvus-indexcoord
  port: 31000
  enableActiveStandby: false  # Enable indexcoord active-standby

indexNode:
  port: 21121
  enableDisk: true # Enable index node build disk vector index

dataCoord:
  address: milvus-datacoord
  port: 13333
  enableActiveStandby: false  # Enable datacoord active-standby

dataNode:
  port: 21124

log:
  level: info
  file:
    rootPath: ""
    maxSize: 300
    maxAge: 10
    maxBackups: 20

yanliang567 commented 1 week ago

/assign @LoveEachDay
/unassign

xiaofan-luan commented 6 days ago

From the log, rootcoord registers 127.0.0.1 as its address. The address it gets comes from:

hostName, hostNameErr := os.Hostname()
if hostNameErr != nil {
    log.Error("get host name fail", zap.Error(hostNameErr))
}

session := &Session{
    ctx:      ctx,
    metaRoot: metaRoot,
    Version:  common.Version,

    SessionRaw: SessionRaw{
        HostName: hostName,
    },

    // options
    sessionTTL:        paramtable.Get().CommonCfg.SessionTTL.GetAsInt64(),
    sessionRetryTimes: paramtable.Get().CommonCfg.SessionRetryTimes.GetAsInt64(),
    reuseNodeID:       true,
    isStopped:         *atomic.NewBool(false),
}

So most likely you are deploying with our official Docker image. You can use ifconfig to check the network settings.
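As a complement to ifconfig, the same question can be asked from Go inside the rootcoord container. The sketch below is only an illustration and makes no claim about how Milvus derives the registered address: it prints what `os.Hostname` returns, what that name resolves to in the pod, and whether a given host:port is a literal loopback address. `isLoopbackHostPort` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// isLoopbackHostPort reports whether hostport has a literal loopback IP as its
// host part (for example 127.0.0.1:53100). Host names are not resolved here.
func isLoopbackHostPort(hostport string) bool {
	host, _, err := net.SplitHostPort(hostport)
	if err != nil {
		return false
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

func main() {
	name, err := os.Hostname()
	if err != nil {
		fmt.Println("get host name failed:", err)
		return
	}
	fmt.Println("hostname:", name)

	// What the name resolves to depends on /etc/hosts and DNS inside the pod.
	addrs, err := net.LookupHost(name)
	if err != nil {
		fmt.Println("lookup failed:", err)
	}
	for _, a := range addrs {
		ip := net.ParseIP(a)
		fmt.Printf("  %s loopback=%v\n", a, ip != nil && ip.IsLoopback())
	}

	// The address seen in the rootcoord session in the log above.
	fmt.Println("127.0.0.1:53100 loopback:", isLoopbackHostPort("127.0.0.1:53100"))
}
```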

milvus-user commented 6 days ago

Yes, we deployed the official Docker image.

milvus-user commented 6 days ago

ifconfig result:

ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        inet6 xxxb:cxxx:xxx:xxxx:0:x:x:x  prefixlen 64  scopeid 0x0
        inet6 fxxx::xxxx:xxx:xxxx:xxxx  prefixlen 64  scopeid 0x20
        ether xx:xx:xx:xx:xx:xx  txqueuelen 1000  (Ethernet)
        RX packets 31133598  bytes 76781057453 (71.5 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 16900577  bytes 59431116180 (55.3 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 89192  bytes 957410077 (913.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 89192  bytes 957410077 (913.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:68:16:54  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

xiaofan-luan commented 6 days ago

The reason os.Hostname is seeing 127.0.0.1 rather than 192.168.122.1 is that 127.0.0.1 is the address associated with the loopback interface (lo), which is the default address for the local host. The 192.168.122.1 address is associated with a virtual bridge interface (virbr0), which is typically used for virtual networking, such as with virtual machines or containers.

Here’s a breakdown of what’s happening:

Loopback Interface (lo): inet 127.0.0.1 is the loopback address. This address is used by the system to refer to itself. It’s always present and is the default address for hostname resolution on the local machine.

Virtual Bridge Interface (virbr0): inet 192.168.122.1 is the IP address assigned to the virtual bridge. This address is typically used for networking between virtualized environments and the host system.

Ethernet Interface (eth0): No IPv4 address is provided; only IPv6 addresses are shown.

When you query the hostname, by default, it resolves to the loopback address (127.0.0.1). This is because the hostname resolution on most systems is configured to resolve the hostname to the loopback address unless explicitly configured otherwise.

If you want the hostname to resolve to 192.168.122.1, you would need to modify your system’s network configuration. Here’s how you can adjust this on a Linux system:

Edit /etc/hosts: Add a line to associate your hostname with 192.168.122.1.

192.168.122.1 your-hostname

Ensure Network Configuration: Make sure that virbr0 is correctly configured to be up and running with the desired IP address.

Restart Networking: Restart the network service to apply the changes.

sudo systemctl restart networking

Keep in mind that changing the hostname resolution might affect your system's networking behavior, especially if virbr0 is not always active or if the IP address might change.

For programmatic access, you might need to explicitly query the IP address of the virbr0 interface instead of relying on os.Hostname. This can be done using various libraries or system calls to retrieve the IP address of a specific interface.

xiaofan-luan commented 6 days ago

This is what I got from GPT; hopefully it helps.

milvus-user commented 5 days ago

The ifconfig result I shared was from the rootcoord pod only, so rootcoord was using the 127.0.0.1 address for itself.

milvus-user commented 5 days ago

Apart from this, we tried Milvus v2.3.13, and it runs fine with the same config. Nothing changed except the image.

milvus-user commented 5 days ago

reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
[2024/07/02 06:45:12.553 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]
[2024/07/02 06:45:12.553 +00:00] [DEBUG] [client/client.go:93] ["RootCoordClient GetSessions success"] [address=127.0.0.1:53100] [serverID=23]
[2024/07/02 06:45:12.554 +00:00] [WARN] [grpcclient/client.go:476] ["fail to get grpc client in the retry state"] [client_role=rootcoord] [error="failed to connect 127.0.0.1:53100, reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
[2024/07/02 06:45:12.554 +00:00] [WARN] [grpcclient/client.go:467] ["retry func failed"] [retried=8] [error="empty grpc client: failed to connect 127.0.0.1:53100, reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
[2024/07/02 06:45:14.814 +00:00] [DEBUG] [config/refresher.go:71] ["etcd refreshConfigurations"] [prefix=by-dev/config] [endpoints="[milvus-etcd:2379]"]
[2024/07/02 06:45:19.814 +00:00] [DEBUG] [config/refresher.go:71] ["etcd refreshConfigurations"] [prefix=by-dev/config] [endpoints="[milvus-etcd:2379]"]
[2024/07/02 06:45:22.555 +00:00] [WARN] [retry/retry.go:104] ["grpc client is nil, maybe fail to get client in the retry state"] [client_role=rootcoord] [error="empty grpc client: failed to connect 127.0.0.1:53100, reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
[2024/07/02 06:45:22.556 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]
[2024/07/02 06:45:22.556 +00:00] [DEBUG] [client/client.go:93] ["RootCoordClient GetSessions success"] [address=127.0.0.1:53100] [serverID=23]
[2024/07/02 06:45:22.557 +00:00] [WARN] [grpcclient/client.go:476] ["fail to get grpc client in the retry state"] [client_role=rootcoord] [error="failed to connect 127.0.0.1:53100, reason: connection error: desc = \"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused\""]
[2024/07/02 06:45:24.813 +00:00] [DEBUG] [config/refresher.go:71] ["etcd refreshConfigurations"] [prefix=by-dev/config] [endpoints="[milvus-etcd:2379]"]

xiaofan-luan commented 5 days ago

Apart from this, we tried Milvus v2.3.13, and it runs fine with the same config. Nothing changed except the image.

How did you install Milvus?

This logic has not changed since 2.3.4.

So this is definitely not a bug but more of an environment issue.