milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: milvus-standalone cannot connect to etcd on OpenShift #33967

Closed uniquejava closed 4 weeks ago

uniquejava commented 4 weeks ago

Is there an existing issue for this?

Environment

- Milvus version:2.4.4
- Deployment mode(standalone or cluster):standalone
- MQ type(rocksmq, pulsar or kafka): none   
- SDK version(e.g. pymilvus v2.0.0rc2): OpenShift/kubernetes
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

milvus-standalone cannot connect to etcd and fails to start.

Expected Behavior

Milvus standalone should be able to connect to the running etcd.

Steps To Reproduce

Note that oc is equivalent to kubectl

helm install my-milvus milvus/milvus --set cluster.enabled=false --set etcd.replicaCount=1 --set minio.mode=standalone --set pulsar.enabled=false

NAME: my-milvus
LAST DEPLOYED: Fri Jun 14 16:29:00 2024
NAMESPACE: smbc-mobile-devops
STATUS: deployed
REVISION: 1
TEST SUITE: None

$ oc get po -w
NAME                                    READY   STATUS             RESTARTS       AGE
my-attu-679b684645-tgqz4                1/1     Running            0              17h
my-milvus-etcd-0                        1/1     Running            0              10m
my-milvus-minio-b94d9974-gq924          1/1     Running            0              10m
my-milvus-standalone-7d97c86f9b-skffx   0/1     CrashLoopBackOff   5 (114s ago)   6m6s
my-qdrant-0                             1/1     Running            0              7h43m

Milvus Log

[2024/06/19 02:37:06.402 +00:00] [INFO] [distance/calc_distance_amd64.go:14] ["Hook avx for go simd distance computation"]
2024/06/19 02:37:06 maxprocs: Leaving GOMAXPROCS=4: CPU quota undefined

    __  _________ _   ____  ______    
   /  |/  /  _/ /| | / / / / / __/    
  / /|_/ // // /_| |/ / /_/ /\ \    
 /_/  /_/___/____/___/\____/___/     

Welcome to use Milvus!
Version:   v2.4.4
Built:     Fri May 31 09:30:48 UTC 2024
GitCommit: 8e7f36d9
GoVersion: go version go1.20.7 linux/amd64

TotalMem: 16794943488
UsedMem: 24322048

open pid file: /run/milvus/standalone.pid
lock pid file: /run/milvus/standalone.pid
[2024/06/19 02:37:06.405 +00:00] [INFO] [roles/roles.go:306] ["starting running Milvus components"]
[2024/06/19 02:37:06.406 +00:00] [INFO] [roles/roles.go:169] ["Enable Jemalloc"] ["Jemalloc Path"=/milvus/lib/libjemalloc.so]
[2024/06/19 02:37:06.432 +00:00] [DEBUG] [config/refresher.go:67] ["start refreshing configurations"] [source=FileSource]
[2024/06/19 02:37:06.433 +00:00] [DEBUG] [config/etcd_source.go:50] ["init etcd source"] [etcdInfo="{\"UseEmbed\":false,\"EnableAuth\":false,\"UserName\":\"\",\"PassWord\":\"\",\"UseSSL\":false,\"Endpoints\":[\"my-milvus-etcd:2379\"],\"KeyPrefix\":\"by-dev\",\"CertFile\":\"/path/to/etcd-client.pem\",\"KeyFile\":\"/path/to/etcd-client-key.pem\",\"CaCertFile\":\"/path/to/ca.pem\",\"MinVersion\":\"1.3\",\"RefreshInterval\":5000000000}"]
[2024/06/19 02:37:06.434 +00:00] [INFO] [etcd/etcd_util.go:49] ["create etcd client"] [useEmbedEtcd=false] [useSSL=false] [endpoints="[my-milvus-etcd:2379]"] [minVersion=1.3]
[2024/06/19 02:37:11.435 +00:00] [INFO] [paramtable/base_table.go:209] ["init with etcd failed"] [error="context deadline exceeded"]
[2024/06/19 02:37:11.438 +00:00] [INFO] [paramtable/hook_config.go:21] ["hook config"] [hook={}]
[2024/06/19 02:37:11.438 +00:00] [INFO] [roles/roles.go:255] [setupPrometheusHTTPServer]
[2024/06/19 02:37:11.438 +00:00] [INFO] [rootcoord/root_coord.go:154] ["update rootcoord state"] [state=Abnormal]
[2024/06/19 02:37:11.438 +00:00] [DEBUG] [rootcoord/service.go:184] ["init params done.."]
[2024/06/19 02:37:11.438 +00:00] [INFO] [etcd/etcd_util.go:49] ["create etcd client"] [useEmbedEtcd=false] [useSSL=false] [endpoints="[my-milvus-etcd:2379]"] [minVersion=1.3]
[2024/06/19 02:37:11.439 +00:00] [DEBUG] [config/refresher.go:67] ["start refreshing configurations"] [source=FileSource]
[2024/06/19 02:37:11.439 +00:00] [INFO] [http/server.go:112] ["management listen"] [addr=:9091]
[2024/06/19 02:37:11.439 +00:00] [INFO] [etcd/etcd_util.go:49] ["create etcd client"] [useEmbedEtcd=false] [useSSL=false] [endpoints="[my-milvus-etcd:2379]"] [minVersion=1.3]
[2024/06/19 02:37:11.440 +00:00] [INFO] [components/index_coord.go:38] ["IndexCoord running ..."]
[2024/06/19 02:37:11.440 +00:00] [INFO] [etcd/etcd_util.go:49] ["create etcd client"] [useEmbedEtcd=false] [useSSL=false] [endpoints="[my-milvus-etcd:2379]"] [minVersion=1.3]
[2024/06/19 02:37:11.440 +00:00] [DEBUG] [querynode/service.go:104] [QueryNode] [port=21123]
[2024/06/19 02:37:11.440 +00:00] [INFO] [etcd/etcd_util.go:49] ["create etcd client"] [useEmbedEtcd=false] [useSSL=false] [endpoints="[my-milvus-etcd:2379]"] [minVersion=1.3]
[2024/06/19 02:37:11.442 +00:00] [INFO] [etcd/etcd_util.go:49] ["create etcd client"] [useEmbedEtcd=false] [useSSL=false] [endpoints="[my-milvus-etcd:2379]"] [minVersion=1.3]
[2024/06/19 02:37:11.442 +00:00] [DEBUG] [indexnode/indexnode.go:115] ["New IndexNode ..."]
[2024/06/19 02:37:11.442 +00:00] [DEBUG] [indexnode/service.go:87] [IndexNode] ["network address"=172.17.86.165:21121] ["network port: "=21121]
[2024/06/19 02:37:11.442 +00:00] [INFO] [proxy/lb_policy.go:78] ["use look_aside policy on replica selection"]
[2024/06/19 02:37:11.443 +00:00] [DEBUG] [proxy/simple_rate_limiter.go:225] ["RateLimiter register for rateType"] [rateType=DDLIndex] [rateLimit=+inf] [burst=1.7976931348623157e+308]
[2024/06/19 02:37:11.443 +00:00] [DEBUG] [proxy/simple_rate_limiter.go:225] ["RateLimiter register for rateType"] [rateType=DDLFlush] [rateLimit=+inf] [burst=1.7976931348623157e+308]
[2024/06/19 02:37:11.443 +00:00] [INFO] [runtime/asm_amd64.s:1598] ["Start check query node health loop"]
[2024/06/19 02:37:11.443 +00:00] [INFO] [hookutil/hook.go:46] ["empty so path, skip to load plugin"]
[2024/06/19 02:37:11.443 +00:00] [DEBUG] [proxy/service.go:122] ["create a new Proxy instance"] [state=2]
[2024/06/19 02:37:11.443 +00:00] [DEBUG] [proxy/service.go:423] ["init Proxy server"]
[2024/06/19 02:37:11.443 +00:00] [DEBUG] [proxy/service.go:454] ["Proxy init service's parameter table done"]
[2024/06/19 02:37:11.443 +00:00] [DEBUG] [proxy/service.go:456] ["Proxy init http server's parameter table done"]
[2024/06/19 02:37:11.443 +00:00] [DEBUG] [proxy/service.go:463] ["init Proxy's parameter table done"] [internalAddress=172.17.86.165:19529] [externalAddress=172.17.86.165:19530]
[2024/06/19 02:37:11.443 +00:00] [INFO] [accesslog/global.go:145] ["Init access logger success"]
[2024/06/19 02:37:11.443 +00:00] [DEBUG] [proxy/service.go:470] ["init Proxy's tracer done"] ["service name"="Proxy ip: 172.17.86.165, port: 19530"]
[2024/06/19 02:37:11.443 +00:00] [INFO] [etcd/etcd_util.go:49] ["create etcd client"] [useEmbedEtcd=false] [useSSL=false] [endpoints="[my-milvus-etcd:2379]"] [minVersion=1.3]
[2024/06/19 02:37:11.544 +00:00] [INFO] [etcd/etcd_util.go:49] ["create etcd client"] [useEmbedEtcd=false] [useSSL=false] [endpoints="[my-milvus-etcd:2379]"] [minVersion=1.3]
[2024/06/19 02:37:16.439 +00:00] [DEBUG] [rootcoord/service.go:198] ["RootCoord connect to etcd failed"] [error="context deadline exceeded"]
[2024/06/19 02:37:16.439 +00:00] [ERROR] [components/root_coord.go:55] ["RootCoord starts error"] [error="context deadline exceeded"] [stack="github.com/milvus-io/milvus/cmd/components.(*RootCoord).Run\n\t/go/src/github.com/milvus-io/milvus/cmd/components/root_coord.go:55\ngithub.com/milvus-io/milvus/cmd/roles.runComponent[...].func1\n\t/go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:113"]
panic: context deadline exceeded

goroutine 176 [running]:
panic({0x5025900, 0x7c59e60})
    /usr/local/go/src/runtime/panic.go:987 +0x3bb fp=0xc000dc7f70 sp=0xc000dc7eb0 pc=0x1c25f1b
github.com/milvus-io/milvus/cmd/roles.runComponent[...].func1()
    /go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:114 +0x105 fp=0xc000dc7fe0 sp=0xc000dc7f70 pc=0x49c1565
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000dc7fe8 sp=0xc000dc7fe0 pc=0x1c5fec1
created by github.com/milvus-io/milvus/cmd/roles.runComponent[...]
    /go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:105 +0x158

goroutine 1 [semacquire]:
runtime.gopark(0x5b22e20?, 0xc0016b6038?, 0xe0?, 0xd3?, 0xc0016c8000?)
    /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00150f6f0 sp=0xc00150f6d0 pc=0x1c29496
runtime.goparkunlock(...)
    /usr/local/go/src/runtime/proc.go:387
runtime.semacquire1(0xc0005675c8, 0xc0?, 0x1, 0x0, 0x1?)
    /usr/local/go/src/runtime/sema.go:160 +0x20f fp=0xc00150f758 sp=0xc00150f6f0 pc=0x1c3b26f
sync.runtime_Semacquire(0x568e720?)
    /usr/local/go/src/runtime/sema.go:62 +0x27 fp=0xc00150f790 sp=0xc00150f758 pc=0x1c5b447
sync.(*WaitGroup).Wait(0xc00129e070?)
    /usr/local/go/src/sync/waitgroup.go:116 +0x4b fp=0xc00150f7b8 sp=0xc00150f790 pc=0x1c828cb
github.com/milvus-io/milvus/cmd/roles.(*MilvusRoles).Run(0xc0012b0960)
    /go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:393 +0xa46 fp=0xc00150fc78 sp=0xc00150f7b8 pc=0x49c0286
github.com/milvus-io/milvus/cmd/milvus.(*run).execute(0x0?, {0xc000052180?, 0x3, 0x3}, 0xc001283260)
    /go/src/github.com/milvus-io/milvus/cmd/milvus/run.go:47 +0x2e5 fp=0xc00150fd48 sp=0xc00150fc78 pc=0x49cb8c5
github.com/milvus-io/milvus/cmd/milvus.RunMilvus({0xc000052180?, 0x3, 0x3})
    /go/src/github.com/milvus-io/milvus/cmd/milvus/milvus.go:60 +0x20e fp=0xc00150fdc0 sp=0xc00150fd48 pc=0x49cb54e
main.main()
    /go/src/github.com/milvus-io/milvus/cmd/main.go:95 +0x3e5 fp=0xc00150ff80 sp=0xc00150fdc0 pc=0x49d0285
runtime.main()
    /usr/local/go/src/runtime/proc.go:250 +0x207 fp=0xc00150ffe0 sp=0xc00150ff80 pc=0x1c29067
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00150ffe8 sp=0xc00150ffe0 pc=0x1c5fec1

Anything else?

$ oc get sts
NAME             READY   AGE
my-milvus-etcd   1/1     17m
my-qdrant        1/1     8h

$ oc get svc
NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
my-attu-svc               ClusterIP   172.21.127.80    <none>        3000/TCP                     17h
my-milvus                 ClusterIP   172.21.213.103   <none>        19530/TCP,9091/TCP           18m
my-milvus-etcd            ClusterIP   172.21.30.159    <none>        2379/TCP,2380/TCP            18m
my-milvus-etcd-headless   ClusterIP   None             <none>        2379/TCP,2380/TCP            18m
my-milvus-minio           ClusterIP   172.21.251.60    <none>        9000/TCP                     18m
my-qdrant                 ClusterIP   172.21.120.71    <none>        6333/TCP,6334/TCP,6335/TCP   8h
my-qdrant-headless        ClusterIP   None             <none>        6333/TCP,6334/TCP,6335/TCP   8h

SimFG commented 4 weeks ago

The log shows that Milvus fails to start because it cannot connect to etcd:

[2024/06/19 02:37:06.433 +00:00] [DEBUG] [config/etcd_source.go:50] ["init etcd source"] [etcdInfo="{\"UseEmbed\":false,\"EnableAuth\":false,\"UserName\":\"\",\"PassWord\":\"\",\"UseSSL\":false,\"Endpoints\":[\"my-milvus-etcd:2379\"],\"KeyPrefix\":\"by-dev\",\"CertFile\":\"/path/to/etcd-client.pem\",\"KeyFile\":\"/path/to/etcd-client-key.pem\",\"CaCertFile\":\"/path/to/ca.pem\",\"MinVersion\":\"1.3\",\"RefreshInterval\":5000000000}"]
[2024/06/19 02:37:06.434 +00:00] [INFO] [etcd/etcd_util.go:49] ["create etcd client"] [useEmbedEtcd=false] [useSSL=false] [endpoints="[my-milvus-etcd:2379]"] [minVersion=1.3]
[2024/06/19 02:37:11.435 +00:00] [INFO] [paramtable/base_table.go:209] ["init with etcd failed"] [error="context deadline exceeded"]

uniquejava commented 4 weeks ago

Milvus is really nice. I am able to run it locally with Docker and have integrated it with Spring AI (RAG) successfully. However, when deployed to a cluster environment, Milvus standalone cannot start up. I am planning to migrate qdrant, which is why the my-qdrant pod is running there.

uniquejava commented 4 weeks ago

From the log, we can see that the reason why milvus cannot start is that it cannot connect to etcd.


Yes, I can see that. I use the default settings for everything. How can I debug etcd connectivity within the cluster? I should be able to step into the pod with oc debug pod/pod_id.

ChatGPTing...

uniquejava commented 4 weeks ago

etcd logs

etcd 02:26:04.72 Welcome to the Bitnami etcd container
etcd 02:26:04.72 Subscribe to project updates by watching https://github.com/bitnami/containers
etcd 02:26:04.72 Submit issues and feature requests at https://github.com/bitnami/containers/issues
etcd 02:26:04.72 
etcd 02:26:04.72 INFO  ==> ** Starting etcd setup **
etcd 02:26:04.74 INFO  ==> Validating settings in ETCD_* env vars..
etcd 02:26:04.74 WARN  ==> You set the environment variable ALLOW_NONE_AUTHENTICATION=yes. For safety reasons, do not use this flag in a production environment.
etcd 02:26:04.74 INFO  ==> Initializing etcd
etcd 02:26:04.75 INFO  ==> Generating etcd config file using env variables
etcd 02:26:04.77 WARN  ==> cluster size < 1
etcd 02:26:04.77 INFO  ==> get_etcd_active_endpoints: 
etcd 02:26:04.78 INFO  ==> There is no data from previous deployments
etcd 02:26:04.78 INFO  ==> ** etcd setup finished! **

etcd 02:26:04.80 INFO  ==> ** Starting etcd **
{"level":"info","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_ADVERTISE_CLIENT_URLS","variable-value":"http://my-milvus-etcd-0.my-milvus-etcd-headless.smbc-mobile-devops.svc.cluster.local:2379"}
{"level":"info","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_AUTO_COMPACTION_MODE","variable-value":"revision"}
{"level":"info","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_AUTO_COMPACTION_RETENTION","variable-value":"1000"}
{"level":"info","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_AUTO_TLS","variable-value":"false"}
{"level":"info","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_CLIENT_CERT_AUTH","variable-value":"false"}
{"level":"info","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_DATA_DIR","variable-value":"/bitnami/etcd/data"}
{"level":"info","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_ELECTION_TIMEOUT","variable-value":"2500"}
{"level":"info","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_HEARTBEAT_INTERVAL","variable-value":"500"}
{"level":"info","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_INITIAL_ADVERTISE_PEER_URLS","variable-value":"http://my-milvus-etcd-0.my-milvus-etcd-headless.smbc-mobile-devops.svc.cluster.local:2380"}
{"level":"info","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_LISTEN_CLIENT_URLS","variable-value":"http://0.0.0.0:2379"}
{"level":"info","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_LISTEN_PEER_URLS","variable-value":"http://0.0.0.0:2380"}
{"level":"info","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_LOG_LEVEL","variable-value":"info"}
{"level":"info","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_NAME","variable-value":"my-milvus-etcd-0"}
{"level":"info","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_PEER_AUTO_TLS","variable-value":"false"}
{"level":"info","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_QUOTA_BACKEND_BYTES","variable-value":"4294967296"}
{"level":"warn","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_TRUSTED_CA_FILE="}
{"level":"warn","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CONF_FILE=/opt/bitnami/etcd/conf/etcd.yaml"}
{"level":"warn","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_ON_K8S=yes"}
{"level":"warn","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_SNAPSHOTS_DIR=/snapshots"}
{"level":"warn","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_BIN_DIR=/opt/bitnami/etcd/bin"}
{"level":"warn","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_VOLUME_DIR=/bitnami/etcd"}
{"level":"warn","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_INITIAL_CLUSTER_TOKEN="}
{"level":"warn","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLUSTER_DOMAIN="}
{"level":"warn","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_DISASTER_RECOVERY=no"}
{"level":"warn","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_KEY_FILE="}
{"level":"warn","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CONF_DIR=/opt/bitnami/etcd/conf"}
{"level":"warn","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_DAEMON_GROUP=etcd"}
{"level":"warn","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_START_FROM_SNAPSHOT=no"}
{"level":"warn","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_INIT_SNAPSHOT_FILENAME="}
{"level":"warn","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_INIT_SNAPSHOTS_DIR=/init-snapshot"}
{"level":"warn","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_TMP_DIR=/opt/bitnami/etcd/tmp"}
{"level":"warn","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_INITIAL_CLUSTER_STATE="}
{"level":"warn","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_BASE_DIR=/opt/bitnami/etcd"}
{"level":"warn","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_INITIAL_CLUSTER="}
{"level":"warn","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CERT_FILE="}
{"level":"warn","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_NEW_MEMBERS_ENV_FILE=/bitnami/etcd/data/new_member_envs"}
{"level":"warn","ts":"2024-06-19T02:26:04.827Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_DAEMON_USER=etcd"}
{"level":"info","ts":"2024-06-19T02:26:04.827Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd"]}
{"level":"info","ts":"2024-06-19T02:26:04.827Z","caller":"embed/etcd.go:124","msg":"configuring peer listeners","listen-peer-urls":["http://0.0.0.0:2380"]}
{"level":"info","ts":"2024-06-19T02:26:04.828Z","caller":"embed/etcd.go:132","msg":"configuring client listeners","listen-client-urls":["http://0.0.0.0:2379"]}
{"level":"info","ts":"2024-06-19T02:26:04.828Z","caller":"embed/etcd.go:306","msg":"starting an etcd server","etcd-version":"3.5.5","git-sha":"19002cfc6","go-version":"go1.16.15","go-os":"linux","go-arch":"amd64","max-cpu-set":4,"max-cpu-available":4,"member-initialized":false,"name":"my-milvus-etcd-0","data-dir":"/bitnami/etcd/data","wal-dir":"","wal-dir-dedicated":"","member-dir":"/bitnami/etcd/data/member","force-new-cluster":false,"heartbeat-interval":"500ms","election-timeout":"2.5s","initial-election-tick-advance":true,"snapshot-count":100000,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://my-milvus-etcd-0.my-milvus-etcd-headless.smbc-mobile-devops.svc.cluster.local:2380"],"listen-peer-urls":["http://0.0.0.0:2380"],"advertise-client-urls":["http://my-milvus-etcd-0.my-milvus-etcd-headless.smbc-mobile-devops.svc.cluster.local:2379"],"listen-client-urls":["http://0.0.0.0:2379"],"listen-metrics-urls":[],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"my-milvus-etcd-0=http://my-milvus-etcd-0.my-milvus-etcd-headless.smbc-mobile-devops.svc.cluster.local:2380","initial-cluster-state":"new","initial-cluster-token":"etcd-cluster","quota-backend-bytes":4294967296,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"revision","auto-compaction-retention":"1µs","auto-compaction-interval":"1µs","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
{"level":"info","ts":"2024-06-19T02:26:04.852Z","caller":"etcdserver/backend.go:81","msg":"opened backend db","path":"/bitnami/etcd/data/member/snap/db","took":"22.976605ms"}
{"level":"info","ts":"2024-06-19T02:26:04.852Z","caller":"netutil/netutil.go:112","msg":"resolved URL Host","url":"http://my-milvus-etcd-0.my-milvus-etcd-headless.smbc-mobile-devops.svc.cluster.local:2380","host":"my-milvus-etcd-0.my-milvus-etcd-headless.smbc-mobile-devops.svc.cluster.local:2380","resolved-addr":"172.17.86.136:2380"}
{"level":"info","ts":"2024-06-19T02:26:04.853Z","caller":"netutil/netutil.go:112","msg":"resolved URL Host","url":"http://my-milvus-etcd-0.my-milvus-etcd-headless.smbc-mobile-devops.svc.cluster.local:2380","host":"my-milvus-etcd-0.my-milvus-etcd-headless.smbc-mobile-devops.svc.cluster.local:2380","resolved-addr":"172.17.86.136:2380"}
{"level":"info","ts":"2024-06-19T02:26:04.861Z","caller":"etcdserver/raft.go:494","msg":"starting local member","local-member-id":"e82ac3abef7cecf1","cluster-id":"e409097717fa8cc0"}
{"level":"info","ts":"2024-06-19T02:26:04.861Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"e82ac3abef7cecf1 switched to configuration voters=()"}
{"level":"info","ts":"2024-06-19T02:26:04.861Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"e82ac3abef7cecf1 became follower at term 0"}
{"level":"info","ts":"2024-06-19T02:26:04.861Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"newRaft e82ac3abef7cecf1 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]"}
{"level":"info","ts":"2024-06-19T02:26:04.861Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"e82ac3abef7cecf1 became follower at term 1"}
{"level":"info","ts":"2024-06-19T02:26:04.861Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"e82ac3abef7cecf1 switched to configuration voters=(16729398909045894385)"}
{"level":"warn","ts":"2024-06-19T02:26:04.864Z","caller":"auth/store.go:1233","msg":"simple token is not cryptographically signed"}
{"level":"info","ts":"2024-06-19T02:26:04.875Z","caller":"mvcc/kvstore.go:393","msg":"kvstore restored","current-rev":1}
{"level":"info","ts":"2024-06-19T02:26:04.876Z","caller":"etcdserver/quota.go:117","msg":"enabled backend quota","quota-name":"v3-applier","quota-size-bytes":4294967296,"quota-size":"4.3 GB"}
{"level":"info","ts":"2024-06-19T02:26:04.877Z","caller":"etcdserver/server.go:854","msg":"starting etcd server","local-member-id":"e82ac3abef7cecf1","local-server-version":"3.5.5","cluster-version":"to_be_decided"}
{"level":"info","ts":"2024-06-19T02:26:04.878Z","caller":"etcdserver/server.go:738","msg":"started as single-node; fast-forwarding election ticks","local-member-id":"e82ac3abef7cecf1","forward-ticks":4,"forward-duration":"2s","election-ticks":5,"election-timeout":"2.5s"}
{"level":"info","ts":"2024-06-19T02:26:04.879Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"e82ac3abef7cecf1 switched to configuration voters=(16729398909045894385)"}
{"level":"info","ts":"2024-06-19T02:26:04.879Z","caller":"membership/cluster.go:421","msg":"added member","cluster-id":"e409097717fa8cc0","local-member-id":"e82ac3abef7cecf1","added-peer-id":"e82ac3abef7cecf1","added-peer-peer-urls":["http://my-milvus-etcd-0.my-milvus-etcd-headless.smbc-mobile-devops.svc.cluster.local:2380"]}
{"level":"info","ts":"2024-06-19T02:26:04.879Z","caller":"embed/etcd.go:275","msg":"now serving peer/client/metrics","local-member-id":"e82ac3abef7cecf1","initial-advertise-peer-urls":["http://my-milvus-etcd-0.my-milvus-etcd-headless.smbc-mobile-devops.svc.cluster.local:2380"],"listen-peer-urls":["http://0.0.0.0:2380"],"advertise-client-urls":["http://my-milvus-etcd-0.my-milvus-etcd-headless.smbc-mobile-devops.svc.cluster.local:2379"],"listen-client-urls":["http://0.0.0.0:2379"],"listen-metrics-urls":[]}
{"level":"info","ts":"2024-06-19T02:26:04.879Z","caller":"embed/etcd.go:584","msg":"serving peer traffic","address":"[::]:2380"}
{"level":"info","ts":"2024-06-19T02:26:04.879Z","caller":"embed/etcd.go:556","msg":"cmux::serve","address":"[::]:2380"}
{"level":"info","ts":"2024-06-19T02:26:05.862Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"e82ac3abef7cecf1 is starting a new election at term 1"}
{"level":"info","ts":"2024-06-19T02:26:05.862Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"e82ac3abef7cecf1 became pre-candidate at term 1"}
{"level":"info","ts":"2024-06-19T02:26:05.863Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"e82ac3abef7cecf1 received MsgPreVoteResp from e82ac3abef7cecf1 at term 1"}
{"level":"info","ts":"2024-06-19T02:26:05.863Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"e82ac3abef7cecf1 became candidate at term 2"}
{"level":"info","ts":"2024-06-19T02:26:05.863Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"e82ac3abef7cecf1 received MsgVoteResp from e82ac3abef7cecf1 at term 2"}
{"level":"info","ts":"2024-06-19T02:26:05.863Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"e82ac3abef7cecf1 became leader at term 2"}
{"level":"info","ts":"2024-06-19T02:26:05.863Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"raft.node: e82ac3abef7cecf1 elected leader e82ac3abef7cecf1 at term 2"}
{"level":"info","ts":"2024-06-19T02:26:05.864Z","caller":"etcdserver/server.go:2563","msg":"setting up initial cluster version using v2 API","cluster-version":"3.5"}
{"level":"info","ts":"2024-06-19T02:26:05.865Z","caller":"etcdserver/server.go:2054","msg":"published local member to cluster through raft","local-member-id":"e82ac3abef7cecf1","local-member-attributes":"{Name:my-milvus-etcd-0 ClientURLs:[http://my-milvus-etcd-0.my-milvus-etcd-headless.smbc-mobile-devops.svc.cluster.local:2379]}","request-path":"/0/members/e82ac3abef7cecf1/attributes","cluster-id":"e409097717fa8cc0","publish-timeout":"10s"}
{"level":"info","ts":"2024-06-19T02:26:05.865Z","caller":"embed/serve.go:100","msg":"ready to serve client requests"}
{"level":"info","ts":"2024-06-19T02:26:05.866Z","caller":"etcdmain/main.go:44","msg":"notifying init daemon"}
{"level":"info","ts":"2024-06-19T02:26:05.866Z","caller":"etcdmain/main.go:50","msg":"successfully notified init daemon"}
{"level":"info","ts":"2024-06-19T02:26:05.866Z","caller":"embed/serve.go:146","msg":"serving client traffic insecurely; this is strongly discouraged!","address":"[::]:2379"}
{"level":"info","ts":"2024-06-19T02:26:05.867Z","caller":"membership/cluster.go:584","msg":"set initial cluster version","cluster-id":"e409097717fa8cc0","local-member-id":"e82ac3abef7cecf1","cluster-version":"3.5"}
{"level":"info","ts":"2024-06-19T02:26:05.867Z","caller":"api/capability.go:75","msg":"enabled capabilities for version","cluster-version":"3.5"}
{"level":"info","ts":"2024-06-19T02:26:05.867Z","caller":"etcdserver/server.go:2587","msg":"cluster version is updated","cluster-version":"3.5"}

xiaofan-luan commented 4 weeks ago

It seems that etcd is working. You can try connecting to etcd from the Milvus pod to see if the network is reachable.

LoveEachDay commented 4 weeks ago

@uniquejava Are there any NetworkPolicies set up in your cluster that would prevent service access? Could you try deploying a pod (for example, an Ubuntu pod) in the same namespace to check whether it can reach etcd through the service name and port, like this:

nc -zv my-milvus-etcd 2379
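If the test image does not ship nc, the same reachability check can be sketched in a few lines of Python. This is only a minimal sketch and not something posted in the thread: the function name can_connect is hypothetical, and the host and port you pass would be the service name and client port from the oc get svc output (my-milvus-etcd, 2379).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    DNS failures, refused connections and timeouts are all reported as False,
    because each of them is a subclass of OSError.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling can_connect("my-milvus-etcd", 2379) from a pod in the same namespace should return True when the service is reachable. A False result combined with a successful nslookup points at something dropping the traffic (for example a NetworkPolicy) rather than at DNS.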
uniquejava commented 4 weeks ago

Thank you. I can confirm it's an etcd issue~ I tried all kinds of commands.


# debug etcd
$ oc debug pod/my-milvus-etcd-0
> etcdctl --endpoints=my-milvus-etcd:2379 member list
{"level":"warn","ts":"2024-06-19T03:00:58.628Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000dc1c0/my-milvus-etcd:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}

> etcdctl --endpoints=http://my-milvus-etcd-headless:2379 endpoint health

{"level":"warn","ts":"2024-06-19T05:25:56.679Z","logger":"client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0004421c0/my-milvus-etcd:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
http://my-milvus-etcd:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster

oc run curl-1 --image=radial/busyboxplus:curl -i --tty --rm
> curl http://172.17.86.136:2379/health

> nslookup my-milvus-etcd
Server:    172.21.0.10
Address 1: 172.21.0.10 dns-default.openshift-dns.svc.cluster.local

Name:      my-milvus-etcd
Address 1: 172.21.30.159 my-milvus-etcd.smbc-mobile-devops.svc.cluster.local

curl --connect-timeout 3 -i http://my-milvus-etcd:2379
Failed to connect to ... after 3004 ms: Timeout was reached

I see from the etcd pod YAML that SELinux is enabled; maybe I will have to disable SELinux or firewalld for the etcd pod.

The default securityContext in OpenShift is complex; see https://kubernetes.io/docs/tasks/configure-pod-container/security-context/

If only there were a tutorial for running a Milvus hello world on OpenShift :( I will investigate this the day after tomorrow. For now, let me switch to qdrant; we have an LLM chat (with RAG) demo tomorrow afternoon~ Feel free to close this issue.

uniquejava commented 4 weeks ago

@LoveEachDay

AFAIK, network policies only control services in different namespaces. I have one, but I don't think it's relevant:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-pods
spec:
  # https://stackoverflow.com/questions/71647338/kubernetes-networkpolicy-multiple-match-labels
  podSelector: { } # This selects all pods in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              projectName: some-other-project

uniquejava commented 4 weeks ago

It seems that etcd is working. You can try connecting to etcd from the Milvus pod to see if the network is reachable.

Ah, I may need to log in to the Milvus pod, not the etcd pod.

uniquejava commented 4 weeks ago

😅 Turns out to be a NetworkPolicy issue.
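For readers who land here with the same symptom: the thread does not spell out which rule was at fault, but the allow-all-pods policy posted earlier is a likely candidate. Its empty podSelector selects every pod in the namespace (etcd included), and once a pod is selected for Ingress only the listed sources are admitted. The single from entry admits namespaces labelled projectName: some-other-project, so pods in the policy's own namespace are rejected unless that namespace carries the same label. The sketch below is a simplified Python model of that rule, not the Kubernetes implementation. The function names and the etcd pod labels are hypothetical, and it assumes the namespace itself lacks the projectName label.

```python
def labels_match(selector: dict, labels: dict) -> bool:
    """matchLabels semantics: every selector key/value must be present; {} matches all."""
    return all(labels.get(key) == value for key, value in selector.items())


def ingress_allowed(policies: list, dst_pod_labels: dict, src_namespace_labels: dict) -> bool:
    """Simplified ingress check for one destination pod.

    Models only podSelector (matchLabels) and ingress[].from[].namespaceSelector.
    Ports, ipBlock and from[].podSelector are deliberately ignored.
    """
    selecting = [p for p in policies if labels_match(p["podSelector"], dst_pod_labels)]
    if not selecting:
        return True  # no policy selects the pod, so it is not isolated
    for policy in selecting:
        for rule in policy["ingress"]:
            for peer in rule["from"]:
                if labels_match(peer["namespaceSelector"], src_namespace_labels):
                    return True
    return False


# The policy from this thread, reduced to the fields the model understands.
allow_all_pods = {
    "podSelector": {},
    "ingress": [{"from": [{"namespaceSelector": {"projectName": "some-other-project"}}]}],
}

etcd_pod_labels = {"app.kubernetes.io/name": "etcd"}  # hypothetical labels
own_namespace_labels = {}  # assumption: no projectName label on the namespace

# Milvus standalone -> etcd, both in the policy's own namespace.
print(ingress_allowed([allow_all_pods], etcd_pod_labels, own_namespace_labels))
# A pod from the explicitly allowed project -> etcd.
print(ingress_allowed([allow_all_pods], etcd_pod_labels, {"projectName": "some-other-project"}))
```

Under these assumptions the first call reports the traffic as blocked and the second as allowed, which matches the timeouts seen from pods in the same namespace. A common remedy, not confirmed in this thread, is to add a second from entry with an empty podSelector so that pods in the policy's own namespace are admitted as well.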