milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: QueryNode fail to start #24731

Closed dzqoo closed 1 year ago

dzqoo commented 1 year ago

Is there an existing issue for this?

Environment

- Milvus version: 2.2.9
- Deployment mode (standalone or cluster): cluster
- MQ type (rocksmq, pulsar or kafka): pulsar
- SDK version (e.g. pymilvus v2.0.0rc2):
- OS (Ubuntu or CentOS): CentOS
- CPU/Memory: 6c/32GB
- GPU:
- Others:

Current Behavior

I compiled Milvus 2.2.9 in my environment successfully, but when I deploy the cluster with my binary and libraries, QueryNode fails to start while the other components start successfully. I got the following log from the QueryNodes: `Welcome to use Milvus! Version:
Built: Wed Jun 7 05:57:47 UTC 2023 GitCommit: GoVersion: go version go1.18.8 linux/amd64

open pid file: /run/milvus/querynode.pid lock pid file: /run/milvus/querynode.pid [2023/06/08 01:46:31.748 +00:00] [INFO] [roles/roles.go:226] ["starting running Milvus components"] [2023/06/08 01:46:31.748 +00:00] [INFO] [roles/roles.go:152] ["Enable Jemalloc"] ["Jemalloc Path"=/milvus/lib/libjemalloc.so] [2023/06/08 01:46:31.748 +00:00] [INFO] [management/server.go:68] ["management listen"] [addr=:9091] [2023/06/08 01:46:31.769 +00:00] [INFO] [config/etcd_source.go:145] ["start refreshing configurations"] [2023/06/08 01:46:31.770 +00:00] [INFO] [paramtable/quota_param.go:745] ["init disk quota"] [diskQuota(MB)=+inf] [2023/06/08 01:46:31.770 +00:00] [INFO] [paramtable/quota_param.go:760] ["init disk quota per DB"] [diskQuotaPerCollection(MB)=1.7976931348623157e+308] [2023/06/08 01:46:31.770 +00:00] [INFO] [paramtable/component_param.go:1543] ["init segment max idle time"] [value=10m0s] [2023/06/08 01:46:31.770 +00:00] [INFO] [paramtable/component_param.go:1548] ["init segment min size from idle to sealed"] [value=16] [2023/06/08 01:46:31.770 +00:00] [INFO] [paramtable/component_param.go:1558] ["init segment max binlog file to sealed"] [value=32] [2023/06/08 01:46:31.770 +00:00] [INFO] [paramtable/component_param.go:1553] ["init segment expansion rate"] [value=1.25] [2023/06/08 01:46:31.775 +00:00] [INFO] [paramtable/base_table.go:142] ["cannot find etcd.endpoints"] [2023/06/08 01:46:31.775 +00:00] [INFO] [paramtable/hook_config.go:19] ["hook config"] [hook={}] [2023/06/08 01:46:31.776 +00:00] [ERROR] [querynode/query_node.go:188] ["load queryhook failed"] [error="fail to set the querynode plugin path"] 
[stack="github.com/milvus-io/milvus/internal/querynode.NewQueryNode\n\t/data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/querynode/query_node.go:188\ngithub.com/milvus-io/milvus/internal/distributed/querynode.NewServer\n\t/data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/distributed/querynode/service.go:83\ngithub.com/milvus-io/milvus/cmd/components.NewQueryNode\n\t/data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/cmd/components/query_node.go:40\ngithub.com/milvus-io/milvus/cmd/roles.runComponent[...].func1\n\t/data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/cmd/roles/roles.go:110"] [2023/06/08 01:46:31.797 +00:00] [INFO] [config/etcd_source.go:145] ["start refreshing configurations"] [2023/06/08 01:46:31.797 +00:00] [DEBUG] [paramtable/grpc_param.go:153] [initServerMaxSendSize] [role=querynode] [grpc.serverMaxSendSize=536870912] [2023/06/08 01:46:31.798 +00:00] [DEBUG] [paramtable/grpc_param.go:175] [initServerMaxRecvSize] [role=querynode] [grpc.serverMaxRecvSize=536870912] [2023/06/08 01:46:31.800 +00:00] [INFO] [querynode/service.go:106] [QueryNode] [port=21123] [2023/06/08 01:46:31.801 +00:00] [INFO] [querynode/service.go:122] ["QueryNode connect to etcd successfully"] [2023/06/08 01:46:31.902 +00:00] [INFO] [querynode/service.go:132] [QueryNode] [State=Initializing] [2023/06/08 01:46:31.902 +00:00] [INFO] [querynode/query_node.go:299] ["QueryNode session info"] [metaPath=by-dev/meta] [2023/06/08 01:46:31.902 +00:00] [INFO] [sessionutil/session_util.go:202] ["Session try to connect to etcd"] [2023/06/08 01:46:31.904 +00:00] [INFO] [sessionutil/session_util.go:217] ["Session connect to etcd success"] [2023/06/08 01:46:31.910 +00:00] [INFO] [sessionutil/session_util.go:300] ["Session get serverID success"] [key=id] [ServerId=411] [2023/06/08 01:46:31.929 +00:00] [INFO] [config/etcd_source.go:145] ["start refreshing configurations"] [2023/06/08 01:46:31.930 +00:00] [INFO] [paramtable/quota_param.go:745] ["init disk quota"] 
[diskQuota(MB)=+inf] [2023/06/08 01:46:31.930 +00:00] [INFO] [paramtable/quota_param.go:760] ["init disk quota per DB"] [diskQuotaPerCollection(MB)=1.7976931348623157e+308] [2023/06/08 01:46:31.930 +00:00] [INFO] [paramtable/component_param.go:1543] ["init segment max idle time"] [value=10m0s] [2023/06/08 01:46:31.930 +00:00] [INFO] [paramtable/component_param.go:1548] ["init segment min size from idle to sealed"] [value=16] [2023/06/08 01:46:31.930 +00:00] [INFO] [paramtable/component_param.go:1558] ["init segment max binlog file to sealed"] [value=32] [2023/06/08 01:46:31.930 +00:00] [INFO] [paramtable/component_param.go:1553] ["init segment expansion rate"] [value=1.25] [2023/06/08 01:46:31.934 +00:00] [INFO] [paramtable/base_table.go:142] ["cannot find etcd.endpoints"] [2023/06/08 01:46:31.934 +00:00] [INFO] [paramtable/hook_config.go:19] ["hook config"] [hook={}] {"level":"INFO","time":"2023/06/08 01:46:31.935 +00:00","caller":"logutil/logutil.go:165","message":"Log directory","configDir":""} {"level":"INFO","time":"2023/06/08 01:46:31.935 +00:00","caller":"logutil/logutil.go:166","message":"Set log file to ","path":""} {"level":"INFO","time":"2023/06/08 01:46:31.935 +00:00","caller":"querynode/query_node.go:209","message":"QueryNode init session","nodeID":411,"node address":"10.234.98.131:21123"} {"level":"INFO","time":"2023/06/08 01:46:31.935 +00:00","caller":"querynode/query_node.go:315","message":"QueryNode init rateCollector done","nodeID":411} {"level":"INFO","time":"2023/06/08 01:46:31.944 +00:00","caller":"storage/minio_chunk_manager.go:145","message":"minio chunk manager init success.","bucketname":"milvus-bucket","root":"file"} {"level":"INFO","time":"2023/06/08 01:46:31.944 +00:00","caller":"querynode/query_node.go:325","message":"queryNode try to connect etcd success","MetaRootPath":"by-dev/meta"} {"level":"INFO","time":"2023/06/08 01:46:31.944 +00:00","caller":"querynode/segment_loader.go:945","message":"SegmentLoader 
created","ioPoolSize":48,"cpuPoolSize":6} 2023-06-08 01:46:31,944 INFO [default] [KNOWHERE][SetBlasThreshold][milvus] Set faiss::distance_compute_blas_threshold to 16384 2023-06-08 01:46:31,945 INFO [default] [KNOWHERE][SetEarlyStopThreshold][milvus] Set faiss::early_stop_threshold to 0 2023-06-08 01:46:31,945 INFO [default] [KNOWHERE][SetStatisticsLevel][milvus] Set knowhere::STATISTICS_LEVEL to 0 2023-06-08 01:46:31,945 | DEBUG | default | [SERVER][operator()][milvus] Config easylogging with yaml file: /milvus/configs/easylogging.yaml 2023-06-08 01:46:31,946 | DEBUG | default | [SEGCORE][SegcoreSetSimdType][milvus] set config simd_type: auto 2023-06-08 01:46:31,946 | INFO | default | [KNOWHERE][SetSimdType][milvus] FAISS expect simdType::AUTO 2023-06-08 01:46:31,946 | INFO | default | [KNOWHERE][SetSimdType][milvus] FAISS hook AVX512 2023-06-08 01:46:31,946 | DEBUG | default | [SEGCORE][SetIndexSliceSize][milvus] set config index slice size(byte): 16777216 2023-06-08 01:46:31,946 | DEBUG | default | [SEGCORE][SetThreadCoreCoefficient][milvus] set thread pool core coefficient: 10 fatal error: unexpected signal during runtime execution [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x372ed75]

runtime stack: runtime.throw({0x40010ba?, 0x7f85846297d0?}) /opt/go/goroot/src/runtime/panic.go:992 +0x71 runtime.sigpanic() /opt/go/goroot/src/runtime/signal_unix.go:802 +0x389

goroutine 207 [syscall]: runtime.cgocall(0x3144350, 0xc0012072c0) /opt/go/goroot/src/runtime/cgocall.go:157 +0x5c fp=0xc001207258 sp=0xc001207220 pc=0x14788bc github.com/milvus-io/milvus/internal/util/initcore._Cfunc_InitRemoteChunkManagerSingleton({0x7f85770ffa20, 0x7f8577028910, 0x7f8577028940, 0x7f8577028930, 0x7f85770085b0, 0x7f85770085c8, 0x7f85770085d0, 0x0, 0x0, {0x0, ...}}) _cgo_gotypes.go:122 +0x5b fp=0xc0012072c0 sp=0xc001207258 pc=0x2b21b3b github.com/milvus-io/milvus/internal/util/initcore.InitRemoteChunkManager(0x5d39640) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/util/initcore/init_storage_config.go:71 +0x2e5 fp=0xc0012073f8 sp=0xc0012072c0 pc=0x2b220e5 github.com/milvus-io/milvus/internal/querynode.(QueryNode).InitSegcore(0x4470038?) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/querynode/query_node.go:291 +0x23e fp=0xc001207478 sp=0xc0012073f8 pc=0x2de447e github.com/milvus-io/milvus/internal/querynode.(QueryNode).Init.func1() /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/querynode/query_node.go:346 +0x10e5 fp=0xc001207b50 sp=0xc001207478 pc=0x2de5865 sync.(Once).doSlow(0x3f91fe6?, 0x14b8051?) /opt/go/goroot/src/sync/once.go:68 +0xc2 fp=0xc001207bb0 sp=0xc001207b50 pc=0x14ef022 sync.(Once).Do(...) /opt/go/goroot/src/sync/once.go:59 github.com/milvus-io/milvus/internal/querynode.(QueryNode).Init(0x3f91fe6?) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/querynode/query_node.go:297 +0x5b fp=0xc001207bf8 sp=0xc001207bb0 pc=0x2de473b github.com/milvus-io/milvus/internal/distributed/querynode.(Server).init(0xc000a32420) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/distributed/querynode/service.go:133 +0x76e fp=0xc001207ee8 sp=0xc001207bf8 pc=0x2fb25ce github.com/milvus-io/milvus/internal/distributed/querynode.(Server).Run(0xc000a34301?) 
/data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/distributed/querynode/service.go:213 +0x25 fp=0xc001207f28 sp=0xc001207ee8 pc=0x2fb3925 github.com/milvus-io/milvus/cmd/components.(QueryNode).Run(0x5d39640?) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/cmd/components/query_node.go:54 +0x1d fp=0xc001207f60 sp=0xc001207f28 pc=0x313015d github.com/milvus-io/milvus/cmd/roles.runComponent[...].func1() /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/cmd/roles/roles.go:120 +0x182 fp=0xc001207fe0 sp=0xc001207f60 pc=0x3132d82 runtime.goexit() /opt/go/goroot/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc001207fe8 sp=0xc001207fe0 pc=0x14e1fc1 created by github.com/milvus-io/milvus/cmd/roles.runComponent[...] /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/cmd/roles/roles.go:104 +0x18a

goroutine 1 [chan receive]: github.com/milvus-io/milvus/cmd/roles.(MilvusRoles).Run(0xc0006c7e58, 0x0, {0x0, 0x0}) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/cmd/roles/roles.go:351 +0xb0d github.com/milvus-io/milvus/cmd/milvus.(run).execute(0xc000a2c4b0, {0xc00004e090?, 0x3, 0x3}, 0xc000a32240) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/cmd/milvus/run.go:112 +0x66e github.com/milvus-io/milvus/cmd/milvus.RunMilvus({0xc00004e090?, 0x3, 0x3}) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/cmd/milvus/milvus.go:60 +0x21e main.main() /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/cmd/main.go:26 +0x2e

goroutine 220 [chan receive]: github.com/panjf2000/ants/v2.(*Pool).purgePeriodically(0xc0008bd420) /opt/go/gopath/pkg/mod/github.com/panjf2000/ants/v2@v2.4.8/pool.go:69 +0x8b created by github.com/panjf2000/ants/v2.NewPool /opt/go/gopath/pkg/mod/github.com/panjf2000/ants/v2@v2.4.8/pool.go:137 +0x34a

goroutine 230 [IO wait]: internal/poll.runtime_pollWait(0x7f854e859028, 0x72) /opt/go/goroot/src/runtime/netpoll.go:302 +0x89 internal/poll.(pollDesc).wait(0xc0010ad000?, 0xc000062500?, 0x0) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:83 +0x32 internal/poll.(pollDesc).waitRead(...) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:88 internal/poll.(FD).Accept(0xc0010ad000) /opt/go/goroot/src/internal/poll/fd_unix.go:614 +0x22c net.(netFD).accept(0xc0010ad000) /opt/go/goroot/src/net/fd_unix.go:172 +0x35 net.(TCPListener).accept(0xc000c02408) /opt/go/goroot/src/net/tcpsock_posix.go:139 +0x28 net.(TCPListener).Accept(0xc000c02408) /opt/go/goroot/src/net/tcpsock.go:288 +0x3d net/http.(Server).Serve(0xc0008960e0, {0x4445be0, 0xc000c02408}) /opt/go/goroot/src/net/http/server.go:3039 +0x385 net/http.(Server).ListenAndServe(0xc0008960e0) /opt/go/goroot/src/net/http/server.go:2968 +0x7d net/http.ListenAndServe(...) /opt/go/goroot/src/net/http/server.go:3222 github.com/milvus-io/milvus/internal/management.ServeHTTP.func1() /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/management/server.go:69 +0x151 created by github.com/milvus-io/milvus/internal/management.ServeHTTP /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/management/server.go:66 +0x25

goroutine 231 [select]: google.golang.org/grpc.(*ccBalancerWrapper).watcher(0xc0004f5680) /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/balancer_conn_wrappers.go:112 +0x73 created by google.golang.org/grpc.newCCBalancerWrapper /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/balancer_conn_wrappers.go:73 +0x22a

goroutine 311 [select]: google.golang.org/grpc/internal/transport.(controlBuffer).get(0xc0009bd9f0, 0x1) /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/controlbuf.go:407 +0x115 google.golang.org/grpc/internal/transport.(loopyWriter).run(0xc000a33380) /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/controlbuf.go:534 +0x85 google.golang.org/grpc/internal/transport.newHTTP2Client.func3() /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/http2_client.go:415 +0x65 created by google.golang.org/grpc/internal/transport.newHTTP2Client /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/http2_client.go:413 +0x1f91

goroutine 204 [IO wait]: internal/poll.runtime_pollWait(0x7f854e858f38, 0x72) /opt/go/goroot/src/runtime/netpoll.go:302 +0x89 internal/poll.(pollDesc).wait(0xc0005ec200?, 0xc0010ee000?, 0x0) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:83 +0x32 internal/poll.(pollDesc).waitRead(...) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:88 internal/poll.(FD).Read(0xc0005ec200, {0xc0010ee000, 0x8000, 0x8000}) /opt/go/goroot/src/internal/poll/fd_unix.go:167 +0x25a net.(netFD).Read(0xc0005ec200, {0xc0010ee000?, 0x3e081c0?, 0x1?}) /opt/go/goroot/src/net/fd_posix.go:55 +0x29 net.(conn).Read(0xc000472668, {0xc0010ee000?, 0x14c5000?, 0x801010601?}) /opt/go/goroot/src/net/net.go:183 +0x45 bufio.(Reader).Read(0xc000a32060, {0xc000896200, 0x9, 0x18?}) /opt/go/goroot/src/bufio/bufio.go:236 +0x1b4 io.ReadAtLeast({0x442a600, 0xc000a32060}, {0xc000896200, 0x9, 0x9}, 0x9) /opt/go/goroot/src/io/io.go:331 +0x9a io.ReadFull(...) /opt/go/goroot/src/io/io.go:350 golang.org/x/net/http2.readFrameHeader({0xc000896200?, 0x9?, 0x54b09a2?}, {0x442a600?, 0xc000a32060?}) /opt/go/gopath/pkg/mod/golang.org/x/net@v0.9.0/http2/frame.go:237 +0x6e golang.org/x/net/http2.(Framer).ReadFrame(0xc0008961c0) /opt/go/gopath/pkg/mod/golang.org/x/net@v0.9.0/http2/frame.go:498 +0x95 google.golang.org/grpc/internal/transport.(http2Client).reader(0xc0000001e0) /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/http2_client.go:1498 +0x414 created by google.golang.org/grpc/internal/transport.newHTTP2Client /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/http2_client.go:365 +0x193f

goroutine 234 [syscall]: os/signal.signal_recv() /opt/go/goroot/src/runtime/sigqueue.go:151 +0x2f os/signal.loop() /opt/go/goroot/src/os/signal/signal_unix.go:23 +0x19 created by os/signal.Notify.func1.1 /opt/go/goroot/src/os/signal/signal.go:151 +0x2a

goroutine 275 [select]: github.com/milvus-io/milvus/internal/config.(EtcdSource).refreshConfigurationsPeriodically(0xc00131ee80) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/config/etcd_source.go:147 +0x9f created by github.com/milvus-io/milvus/internal/config.(EtcdSource).GetConfigurations.func1 /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/config/etcd_source.go:98 +0x5a

goroutine 205 [select]: google.golang.org/grpc/internal/transport.(controlBuffer).get(0xc0009bc230, 0x1) /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/controlbuf.go:407 +0x115 google.golang.org/grpc/internal/transport.(loopyWriter).run(0xc0005f2780) /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/controlbuf.go:534 +0x85 google.golang.org/grpc/internal/transport.newHTTP2Client.func3() /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/http2_client.go:415 +0x65 created by google.golang.org/grpc/internal/transport.newHTTP2Client /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/http2_client.go:413 +0x1f91

goroutine 206 [select]: github.com/milvus-io/milvus/internal/config.(EtcdSource).refreshConfigurationsPeriodically(0xc00091e180) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/config/etcd_source.go:147 +0x9f created by github.com/milvus-io/milvus/internal/config.(EtcdSource).GetConfigurations.func1 /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/config/etcd_source.go:98 +0x5a

goroutine 208 [select]: google.golang.org/grpc.(*ccBalancerWrapper).watcher(0xc000b0a740) /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/balancer_conn_wrappers.go:112 +0x73 created by google.golang.org/grpc.newCCBalancerWrapper /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/balancer_conn_wrappers.go:73 +0x22a

goroutine 294 [IO wait]: internal/poll.runtime_pollWait(0x7f854e858a88, 0x72) /opt/go/goroot/src/runtime/netpoll.go:302 +0x89 internal/poll.(pollDesc).wait(0xc000bb8080?, 0xc000aac000?, 0x0) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:83 +0x32 internal/poll.(pollDesc).waitRead(...) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:88 internal/poll.(FD).Read(0xc000bb8080, {0xc000aac000, 0x8000, 0x8000}) /opt/go/goroot/src/internal/poll/fd_unix.go:167 +0x25a net.(netFD).Read(0xc000bb8080, {0xc000aac000?, 0x3e081c0?, 0x14efd01?}) /opt/go/goroot/src/net/fd_posix.go:55 +0x29 net.(conn).Read(0xc000c0e000, {0xc000aac000?, 0x0?, 0x800010601?}) /opt/go/goroot/src/net/net.go:183 +0x45 bufio.(Reader).Read(0xc0003ee240, {0xc0008963c0, 0x9, 0x18?}) /opt/go/goroot/src/bufio/bufio.go:236 +0x1b4 io.ReadAtLeast({0x442a600, 0xc0003ee240}, {0xc0008963c0, 0x9, 0x9}, 0x9) /opt/go/goroot/src/io/io.go:331 +0x9a io.ReadFull(...) /opt/go/goroot/src/io/io.go:350 golang.org/x/net/http2.readFrameHeader({0xc0008963c0?, 0x9?, 0xdac9e0d?}, {0x442a600?, 0xc0003ee240?}) /opt/go/gopath/pkg/mod/golang.org/x/net@v0.9.0/http2/frame.go:237 +0x6e golang.org/x/net/http2.(Framer).ReadFrame(0xc000896380) /opt/go/gopath/pkg/mod/golang.org/x/net@v0.9.0/http2/frame.go:498 +0x95 google.golang.org/grpc/internal/transport.(http2Client).reader(0xc0000005a0) /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/http2_client.go:1498 +0x414 created by google.golang.org/grpc/internal/transport.newHTTP2Client /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/http2_client.go:365 +0x193f

goroutine 292 [select]: google.golang.org/grpc/internal/transport.(controlBuffer).get(0xc0009bd270, 0x1) /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/controlbuf.go:407 +0x115 google.golang.org/grpc/internal/transport.(loopyWriter).run(0xc000a32e40) /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/controlbuf.go:534 +0x85 google.golang.org/grpc/internal/transport.newHTTP2Client.func3() /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/http2_client.go:415 +0x65 created by google.golang.org/grpc/internal/transport.newHTTP2Client /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/http2_client.go:413 +0x1f91

goroutine 291 [IO wait]: internal/poll.runtime_pollWait(0x7f854e858d58, 0x72) /opt/go/goroot/src/runtime/netpoll.go:302 +0x89 internal/poll.(pollDesc).wait(0xc00056e180?, 0xc0013be000?, 0x0) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:83 +0x32 internal/poll.(pollDesc).waitRead(...) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:88 internal/poll.(FD).Read(0xc00056e180, {0xc0013be000, 0x8000, 0x8000}) /opt/go/goroot/src/internal/poll/fd_unix.go:167 +0x25a net.(netFD).Read(0xc00056e180, {0xc0013be000?, 0x3e081c0?, 0x1?}) /opt/go/goroot/src/net/fd_posix.go:55 +0x29 net.(conn).Read(0xc0001b7fa8, {0xc0013be000?, 0x14c5000?, 0x800010601?}) /opt/go/goroot/src/net/net.go:183 +0x45 bufio.(Reader).Read(0xc000a32de0, {0xc0008962e0, 0x9, 0x18?}) /opt/go/goroot/src/bufio/bufio.go:236 +0x1b4 io.ReadAtLeast({0x442a600, 0xc000a32de0}, {0xc0008962e0, 0x9, 0x9}, 0x9) /opt/go/goroot/src/io/io.go:331 +0x9a io.ReadFull(...) /opt/go/goroot/src/io/io.go:350 golang.org/x/net/http2.readFrameHeader({0xc0008962e0?, 0x9?, 0x6f1cae7?}, {0x442a600?, 0xc000a32de0?}) /opt/go/gopath/pkg/mod/golang.org/x/net@v0.9.0/http2/frame.go:237 +0x6e golang.org/x/net/http2.(Framer).ReadFrame(0xc0008962a0) /opt/go/gopath/pkg/mod/golang.org/x/net@v0.9.0/http2/frame.go:498 +0x95 google.golang.org/grpc/internal/transport.(http2Client).reader(0xc0000003c0) /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/http2_client.go:1498 +0x414 created by google.golang.org/grpc/internal/transport.newHTTP2Client /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/http2_client.go:365 +0x193f

goroutine 276 [select]: github.com/uber/jaeger-client-go.(RemotelyControlledSampler).pollControllerWithTicker(0xc0002e4d00, 0xc0009bd540) /opt/go/gopath/pkg/mod/github.com/uber/jaeger-client-go@v2.25.0+incompatible/sampler_remote.go:144 +0x89 github.com/uber/jaeger-client-go.(RemotelyControlledSampler).pollController(0xc0002e4d00) /opt/go/gopath/pkg/mod/github.com/uber/jaeger-client-go@v2.25.0+incompatible/sampler_remote.go:139 +0x6d created by github.com/uber/jaeger-client-go.NewRemotelyControlledSampler /opt/go/gopath/pkg/mod/github.com/uber/jaeger-client-go@v2.25.0+incompatible/sampler_remote.go:86 +0x15b

goroutine 279 [select]: github.com/uber/jaeger-client-go/utils.(*reconnectingUDPConn).reconnectLoop(0xc000868070, 0x0?) /opt/go/gopath/pkg/mod/github.com/uber/jaeger-client-go@v2.25.0+incompatible/utils/reconnecting_udp_conn.go:70 +0xbc created by github.com/uber/jaeger-client-go/utils.newReconnectingUDPConn /opt/go/gopath/pkg/mod/github.com/uber/jaeger-client-go@v2.25.0+incompatible/utils/reconnecting_udp_conn.go:60 +0x205

goroutine 280 [select]: github.com/uber/jaeger-client-go.(*remoteReporter).processQueue(0xc000b8f1a0) /opt/go/gopath/pkg/mod/github.com/uber/jaeger-client-go@v2.25.0+incompatible/reporter.go:296 +0xde created by github.com/uber/jaeger-client-go.NewRemoteReporter /opt/go/gopath/pkg/mod/github.com/uber/jaeger-client-go@v2.25.0+incompatible/reporter.go:237 +0x245

goroutine 281 [select]: google.golang.org/grpc.(*ccBalancerWrapper).watcher(0xc00052dd80) /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/balancer_conn_wrappers.go:112 +0x73 created by google.golang.org/grpc.newCCBalancerWrapper /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/balancer_conn_wrappers.go:73 +0x22a

goroutine 284 [IO wait]: internal/poll.runtime_pollWait(0x7f854e858c68, 0x72) /opt/go/goroot/src/runtime/netpoll.go:302 +0x89 internal/poll.(pollDesc).wait(0xc0005eda80?, 0xc000614c00?, 0x0) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:83 +0x32 internal/poll.(pollDesc).waitRead(...) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:88 internal/poll.(FD).Accept(0xc0005eda80) /opt/go/goroot/src/internal/poll/fd_unix.go:614 +0x22c net.(netFD).accept(0xc0005eda80) /opt/go/goroot/src/net/fd_unix.go:172 +0x35 net.(TCPListener).accept(0xc0001ca3c0) /opt/go/goroot/src/net/tcpsock_posix.go:139 +0x28 net.(TCPListener).Accept(0xc0001ca3c0) /opt/go/goroot/src/net/tcpsock.go:288 +0x3d google.golang.org/grpc.(Server).Serve(0xc0008c9880, {0x4445be0?, 0xc0001ca3c0}) /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/server.go:780 +0x477 github.com/milvus-io/milvus/internal/distributed/querynode.(Server).startGrpcLoop(0xc000a32420, 0x5283) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/distributed/querynode/service.go:203 +0x8ff created by github.com/milvus-io/milvus/internal/distributed/querynode.(*Server).init /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/distributed/querynode/service.go:124 +0x5dd

goroutine 307 [select]: google.golang.org/grpc.(*ccBalancerWrapper).watcher(0xc00091b8c0) /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/balancer_conn_wrappers.go:112 +0x73 created by google.golang.org/grpc.newCCBalancerWrapper /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/balancer_conn_wrappers.go:73 +0x22a

goroutine 295 [select]: google.golang.org/grpc/internal/transport.(controlBuffer).get(0xc0010b2050, 0x1) /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/controlbuf.go:407 +0x115 google.golang.org/grpc/internal/transport.(loopyWriter).run(0xc0003ee4e0) /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/controlbuf.go:534 +0x85 google.golang.org/grpc/internal/transport.newHTTP2Client.func3() /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/http2_client.go:415 +0x65 created by google.golang.org/grpc/internal/transport.newHTTP2Client /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/http2_client.go:413 +0x1f91

goroutine 310 [IO wait]: internal/poll.runtime_pollWait(0x7f854e858998, 0x72) /opt/go/goroot/src/runtime/netpoll.go:302 +0x89 internal/poll.(pollDesc).wait(0xc00088c180?, 0xc000976000?, 0x0) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:83 +0x32 internal/poll.(pollDesc).waitRead(...) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:88 internal/poll.(FD).Read(0xc00088c180, {0xc000976000, 0x8000, 0x8000}) /opt/go/goroot/src/internal/poll/fd_unix.go:167 +0x25a net.(netFD).Read(0xc00088c180, {0xc000976000?, 0x3e081c0?, 0x1?}) /opt/go/goroot/src/net/fd_posix.go:55 +0x29 net.(conn).Read(0xc000ba74d8, {0xc000976000?, 0x14c5000?, 0x800010601?}) /opt/go/goroot/src/net/net.go:183 +0x45 bufio.(Reader).Read(0xc000a33320, {0xc0004fe3c0, 0x9, 0x18?}) /opt/go/goroot/src/bufio/bufio.go:236 +0x1b4 io.ReadAtLeast({0x442a600, 0xc000a33320}, {0xc0004fe3c0, 0x9, 0x9}, 0x9) /opt/go/goroot/src/io/io.go:331 +0x9a io.ReadFull(...) /opt/go/goroot/src/io/io.go:350 golang.org/x/net/http2.readFrameHeader({0xc0004fe3c0?, 0x9?, 0xeda36b9?}, {0x442a600?, 0xc000a33320?}) /opt/go/gopath/pkg/mod/golang.org/x/net@v0.9.0/http2/frame.go:237 +0x6e golang.org/x/net/http2.(Framer).ReadFrame(0xc0004fe380) /opt/go/gopath/pkg/mod/golang.org/x/net@v0.9.0/http2/frame.go:498 +0x95 google.golang.org/grpc/internal/transport.(http2Client).reader(0xc0005d9a40) /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/http2_client.go:1498 +0x414 created by google.golang.org/grpc/internal/transport.newHTTP2Client /opt/go/gopath/pkg/mod/google.golang.org/grpc@v1.46.0/internal/transport/http2_client.go:365 +0x193f

goroutine 296 [select]: github.com/milvus-io/milvus/internal/config.(EtcdSource).refreshConfigurationsPeriodically(0xc0006f1580) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/config/etcd_source.go:147 +0x9f created by github.com/milvus-io/milvus/internal/config.(EtcdSource).GetConfigurations.func1 /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/config/etcd_source.go:98 +0x5a

goroutine 304 [chan receive]: github.com/panjf2000/ants/v2.(*Pool).purgePeriodically(0xc0008683f0) /opt/go/gopath/pkg/mod/github.com/panjf2000/ants/v2@v2.4.8/pool.go:69 +0x8b created by github.com/panjf2000/ants/v2.NewPool /opt/go/gopath/pkg/mod/github.com/panjf2000/ants/v2@v2.4.8/pool.go:137 +0x34a

goroutine 302 [IO wait]: internal/poll.runtime_pollWait(0x7f854e8588a8, 0x72) /opt/go/goroot/src/runtime/netpoll.go:302 +0x89 internal/poll.(pollDesc).wait(0xc0006f1a00?, 0xc00107f000?, 0x0) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:83 +0x32 internal/poll.(pollDesc).waitRead(...) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:88 internal/poll.(FD).Read(0xc0006f1a00, {0xc00107f000, 0x1000, 0x1000}) /opt/go/goroot/src/internal/poll/fd_unix.go:167 +0x25a net.(netFD).Read(0xc0006f1a00, {0xc00107f000?, 0x0?, 0x4?}) /opt/go/goroot/src/net/fd_posix.go:55 +0x29 net.(conn).Read(0xc000662070, {0xc00107f000?, 0x0?, 0x0?}) /opt/go/goroot/src/net/net.go:183 +0x45 net/http.(persistConn).Read(0xc001068360, {0xc00107f000?, 0x14c0b80?, 0xc000371ec8?}) /opt/go/goroot/src/net/http/transport.go:1929 +0x4e bufio.(Reader).fill(0xc0003ef7a0) /opt/go/goroot/src/bufio/bufio.go:106 +0x103 bufio.(Reader).Peek(0xc0003ef7a0, 0x1) /opt/go/goroot/src/bufio/bufio.go:144 +0x5d net/http.(persistConn).readLoop(0xc001068360) /opt/go/goroot/src/net/http/transport.go:2093 +0x1ac created by net/http.(Transport).dialConn /opt/go/goroot/src/net/http/transport.go:1750 +0x173e

goroutine 303 [select]: net/http.(persistConn).writeLoop(0xc001068360) /opt/go/goroot/src/net/http/transport.go:2392 +0xf5 created by net/http.(Transport).dialConn /opt/go/goroot/src/net/http/transport.go:1751 +0x1791

goroutine 305 [chan receive]: github.com/panjf2000/ants/v2.(*Pool).purgePeriodically(0xc000868460) /opt/go/gopath/pkg/mod/github.com/panjf2000/ants/v2@v2.4.8/pool.go:69 +0x8b created by github.com/panjf2000/ants/v2.NewPool /opt/go/gopath/pkg/mod/github.com/panjf2000/ants/v2@v2.4.8/pool.go:137 +0x34a

goroutine 322 [chan receive]: github.com/panjf2000/ants/v2.(*Pool).purgePeriodically(0xc0008684d0) /opt/go/gopath/pkg/mod/github.com/panjf2000/ants/v2@v2.4.8/pool.go:69 +0x8b created by github.com/panjf2000/ants/v2.NewPool /opt/go/gopath/pkg/mod/github.com/panjf2000/ants/v2@v2.4.8/pool.go:137 +0x34a

goroutine 323 [chan receive]: github.com/panjf2000/ants/v2.(*Pool).purgePeriodically(0xc000868540) /opt/go/gopath/pkg/mod/github.com/panjf2000/ants/v2@v2.4.8/pool.go:69 +0x8b created by github.com/panjf2000/ants/v2.NewPool /opt/go/gopath/pkg/mod/github.com/panjf2000/ants/v2@v2.4.8/pool.go:137 +0x34a

goroutine 324 [IO wait]: internal/poll.runtime_pollWait(0x7f854e858b78, 0x72) /opt/go/goroot/src/runtime/netpoll.go:302 +0x89 internal/poll.(pollDesc).wait(0xc00091f300?, 0xc00121a000?, 0x0) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:83 +0x32 internal/poll.(pollDesc).waitRead(...) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:88 internal/poll.(FD).Read(0xc00091f300, {0xc00121a000, 0x1000, 0x1000}) /opt/go/goroot/src/internal/poll/fd_unix.go:167 +0x25a net.(netFD).Read(0xc00091f300, {0xc00121a000?, 0xc00150a6e0?, 0x14ef4de?}) /opt/go/goroot/src/net/fd_posix.go:55 +0x29 net.(conn).Read(0xc000662800, {0xc00121a000?, 0x7f85487294c0?, 0x1483220?}) /opt/go/goroot/src/net/net.go:183 +0x45 net/http.(connReader).Read(0xc001057c80, {0xc00121a000, 0x1000, 0x1000}) /opt/go/goroot/src/net/http/server.go:780 +0x16d bufio.(Reader).fill(0xc0003efe00) /opt/go/goroot/src/bufio/bufio.go:106 +0x103 bufio.(Reader).ReadSlice(0xc0003efe00, 0x0?) /opt/go/goroot/src/bufio/bufio.go:371 +0x2f bufio.(Reader).ReadLine(0xc0003efe00) /opt/go/goroot/src/bufio/bufio.go:400 +0x27 net/textproto.(Reader).readLineSlice(0xc0009171d0) /opt/go/goroot/src/net/textproto/reader.go:57 +0x99 net/textproto.(Reader).ReadLine(...) /opt/go/goroot/src/net/textproto/reader.go:38 net/http.readRequest(0xc000662800?) /opt/go/goroot/src/net/http/request.go:1029 +0x79 net/http.(conn).readRequest(0xc0010c4d20, {0x4447b10, 0xc0004f5cc0}) /opt/go/goroot/src/net/http/server.go:988 +0x24a net/http.(conn).serve(0xc0010c4d20, {0x4447bb8, 0xc0010be8a0}) /opt/go/goroot/src/net/http/server.go:1891 +0x32b created by net/http.(Server).Serve /opt/go/goroot/src/net/http/server.go:3071 +0x4db ` This seems to be a minio connecting issue.
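Since the trace dies inside the cgo call `InitRemoteChunkManagerSingleton` (which sets up the MinIO-backed chunk manager), one quick sanity check is whether the QueryNode host can reach the MinIO endpoint at all. A minimal stdlib-only sketch; the service name `my-release-minio` and port `9000` are placeholders for your actual deployment:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Placeholder address -- substitute your MinIO service name and port.
print(can_connect("my-release-minio", 9000))
```

If this returns `False` from inside the QueryNode pod, the storage endpoint configuration (or network policy) is the first thing to check.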

All the components' status: image

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

dzqoo commented 1 year ago

This is my build environment: image

congqixia commented 1 year ago

@dzqoo It looks like something went wrong while initializing the RemoteChunkManager. Could you please provide the storage part of the configuration (with sensitive fields masked)?

dzqoo commented 1 year ago

These are my MinIO configs:

```yaml
existingSecret: ""
bucketName: "milvus-bucket"
rootPath: file
useIAM: false
iamEndpoint: ""
podDisruptionBudget:
  enabled: false
resources:
  requests:
    memory: 4Gi
    cpu: 1

gcsgateway:
  enabled: false
  replicas: 1
  gcsKeyJson: "/etc/credentials/gcs_key.json"
  projectId: ""

service:
  type: NodePort
  port: 9000
  nodePort: 31900

persistence:
  enabled: true
  existingClaim: ""
  storageClass:
  accessMode: ReadWriteOnce
  size: 500Gi

livenessProbe:
  enabled: true
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 5

readinessProbe:
  enabled: true
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 5

startupProbe:
  enabled: true
  initialDelaySeconds: 0
  periodSeconds: 10
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 60
```

dzqoo commented 1 year ago


With these configs, the official Docker images start without issues.

LoveEachDay commented 1 year ago

Could you log in to any Milvus pod and paste the milvus.yaml under the /milvus/configs directory?

dzqoo commented 1 year ago

Could you log in to any Milvus pod and paste the milvus.yaml under the /milvus/configs directory?


etcd:
  endpoints:
    - localhost:2379
  rootPath: by-dev # The root path where data is stored in etcd
  metaSubPath: meta # metaRootPath = rootPath + '/' + metaSubPath
  kvSubPath: kv # kvRootPath = rootPath + '/' + kvSubPath
  log:
    # path is one of:
    #  - "default" as os.Stderr,
    #  - "stderr" as os.Stderr,
    #  - "stdout" as os.Stdout,
    #  - file path to append server logs to.
    # please adjust in embedded Milvus: /tmp/milvus/logs/etcd.log
    path: stdout
    level: info # Only supports debug, info, warn, error, panic, or fatal. Default 'info'.
  use:
    # please adjust in embedded Milvus: true
    embed: false # Whether to enable embedded Etcd (an in-process EtcdServer).
  data:
    # Embedded Etcd only.
    # please adjust in embedded Milvus: /tmp/milvus/etcdData/
    dir: default.etcd
  ssl:
    enabled: false # Whether to support ETCD secure connection mode
    tlsCert: /path/to/etcd-client.pem # path to your cert file
    tlsKey: /path/to/etcd-client-key.pem # path to your key file
    tlsCACert: /path/to/ca.pem # path to your CACert file
    # TLS min version
    # Optional values: 1.0, 1.1, 1.2, 1.3
    # We recommend using version 1.2 and above
    tlsMinVersion: 1.3

# Default value: etcd
# Valid values: [etcd, mysql]
metastore:
  type: etcd

# Related configuration of mysql, used to store Milvus metadata.
mysql:
  username: root
  password: 123456
  address: localhost
  port: 3306
  dbName: milvus_meta
  driverName: mysql
  maxOpenConns: 20
  maxIdleConns: 5

# please adjust in embedded Milvus: /tmp/milvus/data/
localStorage:
  path: /var/lib/milvus/data/

# Related configuration of MinIO/S3/GCS or any other service that supports the S3 API,
# which is responsible for data persistence for Milvus.
# We refer to the storage service as MinIO/S3 in the following description for simplicity.
minio:
  address: localhost # Address of MinIO/S3
  port: 9000 # Port of MinIO/S3
  accessKeyID: minioadmin # accessKeyID of MinIO/S3
  secretAccessKey: minioadmin # MinIO/S3 encryption string
  useSSL: false # Access to MinIO/S3 with SSL
  bucketName: "a-bucket" # Bucket name in MinIO/S3
  rootPath: files # The root path where the message is stored in MinIO/S3
  # Whether to use IAM role to access S3/GCS instead of access/secret keys
  # For more information, refer to
  # aws: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use.html
  # gcp: https://cloud.google.com/storage/docs/access-control/iam
  # aliyun (ack): https://www.alibabacloud.com/help/en/container-service-for-kubernetes/latest/use-rrsa-to-enforce-access-control
  # aliyun (ecs): https://www.alibabacloud.com/help/en/elastic-compute-service/latest/attach-an-instance-ram-role
  useIAM: false
  # Cloud Provider of S3. Supports: "aws", "gcp", "aliyun".
  # You can use "aws" for other cloud providers that support the S3 API with signature v4, e.g.: minio
  # You can use "gcp" for other cloud providers that support the S3 API with signature v2
  # You can use "aliyun" for other cloud providers that use virtual-host-style buckets
  # When useIAM is enabled, only "aws", "gcp", "aliyun" are supported for now
  cloudProvider: aws
  # Custom endpoint for fetching IAM role credentials, when useIAM is true & cloudProvider is "aws".
  # Leave it empty if you want to use the AWS default endpoint
  iamEndpoint: ""

# Milvus supports three MQs: rocksmq (based on RocksDB), Pulsar and Kafka.
# The one you use should be kept in this config file.
# There is a note about enabling priority if we config multiple MQs in this file:
# 1. standalone (local) mode: rocksmq (default) > Pulsar > Kafka
# 2. cluster mode: Pulsar (default) > Kafka (rocksmq is unsupported)

# Related configuration of pulsar, used to manage Milvus logs of recent mutation operations,
# output streaming log, and provide log publish-subscribe services.
pulsar:
  address: localhost # Address of pulsar
  port: 6650 # Port of pulsar
  webport: 80 # Web port of pulsar; if you connect directly without proxy, should use 8080
  maxMessageSize: 5242880 # 5 * 1024 * 1024 Bytes, maximum size of each message in pulsar
  tenant: public
  namespace: default

# If you want to enable kafka, you need to comment out the pulsar configs
kafka:
  producer:
    client.id: dc
  consumer:
    client.id: dc1
#   brokerList: localhost1:9092,localhost2:9092,localhost3:9092
#   saslUsername: username
#   saslPassword: password
#   saslMechanisms: PLAIN
#   securityProtocol: SASL_SSL

rocksmq:
  # please adjust in embedded Milvus: /tmp/milvus/rdb_data
  path: /var/lib/milvus/rdb_data # The path where the message is stored in rocksmq
  rocksmqPageSize: 67108864 # 64 MB, 64 * 1024 * 1024 bytes, the size of each page of messages in rocksmq
  retentionTimeInMinutes: 4320 # 3 days, 3 * 24 * 60 minutes, the retention time of messages in rocksmq
  retentionSizeInMB: 8192 # 8 GB, 8 * 1024 MB, the retention size of messages in rocksmq
  compactionInterval: 86400 # 1 day, trigger rocksdb compaction every day to remove deleted data
  lrucacheratio: 0.06 # rocksdb cache memory ratio

# Related configuration of rootCoord, used to handle data definition language (DDL)
# and data control language (DCL) requests
rootCoord:
  address: localhost
  port: 53100
  enableActiveStandby: false # Enable active-standby
  dmlChannelNum: 16 # The number of dml channels created at system startup
  maxDatabaseNum: 64 # Maximum number of databases
  maxPartitionNum: 4096 # Maximum number of partitions in a collection
  minSegmentSizeToEnableIndex: 1024 # It's a threshold. When the segment size is less than this value, the segment will not be indexed
  # (in seconds) Duration after which an import task will expire (be killed). Default 900 seconds (15 minutes).
  # Note: If the default value is to be changed, change also the default in: internal/util/paramtable/component_param.go
  importTaskExpiration: 900
  # (in seconds) Milvus will keep the record of import tasks for at least importTaskRetention seconds.
  # Default 86400 seconds (24 hours).
  # Note: If the default value is to be changed, change also the default in: internal/util/paramtable/component_param.go
  importTaskRetention: 86400

# Related configuration of proxy, used to validate client requests and reduce the returned results.
proxy:
  port: 19530
  internalPort: 19529
  http:
    enabled: true # Whether to enable the http server
    debug_mode: false # Whether to enable http server debug mode
  timeTickInterval: 200 # ms, the interval at which proxy synchronizes the time tick
  msgStream:
    timeTick:
      bufSize: 512
  maxNameLength: 255 # Maximum length of name for a collection or alias
  maxFieldNum: 64 # Maximum number of fields in a collection.
  # As of today (2.2.0 and after) it is strongly DISCOURAGED to set maxFieldNum >= 64.
  # So adjust at your own risk!
  maxDimension: 32768 # Maximum dimension of a vector
  # It's strongly DISCOURAGED to set maxShardNum > 64.
  maxShardNum: 16 # Maximum number of shards in a collection
  maxTaskNum: 1024 # max task number of proxy task queue
  # please adjust in embedded Milvus: false
  ginLogging: true # Whether to produce gin logs.
  grpc:
    serverMaxRecvSize: 67108864 # 64M
    serverMaxSendSize: 67108864 # 64M
    clientMaxRecvSize: 104857600 # 100 MB, 100 * 1024 * 1024
    clientMaxSendSize: 104857600 # 100 MB, 100 * 1024 * 1024

# Related configuration of queryCoord, used to manage topology and load balancing for the
# query nodes, and handoff from growing segments to sealed segments.
queryCoord:
  address: localhost
  port: 19531
  autoHandoff: true # Enable auto handoff
  autoBalance: true # Enable auto balance
  balancer: ScoreBasedBalancer # Balancer to use
  globalRowCountFactor: 0.1 # expert parameter, only used by scoreBasedBalancer
  scoreUnbalanceTolerationFactor: 0.05 # expert parameter, only used by scoreBasedBalancer
  reverseUnBalanceTolerationFactor: 1.3 # expert parameter, only used by scoreBasedBalancer
  overloadedMemoryThresholdPercentage: 90 # The threshold percentage of memory overload
  balanceIntervalSeconds: 60
  memoryUsageMaxDifferencePercentage: 30
  checkInterval: 10000
  channelTaskTimeout: 60000 # 1 minute
  segmentTaskTimeout: 120000 # 2 minutes
  distPullInterval: 500
  loadTimeoutSeconds: 1800
  checkHandoffInterval: 5000
  taskMergeCap: 8
  taskExecutionCap: 256
  enableActiveStandby: false # Enable active-standby
  refreshTargetsIntervalSeconds: 300

# Related configuration of queryNode, used to run hybrid search between vector and scalar data.
queryNode:
  cacheSize: 32 # GB, default 32 GB; cacheSize is the memory used for caching data for faster query. The cacheSize must be less than system memory size.
  port: 21123
  loadMemoryUsageFactor: 3 # The multiply factor for calculating the memory usage while loading segments
  enableDisk: true # enable querynode to load disk index and search on disk index
  maxDiskUsagePercentage: 95
  gracefulStopTimeout: 30
  stats:
    publishInterval: 1000 # Interval for querynode to report node information (milliseconds)
  dataSync:
    flowGraph:
      maxQueueLength: 1024 # Maximum length of task queue in flowgraph
      maxParallelism: 1024 # Maximum number of tasks executed in parallel in the flowgraph
  # Segcore will divide a segment into multiple chunks to enable small index
  segcore:
    chunkRows: 1024 # The number of vectors in a chunk.
    knowhereThreadPoolNumRatio: 4 # Use more threads to make good use of SSD throughput
    # Note: we have disabled segment small index since @2022.05.12, so the related configurations below won't work.
    # We won't create small index for growing segments and search on these segments will directly use bruteforce scan.
    smallIndex:
      nlist: 128 # small index nlist, recommend to set sqrt(chunkRows), must be smaller than chunkRows/8
      nprobe: 16 # nprobe to search small index, based on your accuracy requirement, must be smaller than nlist
  cache:
    enabled: true
    memoryLimit: 2147483648 # 2 GB, 2 * 1024 * 1024 * 1024
  scheduler:
    receiveChanSize: 10240
    unsolvedQueueSize: 10240
    # maxReadConcurrentRatio is the concurrency ratio of read tasks (search task and query task).
    # Max read concurrency would be the value of runtime.NumCPU * maxReadConcurrentRatio.
    # It defaults to 2.0, which means max read concurrency would be the value of runtime.NumCPU * 2.
    # Max read concurrency must be greater than or equal to 1, and less than or equal to runtime.NumCPU * 100.
    maxReadConcurrentRatio: 2.0 # (0, 100]
    cpuRatio: 10.0 # ratio used to estimate read task cpu usage.
    # maxTimestampLag is the max ts lag between serviceable and guarantee timestamp.
    # if the lag is larger than this config, scheduler will return error without waiting.
    # the valid value is [3600, infinite)
    maxTimestampLag: 86400
    # read task schedule policy: fifo (by default), user-task-polling.
    scheduleReadPolicy:
      # fifo: A FIFO queue supports the schedule.
      # user-task-polling:
      #     The user's tasks will be polled one by one and scheduled.
      #     Scheduling is fair on task granularity.
      #     The policy is based on the username for authentication.
      #     And an empty username is considered the same user.
      #     When there are no multi-users, the policy decays into FIFO
      name: fifo
      # user-task-polling configure:
      taskQueueExpire: 60 # 1 min by default, expire time of inner user task queue since queue is empty.
  grouping:
    enabled: true
    maxNQ: 50000
    topKMergeRatio: 10.0
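For reference, the read-concurrency cap described in the scheduler comments above works out as follows. This is an illustrative sketch built only from those comments, not from the Milvus source:

```python
def max_read_concurrency(num_cpu: int, max_read_concurrent_ratio: float) -> int:
    """Max read concurrency = NumCPU * ratio, clamped to [1, NumCPU * 100]."""
    value = int(num_cpu * max_read_concurrent_ratio)
    return max(1, min(value, num_cpu * 100))


# With the default ratio of 2.0 on an 8-core node:
print(max_read_concurrency(8, 2.0))  # -> 16
```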

indexCoord:
  address: localhost
  port: 31000
  enableActiveStandby: false # Enable active-standby
  minSegmentNumRowsToEnableIndex: 1024 # It's a threshold. When the segment num rows is less than this value, the segment will not be indexed
  bindIndexNodeMode:
    enable: false
    address: "localhost:22930"
    withCred: false
    nodeID: 0
  gc:
    interval: 600 # gc interval in seconds
  scheduler:
    interval: 1000 # scheduler interval in milliseconds

indexNode:
  port: 21121
  enableDisk: true # enable index node to build disk vector index
  maxDiskUsagePercentage: 95
  gracefulStopTimeout: 30
  scheduler:
    buildParallel: 1

dataCoord:
  address: localhost
  port: 13333
  enableCompaction: true # Enable data segment compaction
  enableGarbageCollection: true
  enableActiveStandby: false # Enable active-standby
  channel:
    watchTimeoutInterval: 120 # Timeout on watching channels (in seconds). Datanode tickler updating watch progress will reset the timeout timer.
    balanceSilentDuration: 300 # The duration before the channelBalancer on datacoord runs
    balanceInterval: 360 # The interval for the channelBalancer on datacoord to check balance status
  segment:
    maxSize: 512 # Maximum size of a segment in MB
    diskSegmentMaxSize: 2048 # Maximum size of a segment in MB for a collection which has disk index

    # Minimum proportion for a segment which can be sealed.
    # Sealing early can prevent producing large growing segments in case these segments might slow down our search/query.
    # Segments that sealed early will be compacted into a larger segment (within maxSize) eventually.
    sealProportion: 0.23
    assignmentExpiration: 2000 # The time of the assignment expiration in ms
    maxLife: 86400 # The max lifetime of segment in seconds, 24*60*60
    # If a segment didn't accept dml records in `maxIdleTime` and the size of segment is greater than
    # `minSizeFromIdleToSealed`, Milvus will automatically seal it.
    maxIdleTime: 600 # The max idle time of segment in seconds, 10*60.
    minSizeFromIdleToSealed: 16 # The min size in MB of segment which can be idle from sealed.
    # The max number of binlog files for one segment; the segment will be sealed if
    # the number of binlog files reaches the max value.
    maxBinlogFileNumber: 32
    smallProportion: 0.5 # The segment is considered a "small segment" when its # of rows is smaller than
    # (smallProportion * segment max # of rows).
    compactableProportion: 0.85 # A compaction will happen on small segments if the segment after compaction will have
    # over (compactableProportion * segment max # of rows) rows.
    # MUST BE GREATER THAN OR EQUAL TO <smallProportion>!!!
    expansionRate: 1.25 # During compaction, the size of segment # of rows is able to exceed segment max # of rows by (expansionRate-1) * 100%.
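To make the smallProportion / compactableProportion interplay above concrete, here is an illustrative sketch of the eligibility checks. The helper names are hypothetical; the real logic lives in dataCoord's compaction planner:

```python
def is_small_segment(rows: int, max_rows: int, small_proportion: float = 0.5) -> bool:
    # A segment is "small" when its row count is below smallProportion * max rows.
    return rows < small_proportion * max_rows


def compaction_worthwhile(candidate_rows: list[int], max_rows: int,
                          compactable_proportion: float = 0.85) -> bool:
    # Compacting small segments is worthwhile only if the merged result
    # would reach compactableProportion * max rows.
    return sum(candidate_rows) >= compactable_proportion * max_rows


print(is_small_segment(100_000, 1_000_000))                  # -> True
print(compaction_worthwhile([400_000, 500_000], 1_000_000))  # -> True
```

This also shows why compactableProportion must be >= smallProportion: otherwise a merge could produce a segment that still counts as "small" and gets picked again.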

  compaction:
    enableAutoCompaction: true
  gc:
    interval: 3600 # gc interval in seconds
    missingTolerance: 86400 # file meta missing tolerance duration in seconds, 3600*24
    dropTolerance: 3600 # tolerance duration in seconds for files belonging to dropped entities

dataNode:
  port: 21124
  dataSync:
    flowGraph:
      maxQueueLength: 1024 # Maximum length of task queue in flowgraph
      maxParallelism: 1024 # Maximum number of tasks executed in parallel in the flowgraph
  segment:
    # Max buffer size to flush for a single segment.
    insertBufSize: 16777216 # Bytes, 16 MB
    # Max buffer size to flush del for a single channel
    deleteBufBytes: 67108864 # Bytes, 64 MB
    # The period to sync segments if buffer is not empty.
    syncPeriod: 600 # Seconds, 10min
  memory:
    forceSyncEnable: true # true to force sync if memory usage is too high
    forceSyncSegmentNum: 1 # number of segments to sync; segments with the largest buffers will be synced.
    watermarkStandalone: 0.2 # memory watermark for standalone; upon reaching this watermark, segments will be synced.
    watermarkCluster: 0.5 # memory watermark for cluster; upon reaching this watermark, segments will be synced.

# Configures the system log output.
log:
  level: debug # Only supports debug, info, warn, error, panic, or fatal. Default 'info'.
  stdout: "true" # default true, print log to stdout
  file:
    # please adjust in embedded Milvus: /tmp/milvus/logs
    rootPath: "" # root dir path to put logs; default "" means no log file will be printed
    maxSize: 300 # MB
    maxAge: 10 # Maximum time for log retention in days.
    maxBackups: 20
  format: text # text/json

grpc:
  log:
    level: WARNING
  serverMaxRecvSize: 536870912 # 512 MB
  serverMaxSendSize: 536870912 # 512 MB
  clientMaxRecvSize: 104857600 # 100 MB, 100 * 1024 * 1024
  clientMaxSendSize: 104857600 # 100 MB, 100 * 1024 * 1024
  client:
    dialTimeout: 200
    keepAliveTime: 10000
    keepAliveTimeout: 20000
    maxMaxAttempts: 5
    initialBackOff: 1.0
    maxBackoff: 60.0
    backoffMultiplier: 2.0
  server:
    retryTimes: 5 # retry times when receiving a grpc return value with a failure and retryable state code

# Configure the proxy tls enable.
tls:
  serverPemPath: configs/cert/server.pem
  serverKeyPath: configs/cert/server.key
  caPemPath: configs/cert/ca.pem

common:
  # Channel name generation rule: ${namePrefix}-${ChannelIdx}
  chanNamePrefix:
    cluster: "by-dev"
    rootCoordTimeTick: "rootcoord-timetick"
    rootCoordStatistics: "rootcoord-statistics"
    rootCoordDml: "rootcoord-dml"
    rootCoordDelta: "rootcoord-delta"
    search: "search"
    searchResult: "searchResult"
    queryTimeTick: "queryTimeTick"
    queryNodeStats: "query-node-stats"
    # Cmd for loadIndex, flush, etc...
    cmd: "cmd"
    dataCoordStatistic: "datacoord-statistics-channel"
    dataCoordTimeTick: "datacoord-timetick-channel"
    dataCoordSegmentInfo: "segment-info-channel"
  # Sub name generation rule: ${subNamePrefix}-${NodeID}
  subNamePrefix:
    rootCoordSubNamePrefix: "rootCoord"
    proxySubNamePrefix: "proxy"
    queryNodeSubNamePrefix: "queryNode"
    dataNodeSubNamePrefix: "dataNode"
    dataCoordSubNamePrefix: "dataCoord"
  defaultPartitionName: "_default" # default partition name for a collection
  defaultIndexName: "_default_idx" # default index name
  retentionDuration: 0 # time travel reserved time; insert/delete will not be cleaned in this period. disabled by default
  entityExpiration: -1 # Entity expiration in seconds. CAUTION: make sure entityExpiration >= retentionDuration; -1 means never expire
  gracefulTime: 5000 # milliseconds. it represents the interval (in ms) by which the request arrival time needs to be subtracted in the case of Bounded Consistency.
  gracefulStopTimeout: 30 # seconds. it will force quit the server if the graceful stop process is not completed during this time.
  # Default value: auto
  # Valid values: [auto, avx512, avx2, avx, sse4_2]
  # This configuration is only used by querynode and indexnode; it selects the CPU instruction set for searching and index-building.
  simdType: auto
  indexSliceSize: 16 # MB
  DiskIndex:
    MaxDegree: 56
    SearchListSize: 100
    PQCodeBudgetGBRatio: 0.125
    BuildNumThreadsRatio: 1.0
    SearchCacheBudgetGBRatio: 0.10
    LoadNumThreadRatio: 8.0
    BeamWidthRatio: 4.0
  # This parameter specifies how many times the number of threads is the number of cores
  threadCoreCoefficient: 10
  # please adjust in embedded Milvus: local
  storageType: minio
  security:
    authorizationEnabled: false
    # The superusers will ignore some system check processes,
    # like the old password verification when updating the credential
    # superUsers:
    #  - "root"
    # tls mode values [0, 1, 2]
    # 0 is close, 1 is one-way authentication, 2 is two-way authentication.
    tlsMode: 0
  session:
    ttl: 20 # ttl value when session granting a lease to register service
    retryTimes: 30 # retry times when session sending etcd requests
  ImportMaxFileSize: 17179869184 # 16 * 1024 * 1024 * 1024, max file size to import for bulkInsert
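The channel/sub name generation rules quoted in the common section (${namePrefix}-${ChannelIdx} and ${subNamePrefix}-${NodeID}) amount to simple string joins. A tiny sketch with hypothetical function names:

```python
def channel_name(prefix: str, channel_idx: int) -> str:
    # ${namePrefix}-${ChannelIdx}
    return f"{prefix}-{channel_idx}"


def sub_name(prefix: str, node_id: int) -> str:
    # ${subNamePrefix}-${NodeID}
    return f"{prefix}-{node_id}"


print(channel_name("by-dev-rootcoord-dml", 0))  # -> by-dev-rootcoord-dml-0
print(sub_name("queryNode", 5))                 # -> queryNode-5
```

This is why the cluster prefix ("by-dev" here) must match across components that share the same etcd and MQ, or they will subscribe to different channels.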

# QuotaConfig, configurations of Milvus quota and limits.
# By default, we enable:
#   1. TT protection;
#   2. Memory protection;
#   3. Disk quota protection.
# You can enable:
#   1. DML throughput limitation;
#   2. DDL, DQL qps/rps limitation;
#   3. DQL queue length/latency protection;
#   4. DQL result rate protection.
# If necessary, you can also manually force to deny RW requests.
quotaAndLimits:
  enabled: true # true to enable quota and limits, false to disable.
  limits:
    maxCollectionNum: 65536
    maxCollectionNumPerDB: 65536
  # quotaCenterCollectInterval is the time interval at which quotaCenter
  # collects metrics from Proxies, Query cluster and Data cluster.
  quotaCenterCollectInterval: 3 # seconds, (0 ~ 65536)
  ddl: # ddl limit rates, default no limit.
    enabled: false
    collectionRate: -1 # qps, default no limit, rate for CreateCollection, DropCollection, LoadCollection, ReleaseCollection
    partitionRate: -1 # qps, default no limit, rate for CreatePartition, DropPartition, LoadPartition, ReleasePartition
  indexRate:
    enabled: false
    max: -1 # qps, default no limit, rate for CreateIndex, DropIndex
  flushRate:
    enabled: false
    max: -1 # qps, default no limit, rate for flush
  compactionRate:
    enabled: false
    max: -1 # qps, default no limit, rate for manualCompaction
  # dml limit rates, default no limit.
  # The maximum rate will not be greater than max.
  dml:
    enabled: false
    insertRate:
      collection:
        max: -1 # MB/s, default no limit
      max: -1 # MB/s, default no limit
    deleteRate:
      collection:
        max: -1 # MB/s, default no limit
      max: -1 # MB/s, default no limit
    bulkLoadRate: # not supported yet. TODO: limit bulkLoad rate
      collection:
        max: -1 # MB/s, default no limit
      max: -1 # MB/s, default no limit
  # dql limit rates, default no limit.
  # The maximum rate will not be greater than max.
  dql:
    enabled: false
    searchRate:
      collection:
        max: -1 # vps (vectors per second), default no limit
      max: -1 # vps (vectors per second), default no limit
    queryRate:
      collection:
        max: -1 # qps, default no limit
      max: -1 # qps, default no limit

  # limitWriting decides whether dml requests are allowed.
  limitWriting:
    # forceDeny false means dml requests are allowed (except under some
    # specific conditions, such as node memory reaching the water marker); true means always reject all dml requests.
    forceDeny: false
    ttProtection:
      enabled: false
      # maxTimeTickDelay indicates the backpressure for DML operations.
      # DML rates would be reduced according to the ratio of time tick delay to maxTimeTickDelay;
      # if time tick delay is greater than maxTimeTickDelay, all DML requests would be rejected.
      maxTimeTickDelay: 300 # in seconds
    memProtection:
      enabled: true
      # When memory usage > memoryHighWaterLevel, all dml requests would be rejected;
      # When memoryLowWaterLevel < memory usage < memoryHighWaterLevel, reduce the dml rate;
      # When memory usage < memoryLowWaterLevel, no action.
      # memoryLowWaterLevel should be less than memoryHighWaterLevel.
      dataNodeMemoryLowWaterLevel: 0.85 # (0, 1], memoryLowWaterLevel in DataNodes
      dataNodeMemoryHighWaterLevel: 0.95 # (0, 1], memoryHighWaterLevel in DataNodes
      queryNodeMemoryLowWaterLevel: 0.85 # (0, 1], memoryLowWaterLevel in QueryNodes
      queryNodeMemoryHighWaterLevel: 0.95 # (0, 1], memoryHighWaterLevel in QueryNodes
    growingSegmentsSizeProtection:
      # 1. No action will be taken if the ratio of growing segments size is less than the low water level.
      # 2. The DML rate will be reduced if the ratio of growing segments size is greater than the low water level and less than the high water level.
      # 3. All DML requests will be rejected if the ratio of growing segments size is greater than the high water level.
      enabled: false
      lowWaterLevel: 0.2
      highWaterLevel: 0.4
    diskProtection:
      # When the total file size of object storage is greater than `diskQuota`, all dml requests would be rejected;
      enabled: true
      diskQuota: -1 # MB, (0, +inf), default no limit
      diskQuotaPerCollection: -1 # MB, (0, +inf), default no limit
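The three memProtection cases described in the comments above can be sketched as a rate multiplier. This is illustrative only; the function name and the linear scaling between the two water levels are assumptions, not taken from the Milvus source:

```python
def dml_rate_factor(mem_usage: float, low: float = 0.85, high: float = 0.95) -> float:
    """Return the fraction of the configured DML rate that is allowed."""
    if mem_usage < low:
        return 1.0   # below the low water level: no action
    if mem_usage > high:
        return 0.0   # above the high water level: reject all DML
    # between the levels: scale the rate down as usage approaches the high level
    return (high - mem_usage) / (high - low)


print(dml_rate_factor(0.80))  # -> 1.0
print(dml_rate_factor(0.90))  # roughly 0.5, halfway between the levels
print(dml_rate_factor(0.97))  # -> 0.0
```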

  # limitReading decides whether dql requests are allowed.
  limitReading:
    # forceDeny false means dql requests are allowed (except under some
    # specific conditions, such as the collection having been dropped); true means always reject all dql requests.
    forceDeny: false
    queueProtection:
      enabled: false
      # nqInQueueThreshold indicates that the system is under backpressure for the Search/Query path.
      # If NQ in any QueryNode's queue is greater than nqInQueueThreshold, search&query rates would gradually cool off
      # until the NQ in queue no longer exceeds nqInQueueThreshold. We count the NQ of a query request as 1.
      nqInQueueThreshold: -1 # int, default no limit
      # queueLatencyThreshold indicates that the system is under backpressure for the Search/Query path.
      # If dql queuing latency is greater than queueLatencyThreshold, search&query rates would gradually cool off
      # until the queuing latency no longer exceeds queueLatencyThreshold.
      # The latency here refers to the averaged latency over a period of time.
      queueLatencyThreshold: -1 # milliseconds, default no limit
    resultProtection:
      enabled: false
      # maxReadResultRate indicates that the system is under backpressure for the Search/Query path.
      # If dql result rate is greater than maxReadResultRate, search&query rates would gradually cool off
      # until the read result rate no longer exceeds maxReadResultRate.
      maxReadResultRate: -1 # MB/s, default no limit
    # coolOffSpeed is the speed at which search&query rates cool off.
    coolOffSpeed: 0.9 # (0, 1]

autoIndex:
  params:
    build: '{"M": 30,"efConstruction": 360,"index_type": "HNSW", "metric_type": "IP"}'

yanliang567 commented 1 year ago

/assign @congqixia /unassign

mrrtree commented 1 year ago

I also encountered this panic problem, also with a CentOS-based image. Is there any progress on this issue?

@yanliang567

mrrtree commented 1 year ago

(screenshot)

yanliang567 commented 1 year ago

@congqixia any ideas?

mrrtree commented 1 year ago

(screenshot)

congqixia commented 1 year ago

@mrrtree Sorry for the late reply. Quick question: which OSS service were you using when you encountered this problem?

mrrtree commented 1 year ago

@mrrtree Sorry for the late reply. Quick question: which OSS service were you using when you encountered this problem?

MinIO. I guess the problem is the OpenSSL version, which is 1.0.2 on CentOS 7.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
