Closed: shivabohemian closed this issue 1 month ago.
Could you provide your full logs so we can investigate?
https://github.com/milvus-io/milvus/tree/master/deployments/export-log
From your logs, you need to check your disk:
{"level":"warn","ts":"2024-08-30T20:05:20.732+0800","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"1.495579724s
etcd takes 1.5s to apply a request, which is too long. Did you make sure to deploy Milvus on an SSD?
/assign @shivabohemian /unassign
I put etcd on eMMC. This is the Milvus log from startup to the query crash: [output.log]. On August 31 at 11:22:21 I ran a vector query and it crashed. The SDK reported the same error as mentioned in the issue; the Milvus log contained no panic message, but the service did crash and restart.
By the way, I packaged the Docker image v2.4.10 into a deb package. It seems the libraries in the image's lib folder have no symbolic links, so some libraries are duplicated, which inflates the image size (see the sketch after the listing below).
48M libblob-chunk-manager.so
80K libdouble-conversion.so
80K libdouble-conversion.so.3
80K libdouble-conversion.so.3.2.0
256K libevent_core-2.1.so
256K libevent_core-2.1.so.7
256K libevent_core-2.1.so.7.0.1
256K libevent_core.so
172K libevent_extra-2.1.so
172K libevent_extra-2.1.so.7
172K libevent_extra-2.1.so.7.0.1
172K libevent_extra.so
6.5M libevent_openssl-2.1.so
6.5M libevent_openssl-2.1.so.7
6.5M libevent_openssl-2.1.so.7.0.1
6.5M libevent_openssl.so
20K libevent_pthreads-2.1.so
20K libevent_pthreads-2.1.so.7
20K libevent_pthreads-2.1.so.7.0.1
20K libevent_pthreads.so
113M libfolly.so
113M libfolly.so.0.58.0-dev
724K libfolly_exception_counter.so
724K libfolly_exception_counter.so.0.58.0-dev
600K libfolly_exception_tracer.so
600K libfolly_exception_tracer.so.0.58.0-dev
6.8M libfolly_exception_tracer_base.so
6.8M libfolly_exception_tracer_base.so.0.58.0-dev
8.4M libfolly_test_util.so
8.4M libfolly_test_util.so.0.58.0-dev
11M libfollybenchmark.so
11M libfollybenchmark.so.0.58.0-dev
204K libgflags_nothreads.so
204K libgflags_nothreads.so.2.2
204K libgflags_nothreads.so.2.2.2
252K libglog.so
252K libglog.so.0.6.0
252K libglog.so.1
384K libhwloc.so
384K libhwloc.so.15
384K libhwloc.so.15.6.4
4.3M libjemalloc.so
4.3M libjemalloc.so.2
278M libknowhere.so
434M libmilvus_core.so
236K librdkafka++.so
236K librdkafka++.so.1
9.9M librdkafka.so
9.9M librdkafka.so.1
13M librocksdb.so
13M librocksdb.so.6
13M librocksdb.so.6.29.5
336K libtbb.so
336K libtbb.so.12
336K libtbb.so.12.9
28K libtbbbind_2_5.so
28K libtbbbind_2_5.so.3
28K libtbbbind_2_5.so.3.9
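To confirm these are byte-identical copies rather than distinct builds, here is a small hedged sketch; the lib path is hypothetical, so point it at the image's actual lib folder:

```python
# Find byte-identical .so files that could be replaced by symlinks.
# The directory below is a hypothetical path; adjust for your image layout.
import hashlib
import os
from collections import defaultdict

lib_dir = "/milvus/lib"  # hypothetical location of the image's lib folder
by_hash = defaultdict(list)
for name in sorted(os.listdir(lib_dir)):
    path = os.path.join(lib_dir, name)
    # Only hash regular files; real symlinks are already deduplicated.
    if os.path.isfile(path) and not os.path.islink(path) and ".so" in name:
        with open(path, "rb") as f:
            by_hash[hashlib.sha256(f.read()).hexdigest()].append(name)

for names in by_hash.values():
    if len(names) > 1:
        print(f"identical copies: {', '.join(names)}")
```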
You need to check the speed of your eMMC.
Milvus usually panics when etcd is too slow.
Setting common.session.ttl to a longer value might help a little, but an SSD is really needed here (a quick latency probe follows below).
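To put a number on "too slow", here is a rough sketch that measures fsync latency on the disk holding etcd's data; the probe path is an assumption, so point it at your actual etcd data directory. etcd's guidance is that p99 WAL fsync latency should stay well below 10 ms:

```python
# Rough fsync-latency probe for the etcd data disk.
# etcd's WAL does many small synchronous writes, so fsync latency is the
# metric that matters; the path below is hypothetical.
import os
import time

probe = "/var/lib/etcd/fsync_probe.tmp"  # hypothetical: must live on the etcd disk
fd = os.open(probe, os.O_WRONLY | os.O_CREAT, 0o600)
latencies = []
for _ in range(100):
    os.write(fd, b"x" * 512)              # small WAL-like write
    start = time.perf_counter()
    os.fsync(fd)                          # the call etcd is sensitive to
    latencies.append(time.perf_counter() - start)
os.close(fd)
os.remove(probe)

latencies.sort()
print(f"p99 fsync latency: {latencies[98] * 1000:.2f} ms")  # aim well below 10 ms
```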
Did something change here? It worked fine until v2.4.6.
I started the binary package directly and saw an "Illegal instruction" error.
[2024/09/01 17:58:26.629 +08:00] [INFO] [proxy/meta_cache.go:493] ["meta update success"] [database=default] [collectionName=img_feature] [collectionID=452217503082086433]
[2024/09/01 17:58:26.629 +08:00] [INFO] [querycoordv2/services.go:136] ["show partitions request received"] [traceID=a52b57943dd2a8a2b69be49b28ff9be3] [collectionID=452217503082086433] [partitions="[452217503082088626]"]
[2024/09/01 17:58:26.630 +08:00] [INFO] [rootcoord/root_coord.go:2811] ["received request to describe database "] [traceID=715635b06f7b2b6091a5e736caf34cc8] [dbName=default]
[2024/09/01 17:58:26.630 +08:00] [INFO] [rootcoord/root_coord.go:2835] ["done to describe database"] [traceID=715635b06f7b2b6091a5e736caf34cc8] [dbName=default] [ts=452246819731668997]
[2024/09/01 17:58:26.631 +08:00] [INFO] [proxy/meta_cache.go:1047] ["no shard cache for collection, try to get shard leaders from QueryCoord"] [traceID=715635b06f7b2b6091a5e736caf34cc8] [collectionName=img_feature] [collectionID=452217503082086433]
Illegal instruction
Could you run lscpu and check which architecture you're running on?
We've been working with x86 (preferably a recent generation such as Ice Lake or later), Mac M1/M2/M3, and AWS Graviton, and all work fine.
We can try to reproduce this if we can get the same machine as yours.
Your code showing how you use Milvus would also be really helpful, so we know which part of the code could cause the problem.
The lscpu information is as follows; it's an x86 architecture.
I simply created an IVF_SQ8 index and inserted the data, and then the search hit this problem. In my tests it has behaved like this since v2.4.8 (a minimal repro sketch follows the lscpu output below).
I build and install the deb package directly from the compiled files in the Docker image.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Corporation
Model name: Intel(R) Celeron(R) N5105 @ 2.00GHz
BIOS Model name: Intel(R) Celeron(R) N5105 @ 2.00GHz To Be Filled By O.E.M. CPU @ 2.8GHz
BIOS CPU family: 15
CPU family: 6
Model: 156
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: 0
CPU(s) scaling MHz: 28%
CPU max MHz: 2900.0000
CPU min MHz: 800.0000
BogoMIPS: 3993.60
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave rdrand lahf_lm 3dnowprefetch cpuid_fault epb cat_l2 cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust smep erms rdt_a rdseed smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req umip waitpkg gfni rdpid movdiri movdir64b md_clear flush_l1d arch_capabilities
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 128 KiB (4 instances)
L1i: 128 KiB (4 instances)
L2: 1.5 MiB (1 instance)
L3: 4 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Mitigation; Clear CPU buffers; SMT disabled
Reg file data sampling: Mitigation; Clear Register File
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS Not affected; BHI SW loop, KVM SW loop
Srbds: Vulnerable: No microcode
Tsx async abort: Not affected
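For reference, a minimal pymilvus sketch of the repro path described above (IVF_SQ8 index, insert, then search). The collection name, field names, and dimension are hypothetical, and pymilvus 2.4.x with a local Milvus on the default port is assumed:

```python
# Minimal repro sketch: build an IVF_SQ8 index and run a search.
# All names and sizes below are hypothetical placeholders.
import numpy as np
from pymilvus import (
    Collection, CollectionSchema, DataType, FieldSchema, connections
)

connections.connect(host="127.0.0.1", port="19530")

dim = 128  # hypothetical vector dimension
schema = CollectionSchema([
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="vec", dtype=DataType.FLOAT_VECTOR, dim=dim),
])
coll = Collection("repro_ivf_sq8", schema)

# Insert random vectors and build the IVF_SQ8 index mentioned above.
vectors = np.random.random((1000, dim)).tolist()
coll.insert([vectors])
coll.flush()
coll.create_index("vec", {"index_type": "IVF_SQ8", "metric_type": "L2",
                          "params": {"nlist": 64}})
coll.load()

# On the machine in this report, the crash surfaced during search.
res = coll.search(
    data=np.random.random((1, dim)).tolist(),
    anns_field="vec",
    param={"metric_type": "L2", "params": {"nprobe": 16}},
    limit=5,
)
print(res)
```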
I think the machine is old and doesn't support AVX2 or AVX-512; that might be the reason.
We'll fix that soon.
In the meantime, you'd better use a machine with AVX support so Milvus will be much faster.
Ok, thank you for your advice.
Everything is fine on v2.4.6; you could compare against it to pin down the problem.
Looking into this @PwzXxm @chasingegg
The machine is missing the f16c flag. GCC checks that this flag is available and inlines some F16C instructions into the SSE code paths. This problem has been fixed in Knowhere PR #814; you can update Milvus to 2.4.11.
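If you want to verify what your CPU exposes before upgrading, here is a small sketch that reads the flags on Linux (assumes /proc/cpuinfo is available):

```python
# Check whether the CPU advertises the instruction-set flags Milvus may use.
# Linux-only: reads /proc/cpuinfo.
with open("/proc/cpuinfo") as f:
    flags = set()
    for line in f:
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
            break

for flag in ("f16c", "avx", "avx2", "avx512f"):
    print(f"{flag}: {'present' if flag in flags else 'MISSING'}")
```

On the Celeron N5105 above, f16c, avx, and avx2 are all absent from the flags list, which matches the Illegal instruction crash.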
v2.4.11 is releasing soon. Thanks for opening the issue and letting us know.
Thank you for the efficient fix. I will try it after the new version is released.
I'd close this issue per the comments above. Please feel free to file a new one if it reproduces on the new version.
Is there an existing issue for this?
Environment
Current Behavior
Search error. It looks like the service has exited unexpectedly and restarted.
failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452203396426891450]
Expected Behavior
Search works normally.
Steps To Reproduce
Milvus Log
Anything else?
No response