milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
30.09k stars 2.88k forks

[Bug]: milvus exception #30558

Closed C-rawler closed 6 months ago

C-rawler commented 8 months ago

Is there an existing issue for this?

Environment

- Milvus version:milvusdb/milvus:v2.3.1
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): Ubuntu
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

This error occurred shortly after I started the milvus database.

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

milvus-standalone | [2024/02/06 06:34:17.413 +00:00] [DEBUG] [client/client.go:96] ["RootCoordClient GetSessions success"] [address=172.20.0.4:53100] [serverID=17]
milvus-standalone | [2024/02/06 06:34:17.413 +00:00] [ERROR] [grpcclient/client.go:405] ["retry func failed"] ["retry time"=0] [error="rpc error: code = Unknown desc = expectedNodeID=16, actualNodeID=17: node not match"] [stack="github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).call\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:405\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).Call\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:483\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).ReCall\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:499\ngithub.com/milvus-io/milvus/internal/distributed/rootcoord/client.wrapGrpcCall[...]\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/rootcoord/client/client.go:120\ngithub.com/milvus-io/milvus/internal/distributed/rootcoord/client.(Client).DescribeCollection\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/rootcoord/client/client.go:196\ngithub.com/milvus-io/milvus/internal/datacoord.(CoordinatorBroker).HasCollection\n\t/go/src/github.com/milvus-io/milvus/internal/datacoord/coordinator_broker.go:135\ngithub.com/milvus-io/milvus/internal/datacoord.(ServerHandler).HasCollection.func1\n\t/go/src/github.com/milvus-io/milvus/internal/datacoord/handler.go:372\ngithub.com/milvus-io/milvus/pkg/util/retry.Do\n\t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:40\ngithub.com/milvus-io/milvus/internal/datacoord.(ServerHandler).HasCollection\n\t/go/src/github.com/milvus-io/milvus/internal/datacoord/handler.go:371\ngithub.com/milvus-io/milvus/internal/datacoord.(ServerHandler).CheckShouldDropChannel\n\t/go/src/github.com/milvus-io/milvus/internal/datacoord/handler.go:411\ngithub.com/milvus-io/milvus/internal/datacoord.(ChannelManager).unwatchDroppedChannels\n\t/go/src/github.com/milvus-io/milvus/internal/datacoord/channel_manager.go:254\ngithub.com/milvus-io/milvus/internal/datacoord.(ChannelManager).Startup\n\t/go/src/github.com/milvus-io/milvus/internal/datacoord/channel_manager.go:164\ngithub.com/milvus-io/milvus/internal/datacoord.(Cluster).Startup\n\t/go/src/github.com/milvus-io/milvus/internal/datacoord/cluster.go:57\ngithub.com/milvus-io/milvus/internal/datacoord.(Server).initServiceDiscovery\n\t/go/src/github.com/milvus-io/milvus/internal/datacoord/server.go:490\ngithub.com/milvus-io/milvus/internal/datacoord.(Server).initDataCoord\n\t/go/src/github.com/milvus-io/milvus/internal/datacoord/server.go:347\ngithub.com/milvus-io/milvus/internal/datacoord.(Server).Init\n\t/go/src/github.com/milvus-io/milvus/internal/datacoord/server.go:316\ngithub.com/milvus-io/milvus/internal/distributed/datacoord.(Server).init\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/service.go:108\ngithub.com/milvus-io/milvus/internal/distributed/datacoord.(Server).Run\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/service.go:229\ngithub.com/milvus-io/milvus/cmd/components.(*DataCoord).Run\n\t/go/src/github.com/milvus-io/milvus/cmd/components/data_coord.go:49\ngithub.com/milvus-io/milvus/cmd/roles.runComponent[...].func1\n\t/go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:112"]

Anything else?

No response

yanliang567 commented 8 months ago

@C-rawler we need the full milvus logs for investigation. Could you please refer to this doc to export the complete Milvus logs? /assign @C-rawler /unassign
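For reference, with a docker-compose standalone deployment the complete logs can usually be captured by redirecting docker compose logs to a file. A minimal sketch, assuming the default service and container names from the official docker-compose.yml:

# Run from the directory containing docker-compose.yml.
# --no-color strips the ANSI color codes visible in the log snippet above.
docker compose logs --no-color > milvus-full.log 2>&1

# If the container keeps exiting, follow the logs while reproducing the crash
# (the "standalone" service name is an assumption based on the official compose file):
docker compose logs --no-color --follow standalone | tee milvus-standalone.log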

C-rawler commented 8 months ago

@yanliang567 Please tell me how to upload the complete Milvus logs. The service now exits frequently.

yanliang567 commented 8 months ago

@yanliang567 Please tell me how to upload the complete Milvus logs. The service now exits frequently.

please try some cloud drive for sharing?

C-rawler commented 8 months ago

@yanliang567 Please tell me how to upload the complete Milvus logs. The service now exits frequently.

please try some cloud drive for sharing?

Can you download it from Baidu Cloud?

C-rawler commented 8 months ago

@yanliang567 Please tell me how to upload the complete Milvus logs. The service now exits frequently.

please try some cloud drive for sharing?

https://drive.google.com/file/d/1ak0_DBjY9LkNeVZ2nHhU4msjzIvfHkZz/view?usp=drive_link

yanliang567 commented 8 months ago

@yanliang567 Please tell me how to upload the complete Milvus logs. The service now exits frequently.

please try some cloud drive for sharing?

https://drive.google.com/file/d/1ak0_DBjY9LkNeVZ2nHhU4msjzIvfHkZz/view?usp=drive_link

please share the key for downloading the logs

C-rawler commented 8 months ago

@yanliang567 Please tell me how to upload the complete Milvus logs. The service now exits frequently.

please try some cloud drive for sharing?

https://drive.google.com/file/d/1ak0_DBjY9LkNeVZ2nHhU4msjzIvfHkZz/view?usp=drive_link

please share the key for downloading the logs

Sorry, please try this link again. https://drive.google.com/file/d/1ak0_DBjY9LkNeVZ2nHhU4msjzIvfHkZz/view?usp=sharing

yanliang567 commented 8 months ago

@C-rawler I can see you are starting Milvus with some existing collections and partitions. Moreover, these collections and partitions were previously loaded, so Milvus tries to load them again after startup. It suddenly fails to load the segments after loading about 888 MB of data... As a solution, I think you could try adding more memory to the Milvus pod/container. On the Milvus side, we improved the memory prediction in Milvus 2.3.8, so please retry with it if possible.

[collectionID=444925537510994479] [maxSegmentSize=416.8495330810547] [concurrency=1] [committedMemSize=703.9019622802734] [memUsage=888.7808685302734] [committedDiskSize=0] [diskUsage=0] [predictMemUsage=1305.6304016113281] [predictDiskUsage=0] [mmapEnabled=false]
milvus-standalone | [2024/02/06 14:00:17.152 +00:00] [INFO] [segments/segment_loader.go:400] ["request resource for loading segments (unit in MiB)"] [traceID=6064a8fde4ae287d4e6b60a3cb78fa01] [segmentIDs="[446469289939867624]"] [workerNum=190] [committedWorkerNum=553] [memory=416.8495330810547] [committedMemory=1120.7514953613281] [disk=0] [committedDisk=0]
milvus-standalone | [2024/02/06 14:00:17.152 +00:00] [INFO] [segments/segment_loader.go:186] ["start loading..."] [traceID=676dea72830c04acd4fcdc4a91f25e5a] [collectionID=444925537510994490] [segmentType=Sealed] [segmentNum=1] [afterFilter=1]
milvus-standalone | [2024/02/06 14:00:17.152 +00:00] [INFO] [segments/segment.go:182] ["create segment"] [collectionID=444925537510994479] [partitionID=444925537510994480] [segmentID=446469289939867624] [segmentType=Sealed]
milvus-standalone | [2024/02/06 14:00:17.152 +00:00] [INFO] [segments/segment_loader.go:258] ["start to load segments in parallel"] [traceID=6064a8fde4ae287d4e6b60a3cb78fa01] [collectionID=444925537510994479] [segmentType=Sealed] [segmentNum=1] [concurrencyLevel=1]
milvus-standalone | [2024/02/06 14:00:17.152 +00:00] [INFO] [segments/segment_loader.go:536] ["start loading segment files"] [traceID=6064a8fde4ae287d4e6b60a3cb78fa01] [collectionID=444925537510994479] [partitionID=444925537510994480] [shard=by-dev-rootcoord-dml_1_444925537510994479v0] [segmentID=446469289939867624] [rowNum=113963] [segmentType=Sealed]
milvus-standalone | [2024/02/06 14:00:17.152 +00:00] [INFO] [segments/segment_loader.go:578] ["load fields..."] [traceID=6064a8fde4ae287d4e6b60a3cb78fa01] [collectionID=444925537510994479] [partitionID=444925537510994480] [shard=by-dev-rootcoord-dml_1_444925537510994479v0] [segmentID=446469289939867624] [indexedFields="[111]"]
milvus-standalone | [2024/02/06 14:00:17.154 +00:00] [INFO] [segments/segment_loader.go:186] ["start loading..."] [traceID=4be7623495bd8f0653b6ce8e766c3521] [collectionID=444925537510994485] [segmentType=Sealed] [segmentNum=1] [afterFilter=1]
milvus-standalone | [2024/02/06 14:00:17.154 +00:00] [WARN] [delegator/delegator_data.go:392] ["worker failed to load segments"] [traceID=676dea72830c04acd4fcdc4a91f25e5a]

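A quick way to confirm the memory pressure is to compare the container's live usage against its limit, assuming the default milvus-standalone container name:

# Prints a one-shot snapshot including the MEM USAGE / LIMIT column.
docker stats --no-stream milvus-standalone
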
C-rawler commented 8 months ago

@C-rawler I can see you are starting Milvus with some existing collections and partitions. Moreover, these collections and partitions were previously loaded, so Milvus tries to load them again after startup. It suddenly fails to load the segments after loading about 888 MB of data... As a solution, I think you could try adding more memory to the Milvus pod/container. On the Milvus side, we improved the memory prediction in Milvus 2.3.8, so please retry with it if possible.

[collectionID=444925537510994479] [maxSegmentSize=416.8495330810547] [concurrency=1] [committedMemSize=703.9019622802734] [memUsage=888.7808685302734] [committedDiskSize=0] [diskUsage=0] [predictMemUsage=1305.6304016113281] [predictDiskUsage=0] [mmapEnabled=false]
milvus-standalone | [2024/02/06 14:00:17.152 +00:00] [INFO] [segments/segment_loader.go:400] ["request resource for loading segments (unit in MiB)"] [traceID=6064a8fde4ae287d4e6b60a3cb78fa01] [segmentIDs="[446469289939867624]"] [workerNum=190] [committedWorkerNum=553] [memory=416.8495330810547] [committedMemory=1120.7514953613281] [disk=0] [committedDisk=0]
milvus-standalone | [2024/02/06 14:00:17.152 +00:00] [INFO] [segments/segment_loader.go:186] ["start loading..."] [traceID=676dea72830c04acd4fcdc4a91f25e5a] [collectionID=444925537510994490] [segmentType=Sealed] [segmentNum=1] [afterFilter=1]
milvus-standalone | [2024/02/06 14:00:17.152 +00:00] [INFO] [segments/segment.go:182] ["create segment"] [collectionID=444925537510994479] [partitionID=444925537510994480] [segmentID=446469289939867624] [segmentType=Sealed]
milvus-standalone | [2024/02/06 14:00:17.152 +00:00] [INFO] [segments/segment_loader.go:258] ["start to load segments in parallel"] [traceID=6064a8fde4ae287d4e6b60a3cb78fa01] [collectionID=444925537510994479] [segmentType=Sealed] [segmentNum=1] [concurrencyLevel=1]
milvus-standalone | [2024/02/06 14:00:17.152 +00:00] [INFO] [segments/segment_loader.go:536] ["start loading segment files"] [traceID=6064a8fde4ae287d4e6b60a3cb78fa01] [collectionID=444925537510994479] [partitionID=444925537510994480] [shard=by-dev-rootcoord-dml_1_444925537510994479v0] [segmentID=446469289939867624] [rowNum=113963] [segmentType=Sealed]
milvus-standalone | [2024/02/06 14:00:17.152 +00:00] [INFO] [segments/segment_loader.go:578] ["load fields..."] [traceID=6064a8fde4ae287d4e6b60a3cb78fa01] [collectionID=444925537510994479] [partitionID=444925537510994480] [shard=by-dev-rootcoord-dml_1_444925537510994479v0] [segmentID=446469289939867624] [indexedFields="[111]"]
milvus-standalone | [2024/02/06 14:00:17.154 +00:00] [INFO] [segments/segment_loader.go:186] ["start loading..."] [traceID=4be7623495bd8f0653b6ce8e766c3521] [collectionID=444925537510994485] [segmentType=Sealed] [segmentNum=1] [afterFilter=1]
milvus-standalone | [2024/02/06 14:00:17.154 +00:00] [WARN] [delegator/delegator_data.go:392] ["worker failed to load segments"] [traceID=676dea72830c04acd4fcdc4a91f25e5a]

@yanliang567 Thank you for your help; I understand now that it is a memory problem. How can I add more memory to the Milvus pod/container, or can I upgrade to Milvus 2.3.8 to make sure the previous data loads normally?

yanliang567 commented 8 months ago

Milvus 2.3.8 fixed issues with memory prediction, so it will not help if you have more data than the available memory can hold. If you are running docker compose on a Mac, you need to allocate more CPU or memory in the Docker resource settings.

C-rawler commented 8 months ago

Milvus 2.3.8 fixed issues with memory prediction, so it will not help if you have more data than the available memory can hold. If you are running docker compose on a Mac, you need to allocate more CPU or memory in the Docker resource settings.

I am running docker-compose on Linux. How should I modify the amount of memory? I have not found the relevant configuration.

stale[bot] commented 7 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

yanliang567 commented 7 months ago

Then you just need to add more memory to the Linux machine itself.
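On Linux, docker-compose does not cap container memory by default, so the standalone container can already use all of the host's RAM; the missing "configuration" is simply the physical memory of the machine. If an explicit limit was set (or you want to add one), a minimal sketch of the relevant fields in docker-compose.yml, assuming the standalone service name from the official file:

# Honored by Docker Compose V2; older docker-compose releases use the
# top-level `mem_limit: 8g` form instead. 8G is only an example value.
services:
  standalone:
    deploy:
      resources:
        limits:
          memory: 8G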

stale[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.