Open 1271653627 opened 6 months ago
@yanliang567 May I ask: if I want to restart the Milvus service to fix the issue with the Milvus cluster deployed using Docker Swarm, can I stop only the Milvus-related services (that is, the coordinator nodes, worker nodes, and proxy nodes) and restart them, while leaving MinIO, etcd, and Pulsar running?
@1271653627 You can restart it that way, but you should know that Docker Swarm is not a tested deployment method in the community.
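For anyone taking the same approach, the restart described above can be sketched as a small script. The Swarm service names are hypothetical (use the names shown by `docker service ls`), and the order of coordinators, then worker nodes, then proxy is an assumption rather than a documented requirement. `docker service update --force` is the usual Swarm way to restart a service's tasks without changing its spec.

```python
import subprocess

# Hypothetical Swarm service names; replace with the output of `docker service ls`.
MILVUS_SERVICES = [
    "milvus_rootcoord", "milvus_datacoord", "milvus_querycoord", "milvus_indexcoord",
    "milvus_datanode", "milvus_querynode", "milvus_indexnode",
    "milvus_proxy",
]
# Dependencies that are left running during the restart (names also hypothetical).
KEEP_RUNNING = {"milvus_etcd", "milvus_minio", "milvus_pulsar"}


def restart_commands(services, keep=KEEP_RUNNING):
    """Build one `docker service update --force` command per Milvus service."""
    return [
        ["docker", "service", "update", "--force", name]
        for name in services
        if name not in keep
    ]


def restart(services, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in restart_commands(services):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `restart(MILVUS_SERVICES)` only prints the commands, so the list can be reviewed before running it with `dry_run=False`.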
/assign @congqixia looks like a mq issue, please help to confirm /unassign
@congqixia @yanliang567 Below is the log for the coord node, which I've added: coordnode.zip. I noticed issue #25267, and I've encountered a similar situation before, where the number of entities shown in Attu was incorrect after inserting data. Would their method (changing the number of datanode replicas to 1 and increasing the rootCoord.dmlChannelNum parameter) solve the current problem? Also, I deployed a Milvus cluster with the same configuration in the test environment, and everything worked fine. However, in the production environment, I couldn't insert data. The test environment runs on Red Hat Enterprise Linux Server 7.4 Maipo (64-bit), while the production environment runs on UOS 20 Fuyu (64-bit). I wonder if it's related to the operating system. Looking forward to your response. Thanks for your support.
@1271653627 After some inspection of the log, it looks like the datanode failed to query the topic from the Pulsar broker for a long period.
The datanode session ID went past 100, so it repeatedly tried to subscribe in order to serve insert data.
Did your Pulsar cluster become abnormal while the problem was occurring?
And could you please provide the mq section of your configuration file? Port 8080 seems strange here, according to @LoveEachDay.
I set the Pulsar webport to 8080 because of the picture below. This is my Milvus config file. @congqixia milvus-config.txt
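One way to check whether the broker answers on the configured web port and which instance serves a topic is to call Pulsar's topic lookup REST endpoint directly. The sketch below assumes the `/lookup/v2/topic/...` path from Pulsar's REST API and uses the host name `broker` and port 8080 as they appear in the error message; both may differ in another deployment.

```python
import json
import urllib.request
from urllib.parse import quote


def lookup_url(web_host, web_port, topic):
    """Build the lookup URL for a topic such as persistent://public/default/my-topic."""
    domain, rest = topic.split("://", 1)
    tenant, namespace, name = rest.split("/", 2)
    path = "/".join(quote(part, safe="") for part in (domain, tenant, namespace, name))
    return f"http://{web_host}:{web_port}/lookup/v2/topic/{path}"


def lookup(web_host, web_port, topic, timeout=5):
    """Ask the broker which instance serves the topic (performs a network call)."""
    url = lookup_url(web_host, web_port, topic)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `lookup("broker", 8080, "persistent://public/default/cpic-milvus-rootcoord-dml_3")` run from inside the same overlay network should either return the owning broker's addresses or raise an HTTP error that narrows down where the lookup fails.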
@LoveEachDay Please help check the comments above, thank you.
Is there an existing issue for this?
Environment
Current Behavior
The cluster is unable to insert data, and every time data insertion is attempted, the data node restarts. When checking the logs, the following error is reported. However, the cluster can create collections and load them normally.
milvus_data1.1.ddkurhxaadx8@gp22aitppap92xj | [2024/05/11 17:16:34.629 +00:00] [ERROR] [retry/retry.go:46] ["retry func failed"] ["retry time"=8] [error="server error: ServiceNotReady: Namespace bundle for topic (persistent://public/default/cpic-milvus-rootcoord-dml_3) not served by this instance:broker:8080. Please redo the lookup. Request is denied: namespace=public/default"] [stack="github.com/milvus-io/milvus/pkg/util/retry.Do\n\t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:46\ngithub.com/milvus-io/milvus/pkg/mq/msgstream.(MqTtMsgStream).AsConsumer\n\t/go/src/github.com/milvus-io/milvus/pkg/mq/msgstream/mq_msgstream.go:586\ngithub.com/milvus-io/milvus/pkg/mq/msgdispatcher.NewDispatcher\n\t/go/src/github.com/milvus-io/milvus/pkg/mq/msgdispatcher/dispatcher.go:100\ngithub.com/milvus-io/milvus/pkg/mq/msgdispatcher.(dispatcherManager).Add\n\t/go/src/github.com/milvus-io/milvus/pkg/mq/msgdispatcher/manager.go:93\ngithub.com/milvus-io/milvus/pkg/mq/msgdispatcher.(client).Register\n\t/go/src/github.com/milvus-io/milvus/pkg/mq/msgdispatcher/client.go:77\ngithub.com/milvus-io/milvus/internal/datanode.newDmInputNode\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/flow_graph_dmstream_input_node.go:49\ngithub.com/milvus-io/milvus/internal/datanode.getServiceWithChannel\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/data_sync_service.go:361\ngithub.com/milvus-io/milvus/internal/datanode.newServiceWithEtcdTickler\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/data_sync_service.go:431\ngithub.com/milvus-io/milvus/internal/datanode.(flowgraphManager).addAndStartWithEtcdTickler\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/flow_graph_manager.go:131\ngithub.com/milvus-io/milvus/internal/datanode.(DataNode).handlePutEvent\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/event_manager.go:179\ngithub.com/milvus-io/milvus/internal/datanode.(channelEventManager).Run.func1\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/event_manager.go:268"]
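To see which topics and brokers are affected across a full data node log, the ServiceNotReady message above can be extracted with a regular expression. This is a minimal sketch: the pattern is written against the wording of this one log line, which may differ between Pulsar versions, and the function names are illustrative.

```python
import re
from collections import Counter

# Matches the "not served by this instance" message as it appears in the log above.
NOT_SERVED = re.compile(
    r"Namespace bundle for topic \((?P<topic>[^)]+)\) "
    r"not served by this instance:(?P<broker>\S+)\. Please redo the lookup"
)
# Matches the ["retry time"=N] field of the same log entry.
RETRY_TIME = re.compile(r'\["retry time"=(?P<n>\d+)\]')


def parse_not_served(line):
    """Return topic, broker, and retry count for a matching line, else None."""
    m = NOT_SERVED.search(line)
    if m is None:
        return None
    retry = RETRY_TIME.search(line)
    return {
        "topic": m.group("topic"),
        "broker": m.group("broker"),
        "retry_time": int(retry.group("n")) if retry else None,
    }


def failures_per_topic(lines):
    """Count matching failures per topic over an iterable of log lines."""
    hits = (parse_not_served(line) for line in lines)
    return Counter(hit["topic"] for hit in hits if hit)
```

Passing an open log file to `failures_per_topic` shows whether only a few DML channels fail the lookup or all of them do, which helps separate a single misassigned namespace bundle from a broker that is unreachable altogether.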
Expected Behavior
Data is inserted normally.
Steps To Reproduce
Milvus Log
This is the data node log: milvus_data.log
Anything else?
No response