Closed zhuwenxing closed 1 year ago
It failed at the step in the red block
/assign @soothing-rain /unassign
Might be related to #20534
/assign @MrPresent-Han
@soothing-rain: GitHub didn't allow me to assign the following users: MrPresent-Han.
Note that only milvus-io members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide
What does this mean?
clue, debugging...
@zhuwenxing
Please try again; it may have been fixed on the master branch.
Please link the fix PR!
This issue does not reproduce reliably, so it is hard to verify.
This issue may be caused by a time gap between the query request reaching the query node and the segment-load operation completing on that node. As the images below show, the query request arrived at the node at 09:43:07.202, while the replica on this node finished loading at 09:43:07.733. As a result, the query request saw an empty segment list, as shown in the picture.
The earlier search request returned a correct, complete result because it arrived at query node-21 at 09:43:07.190, by which time segments 2546 and 4728 in replica_8364 had already finished loading, at 09:43:05.701 and 09:43:06.008 respectively. Under the roundRobinPolicy running on the proxy, the following query request was routed to replica_8363, which hit the delayed-loading case described above.
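The timing described above can be replayed in a small simulation. This is only a sketch, not Milvus code: the `Replica` class and `loaded_segments` method are hypothetical, times are expressed as milliseconds after 09:43:00.000, and it assumes that replica_8363 holds copies of the same two segments and that both became available at 09:43:07.733.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Replica:
    """Hypothetical stand-in for a replica on a query node."""
    name: str
    # segment id -> time (ms after 09:43:00.000) at which loading finished
    load_done_ms: dict

    def loaded_segments(self, now_ms):
        """Segments that are fully loaded at the given moment."""
        return sorted(s for s, t in self.load_done_ms.items() if t <= now_ms)


# Load-completion times taken from the comment above.
replica_8364 = Replica("replica_8364", {2546: 5701, 4728: 6008})
# Assumption: same segment ids, all ready at 09:43:07.733.
replica_8363 = Replica("replica_8363", {2546: 7733, 4728: 7733})

# Round-robin routing: each request goes to the next replica in turn.
picker = cycle([replica_8364, replica_8363])

results = []
for kind, arrival_ms in [("search", 7190), ("query", 7202)]:
    replica = next(picker)
    results.append((kind, replica.name, replica.loaded_segments(arrival_ms)))

for row in results:
    print(row)
```

In this replay the search sees both segments on replica_8364, while the query lands on replica_8363 about 0.5 s before its load completes and therefore sees no segments.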
@MrPresent-Han if loading has not completed, why do search requests return success instead of returning an error?
This issue involves replicas, query, balance, and load/reduce, so it is relatively complicated. I will upload a document later that illustrates the problem in more detail.
This error is caused by improper removal of segments triggered by the leader observer on querycoord, which did not take the collection's next target into account. The defect has been fixed by commit eaa5cfdcb5e9461d779d21b13d20e21389e26d2 on the master branch. I also added some logs that trace load and remove actions on querycoord to make debugging easier.
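The idea behind the fix can be illustrated with a small sketch. It is not the actual querycoord code: the function names and the segment ids 1, 2 and 3 are hypothetical, and it assumes a segment should be kept whenever it belongs to either the current target or the next target of the collection.

```python
def releasable_ignoring_next_target(loaded, current_target):
    """Removal rule that only looks at the current target (the defect described above)."""
    return sorted(set(loaded) - set(current_target))


def releasable_with_next_target(loaded, current_target, next_target):
    """Removal rule that also keeps segments listed in the next target."""
    keep = set(current_target) | set(next_target)
    return sorted(set(loaded) - keep)


# Hypothetical state: segment 3 is already loaded for the upcoming target.
loaded = [1, 2, 3]
current_target = [1, 2]
next_target = [2, 3]

# Without the next target, segment 3 is released too early.
print(releasable_ignoring_next_target(loaded, current_target))
# With the next target considered, nothing is released yet.
print(releasable_with_next_target(loaded, current_target, next_target))
```

Under the first rule a query that arrives after the target switches would find segment 3 missing, which matches the kind of incomplete result reported in this issue.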
Not reproduced
Is there an existing issue for this?
Environment
Current Behavior
The code of query
The query result should contain 4 entries, but the actual result contains only 2.
Expected Behavior
Steps To Reproduce
No response
Milvus Log
Failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test/detail/deploy_test/623/pipeline/307
Logs: artifacts-pulsar-cluster-upgrade-623-server-logs.tar.gz, artifacts-pulsar-cluster-upgrade-623-pytest-logs.tar.gz
Anything else?
No response