zhuwenxing opened 2 weeks ago
/assign @weiliu1031
failed ci job: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20HA%20CI/detail/PR-37148/4/pipeline
This issue currently reproduces at a relatively high rate. @weiliu1031
/assign @bigsheeper
failed ci job: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20HA%20CI/detail/PR-37148/7/pipeline
So I'm setting it as critical.
RESTful load operations and quick-setup create-collection operations don't wait for loading to finish. We need to add a get-load-state check and wait for loading to complete before importing.
Please help modify all related cases. @zhuwenxing
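A minimal sketch of the get-load-state check described above, for the RESTful cases. It assumes the Milvus RESTful v2 endpoint `/v2/vectordb/collections/get_load_state` and a response field `data.loadState` with the value `LoadStateLoaded`; both are assumptions to verify against the RESTful API reference for the Milvus version under test. The HTTP call is injected as a `post` function so the polling logic stays independent of any particular client.

```python
import time
from typing import Any, Callable, Dict

# Assumed endpoint path and "loaded" value; verify against the Milvus RESTful docs.
GET_LOAD_STATE_PATH = "/v2/vectordb/collections/get_load_state"
LOADED = "LoadStateLoaded"


def wait_for_load_rest(
    post: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    collection_name: str,
    timeout: float = 60.0,
    interval: float = 0.5,
) -> float:
    """Poll the load state until the collection reports loaded.

    `post` sends a JSON body to a path and returns the decoded JSON response.
    Returns the elapsed seconds; raises TimeoutError once `timeout` has passed.
    """
    start = time.monotonic()
    while True:
        resp = post(GET_LOAD_STATE_PATH, {"collectionName": collection_name})
        state = (resp.get("data") or {}).get("loadState")
        if state == LOADED:
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            raise TimeoutError(
                f"{collection_name} not loaded after {timeout}s (last state: {state})"
            )
        time.sleep(interval)


if __name__ == "__main__":
    # Dry run against a fake server that reports "loading" twice, then "loaded".
    states = iter(["LoadStateLoading", "LoadStateLoading", "LoadStateLoaded"])

    def fake_post(path: str, body: Dict[str, Any]) -> Dict[str, Any]:
        return {"code": 0, "data": {"loadState": next(states)}}

    print(wait_for_load_rest(fake_post, "test_collection", timeout=5, interval=0.01))
```

The import step would run only after this returns; a TimeoutError surfaces a slow load directly instead of a confusing failure later in the case.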
/assign @zhuwenxing
[pytest : test] c = Collection(name)
[pytest : test] > c.load(_refresh=True)
The failed step is c.load(_refresh=True), so this should not be related to the RESTful API, right?
Do you mean that the call to c.load(_refresh=True) needs to wait until the previous load has completed?
Yes. So the test cases need to be updated to wait for loading to complete. You can add a timeout to the wait process.
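A sketch of the updated test flow: wait for the in-flight load (bounded by a timeout), then issue the refresh load. `FakeCollection` is a hypothetical stand-in so the ordering can be shown without a server; it rejects a refresh while still loading, which models the CI failure as an assumption, not Milvus's actual logic. In a real case the handle would be a pymilvus `Collection`, and the state check could go through pymilvus's load-state helpers (for example `utility.wait_for_loading_complete`, if available in the pymilvus version used).

```python
import time


class FakeCollection:
    """Hypothetical collection handle used only to illustrate call ordering."""

    def __init__(self, polls_until_loaded: int):
        self._remaining = polls_until_loaded  # polls left before "Loaded"
        self.refreshed = False

    def load_state(self) -> str:
        # Each poll advances the simulated load by one step.
        if self._remaining > 0:
            self._remaining -= 1
            return "Loading"
        return "Loaded"

    def load(self, _refresh: bool = False) -> None:
        # Assumed behavior: a refresh load fails while the previous load runs.
        if _refresh and self._remaining > 0:
            raise RuntimeError("refresh load issued while collection is still loading")
        self.refreshed = _refresh


def refresh_after_loaded(coll, timeout: float = 60.0, interval: float = 0.5) -> None:
    """Wait for the current load to finish (bounded by `timeout`), then refresh."""
    deadline = time.monotonic() + timeout
    while coll.load_state() != "Loaded":
        if time.monotonic() >= deadline:
            raise TimeoutError(f"collection not loaded within {timeout}s")
        time.sleep(interval)
    coll.load(_refresh=True)
```

Calling `FakeCollection(3).load(_refresh=True)` directly reproduces the failure, while `refresh_after_loaded(FakeCollection(3), timeout=5, interval=0.01)` succeeds.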
BTW, there's also a server-side issue: the import process lasts several dozen seconds, and the collection should have finished loading within that time, yet it hadn't. This is related to issue https://github.com/milvus-io/milvus/issues/37395.
After the test cases finished, many collections weren't dropped and kept occupying the pool of the target observer scheduler. This prevented newly loaded collections from updating their current target, which in turn made loading slow and caused timeouts.
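A toy model of the starvation described above, to show why leftover collections slow down new loads. It is hypothetical and greatly simplified (the real target observer is part of the Go server and is not modeled here): a fixed-size pool runs at most `capacity` target-update tasks per tick in FIFO order, and collections that were never dropped are re-queued forever.

```python
from collections import deque


def ticks_until_target_updated(stale: int, capacity: int) -> int:
    """Toy model: tick at which a newly loaded collection gets its target updated.

    `stale` collections were never dropped, so after each update they are
    re-queued and keep competing for the `capacity` pool slots per tick.
    """
    if capacity < 1:
        raise ValueError("capacity must be >= 1")
    queue = deque(f"stale_{i}" for i in range(stale))
    queue.append("new")  # the freshly loaded collection joins at the back
    tick = 0
    while True:
        tick += 1
        for _ in range(capacity):
            name = queue.popleft()
            if name == "new":
                return tick
            queue.append(name)  # never dropped, so it is scheduled again later


if __name__ == "__main__":
    for stale in (0, 100, 1000):
        # prints "0 1", "100 26", "1000 251"
        print(stale, ticks_until_target_updated(stale, capacity=4))
```

In this model the delay grows linearly with the number of undropped collections, which is why dropping collections in test teardown (shrinking `stale`) matters as much as any server-side fix.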
/assign @zhuwenxing Could you verify this issue?
@xiaofan-luan So can this PR https://github.com/milvus-io/milvus/pull/37433 solve this issue? Do we need to wait for https://github.com/milvus-io/milvus/pull/37454 to be merged?
I think we can validate this after https://github.com/milvus-io/milvus/pull/37454 is merged.
Actually multiple PRs (https://github.com/milvus-io/milvus/pull/37433, https://github.com/milvus-io/milvus/pull/37454, https://github.com/milvus-io/milvus/pull/37513) can help alleviate this issue. Additionally, the improvements to the CI environment and process by @yellow-shine are also beneficial.
@bigsheeper What is the expected load completion time for an empty collection after the fix, and is it okay to set the timeout to 5s? Currently, the CI cases use 60s to make sure they pass reliably. To verify that your PR really speeds up loading, I plan to set the timeout to 5s.
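One way to run the same cases with a 60s CI budget and a tighter 5s verification budget, while also recording how long loads actually take. This is a sketch: the `LOAD_TIMEOUT_SECONDS` variable and both helper names are hypothetical, not part of the existing test suite.

```python
import os
import time
from typing import Callable, Mapping, Optional

DEFAULT_LOAD_TIMEOUT = 60.0  # budget the CI cases currently use


def load_timeout(env: Optional[Mapping[str, str]] = None) -> float:
    """Read the wait budget from a hypothetical LOAD_TIMEOUT_SECONDS variable."""
    env = os.environ if env is None else env
    raw = env.get("LOAD_TIMEOUT_SECONDS")
    return float(raw) if raw else DEFAULT_LOAD_TIMEOUT


def assert_within_budget(wait: Callable[[], None], budget: float) -> float:
    """Run a wait step, fail if it exceeds `budget`, and return the elapsed time."""
    start = time.monotonic()
    wait()
    elapsed = time.monotonic() - start
    assert elapsed <= budget, f"load took {elapsed:.2f}s, budget {budget:.2f}s"
    return elapsed
```

CI keeps the default, while a verification run could set `LOAD_TIMEOUT_SECONDS=5`; logging the returned elapsed time also gives concrete numbers for the expected load time of an empty collection.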
Is there an existing issue for this?
Environment
Current Behavior
After import, executing a refresh load fails.
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
failed ci job: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20HA%20CI/detail/PR-37148/1/pipeline
Anything else?
No response