Open wangting0128 opened 1 month ago
The CPU is full and the requests take hours; I think timing out is fine. For any system pushed beyond its capacity, you will see timeouts.
As long as the service didn't crash, I thought it was fine.
The normal average DQL time is < 500ms. During the DQL request timeout (60s) period, the CPU is not fully utilized. I think this is a problem that needs to be checked. 🤔️
Only 2M of data was inserted, but the Queryable Entity Num showed 44.8M, and the memory increased from 5G to 57+G.
How did you define that only 2M was inserted? It seems that you have both deletes and inserts in the test. I think most of the 44.8M entities have been deleted but not compacted in time. Is this what you are trying to test, whether compaction can catch up with the deletes?
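For reference, a minimal sketch of the insert+delete churn pattern being described; the collection name, schema, and batch sizes here are assumptions, not taken from the actual test case:

```python
import random
from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")
coll = Collection("churn_demo")  # hypothetical collection: int64 "id" pk + 128-dim float vector

for _ in range(100):
    pks = random.sample(range(2_000_000), 2_000)
    # Deleted rows are only marked deleted; they still count toward the
    # queryable entity number and memory until compaction removes them.
    coll.delete(expr=f"id in {pks}")
    coll.insert([pks, [[random.random() for _ in range(128)] for _ in pks]])
coll.flush()
```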
Here's what I see: search timeout (but cannot be cancelled) -> pinned segments -> memory and segment count rising
This is how search works: once requests are submitted to the C++ part, if the Golang side times out and returns after, say, 1 minute, the C++ part will continue to run.
CPU usage drops when all search/query tasks in C++ have finished. In the meantime, some of the searches/queries finish during this short window; all the rest simply fail with a timeout.
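For illustration, a minimal client-side sketch of that behavior; the endpoint, collection name, vector field, and search params below are assumptions, not taken from the test:

```python
from pymilvus import Collection, connections
from pymilvus.exceptions import MilvusException

connections.connect(host="localhost", port="19530")
coll = Collection("test_collection")  # hypothetical collection name

try:
    # The client (and the Go proxy) gives up after 60s, but the C++ segcore
    # task on the querynode keeps running and keeps its segments pinned
    # until it finishes on its own.
    coll.search(
        data=[[0.1] * 128],                 # hypothetical 128-dim query vector
        anns_field="float_vector",          # hypothetical vector field name
        param={"metric_type": "L2", "params": {"nprobe": 16}},
        limit=10,
        timeout=60,                         # seconds, matching the 60s DQL timeout above
    )
except MilvusException as exc:
    # The timeout only ends the Go-side wait; it does not cancel the C++ execution.
    print(f"search timed out on the client side: {exc}")
```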
The behavior is expected, nothing abnormal, except perhaps we need smaller search tasks that take less than 1 hour.
Also, we might need to be able to cancel C++ tasks from the Golang side to avoid such long-lived pins. The memory status of the querynode looks fragile, which means long search tasks could easily break the querynode's memory and trigger write rate limiting or even OOM.
/unassign /assign @wangting0128
@wangting0128 In your tests, judging from the metrics, it's more likely that 99% of the DQL requests are VERY LONG, with 1% quick ones.
2M rows of data, and only 10 entities are fetched per DQL request, yet it takes 1 hour. Is this reasonable?
Doesn't seem reasonable, I'll look into this.
Currently, for an expression that compares two columns, such as A < B, if A and B both have an index, we need to reverse-look-up the raw data from the index one row at a time, which is actually slow.
verified the scalar field comparison, argo task: fouramf-9j5lj-query-expr-3
scalar fields without index:
[2024-11-05 07:48:21,400 - INFO - fouram]: [Base] expr of query: "int16_1 == int8_1", kwargs:{'limit': 10000} (base.py:548)
[2024-11-05 07:48:21,447 - INFO - fouram]: [Time] Collection.query run in 0.0464s (api_request.py:49)
scalar fields with INVERTED index:
[2024-11-05 07:58:24,558 - INFO - fouram]: [Base] expr of query: "int16_inverted == int8_inverted", kwargs:{'limit': 10000} (base.py:548)
[2024-11-05 07:58:24,629 - INFO - fouram]: [Time] Collection.query run in 0.0703s (api_request.py:49)
scalar fields with BITMAP index:
[2024-11-05 07:48:24,825 - INFO - fouram]: [Base] expr of query: "int16_bitmap == int8_bitmap", kwargs:{'limit': 10000} (base.py:548)
[2024-11-05 07:48:29,795 - INFO - fouram]: [Time] Collection.query run in 4.9695s (api_request.py:49)
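For illustration, a rough pymilvus sketch of how this comparison could be reproduced; the collection name and index params are assumptions (only the field names and expressions come from the logs above):

```python
import time
from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")
coll = Collection("query_expr_test")  # hypothetical collection name

# Build scalar indexes on the fields used in the indexed comparisons.
for field in ("int16_inverted", "int8_inverted"):
    coll.create_index(field, {"index_type": "INVERTED"})
for field in ("int16_bitmap", "int8_bitmap"):
    coll.create_index(field, {"index_type": "BITMAP"})
coll.load()

# Time the two-column compare expression without and with scalar indexes.
for expr in ("int16_1 == int8_1",
             "int16_inverted == int8_inverted",
             "int16_bitmap == int8_bitmap"):
    t0 = time.perf_counter()
    coll.query(expr=expr, limit=10000)
    print(f"{expr}: {time.perf_counter() - t0:.4f}s")
```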
Is there an existing issue for this?
Environment
Current Behavior
argo task: fouramf-bitmap-scenes-fdgrx
test case name: test_bitmap_locust_dql_dml_standalone
server:
client test result:
Expected Behavior
No response
Steps To Reproduce
Milvus Log
No response
Anything else?
No response