Open binbinlv opened 11 months ago
/assign @liliu-z @Presburger /unassign
This is a GPU image, so we can create GPU index. And the case only involved 10K 128dim data, which didn't trigger index building at all. So this is as expected.
The data size is too small to trigger the GPU build stage.
Will try a larger data size.
This is a GPU image, so we can create GPU index. And the case only involved 10K 128dim data, which didn't trigger index building at all. So this is as expected. What is the size/rule to trigger GPU index building? @liliu-z
@yanliang567 A small amount of data can also trigger an index build, but you need to flush, create the index, and load; then the data will be sealed.
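The flush, create, load order mentioned above could be sketched roughly as follows. This is only an illustration: `seal_and_build_index` is a hypothetical helper, `collection` is assumed to behave like a pymilvus `Collection`, and the `metric_type` and `nlist` values are made-up examples, not taken from this issue.

```python
def seal_and_build_index(collection, field_name, index_params):
    """Run flush, create_index, and load in the order described above.

    `collection` is assumed to expose the pymilvus `Collection` methods
    flush(), create_index(), and load(). This helper is hypothetical.
    """
    # Flush first so the data leaves the growing segment and gets sealed.
    collection.flush()
    # Request the index on the vector field.
    collection.create_index(field_name=field_name, index_params=index_params)
    # Load the collection so it can be searched.
    collection.load()


# Example parameters; metric_type and nlist are assumed values.
GPU_IVF_FLAT_PARAMS = {
    "index_type": "GPU_IVF_FLAT",
    "metric_type": "L2",
    "params": {"nlist": 128},
}
```

With a real deployment one would pass a connected `Collection` object and the name of the vector field; the field name depends on the collection schema.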
After inserting 5M rows and then creating a GPU_IVF_FLAT index on a CPU machine, Milvus crashed with the following error in the log:
[2023/10/13 02:51:59.738 +00:00] [DEBUG] [config/etcd_source.go:141] ["etcd refreshConfigurations"] [prefix=by-dev/config] [endpoints="[gpu-cpu-machine-qfljr-etcd:2379]"]
F20231013 02:51:59.877173 94 raft_utils.cc:24] [KNOWHERE][gpu_device_manager][milvus] CUDA error encountered at: file=/go/src/github.com/milvus-io/milvus/cmake_build/thirdparty/knowhere/knowhere-src/src/common/raft/raft_utils.cc line=22: call='cudaGetDeviceCount(&device_counts)', Reason=cudaErrorInsufficientDriver:CUDA driver version is insufficient for CUDA runtime version
Could we reject a GPU index on a CPU machine in a more user-friendly way, e.g. report an error in advance instead of letting Milvus crash?
This is a GPU image, so we can create GPU index. And the case only involved 10K 128dim data, which didn't trigger index building at all. So this is as expected. What is the size/rule to trigger GPU index building? @liliu-z
No size/rule, just because data is still in a growing segment.
Could we reject a GPU index on a CPU machine in a more user-friendly way, e.g. report an error in advance instead of letting Milvus crash?
Makes sense to catch the exception and throw it out so that indexCoord can retry. @Presburger can you help take a look?
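For illustration only, a pre-flight check along the lines requested above might look like the sketch below. It is not how Milvus or knowhere handles this. `GpuUnavailableError`, `cuda_device_count`, and `check_index_type` are hypothetical names, the `GPU_` prefix test is an assumption based on the `GPU_IVF_FLAT` name in this thread, and `libcuda.so.1` is the usual Linux name of the CUDA driver library.

```python
import ctypes


class GpuUnavailableError(RuntimeError):
    """Raised when a GPU index is requested but no usable CUDA device exists."""


def cuda_device_count():
    """Return the number of CUDA devices, or 0 if the driver is missing or unusable."""
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")  # assumed Linux driver library name
    except OSError:
        return 0  # no driver library installed
    if libcuda.cuInit(0) != 0:  # CUDA driver API; 0 means success
        return 0
    count = ctypes.c_int(0)
    if libcuda.cuDeviceGetCount(ctypes.byref(count)) != 0:
        return 0
    return count.value


def check_index_type(index_type, device_count=None):
    """Fail early with a readable error instead of crashing during the build."""
    if device_count is None:
        device_count = cuda_device_count()
    # Assumption: GPU index type names start with "GPU_".
    if index_type.upper().startswith("GPU_") and device_count < 1:
        raise GpuUnavailableError(
            f"index type {index_type} needs a CUDA device, but none was found"
        )
```

A caller could run `check_index_type("GPU_IVF_FLAT")` before accepting the create-index request and return the error message to the user.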
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
keep open, remove stale
Can we load a collection whose index was built with GPU_IVF_FLAT onto a CPU node that has more DRAM?
Is there an existing issue for this?
Environment
Current Behavior
A GPU index can be created successfully on a CPU machine, and it can be searched too.
Expected Behavior
Creating a GPU index on a CPU machine should fail and report an error.
Steps To Reproduce
Milvus Log
https://grafana-4am.zilliz.cc/explore?orgId=1&left=%7B%22datasource%22:%22Loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bcluster%3D%5C%22devops%5C%22,namespace%3D%5C%22chaos-testing%5C%22,pod%3D~%5C%22gpu-cpu-machine-wssbe.*%5C%22%7D%22%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D
Anything else?
No response