zilliztech / knowhere

Knowhere is an open-source vector search engine, integrating FAISS, HNSW, etc.
Apache License 2.0

Supporting driver 470.182.03 by rebuilding knowhere? #658

Closed HowardQin closed 1 week ago

HowardQin commented 2 months ago

I have a server with an RTX 2080 and NVIDIA driver version 470.182.03. I tried the milvus 2.4.4-gpu Docker image, but it exits with this error:

I20240619 02:08:58.745658 17 thread_pool.h:152] [KNOWHERE][InitGlobalBuildThreadPool][milvus] Init global build thread pool with size 4
W20240619 02:08:58.773262 17 ExceptionTracer.cpp:187] Invalid trace stack for exception of type: raft::cuda_error
terminate called after throwing an instance of 'raft::cuda_error'
what(): CUDA error encountered at: file=/go/src/github.com/milvus-io/milvus/cmake_build/thirdparty/knowhere/knowhere-src/src/common/raft/integration/raft_initialization.cc line=53: call='cudaGetDeviceCount(&result)', Reason=cudaErrorCompatNotSupportedOnDevice:forward compatibility was attempted on non supported HW
Obtained 7 stack frames

0 in /milvus/lib/libknowhere.so(_ZN4raft9exception18collect_call_stackEv+0x99) [0x7f8d3bcf8279]

1 in /milvus/lib/libknowhere.so(_ZN4raft10cuda_errorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0xd5) [0x7f8d3bcfb185]

2 in /milvus/lib/libknowhere.so(+0x6b30bc) [0x7f8d3bb4e0bc]

3 in /usr/lib/x86_64-linux-gnu/libc.so.6(+0x99ee8) [0x7f8d61172ee8]

4 in /milvus/lib/libknowhere.so(_ZN13raft_knowhere15initialize_raftERKNS_18raft_configurationE+0x6d) [0x7f8d3c098abd]

5 in /milvus/lib/libknowhere.so(_ZN8knowhere14KnowhereConfig14SetRaftMemPoolEv+0x49) [0x7f8d3bcd3229]

6 in milvus(runtime.asmcgocall.abi0+0x64) [0x1c5fba4]

The limitation here is that upgrading the CUDA driver is not allowed. The only thing I can do is rebuild milvus with cuda-toolkit-11.7 on the machine. I checked the compatibility table, and it seems cuda-toolkit-11.7 and driver 470 are compatible, but I'm not sure whether the idea will work. Any suggestions? Thank you!
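For anyone repeating the compatibility-table check described above, here is a minimal sketch of the comparison in Python. The minimum driver values are copied from NVIDIA's published CUDA minor-version compatibility table for Linux as I understand it, so treat them as an assumption and re-check them against the current documentation. The names `MIN_LINUX_DRIVER`, `parse_version`, and `driver_meets_minimum` are made up for illustration and are not part of milvus or knowhere.

```python
# Minimum Linux driver per CUDA major version (minor-version compatibility).
# Assumption: values taken from NVIDIA's CUDA compatibility documentation;
# verify them against the current table before relying on this.
MIN_LINUX_DRIVER = {
    11: (450, 80, 2),
    12: (525, 60, 13),
}


def parse_version(text):
    """Turn '470.182.03' into (470, 182, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_meets_minimum(driver, cuda_major):
    """Return True if the driver is at least the listed minimum for that CUDA major."""
    return parse_version(driver) >= MIN_LINUX_DRIVER[cuda_major]


# Driver from this issue, checked against CUDA 11.x and CUDA 12.x.
print(driver_meets_minimum("470.182.03", 11))
print(driver_meets_minimum("470.182.03", 12))
```

This only checks the published minimum. It does not show that a particular build or Docker image will start on that driver.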

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

Presburger commented 1 month ago

@HowardQin Hi. NVIDIA does not allow forward compatibility on consumer-grade GPUs, so on a consumer-grade GPU the only way to resolve this issue is to upgrade the NVIDIA driver.
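When reporting a case like this, it helps to state the exact GPU model and driver version the host exposes. The sketch below reads both through `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader` and parses the result. The helper names `parse_gpu_query` and `query_gpus` are hypothetical, and the sample line is illustrative rather than captured from the machine in this issue.

```python
import subprocess


def parse_gpu_query(output):
    """Parse 'name, driver_version' CSV rows into (name, driver) tuples."""
    rows = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last comma so a comma inside the GPU name stays intact.
        name, driver = (field.strip() for field in line.rsplit(",", 1))
        rows.append((name, driver))
    return rows


def query_gpus():
    """Run nvidia-smi on the host and return one (name, driver) tuple per GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_query(result.stdout)


# Illustrative sample only; call query_gpus() on a machine with nvidia-smi installed.
sample = "NVIDIA GeForce RTX 2080, 470.182.03\n"
print(parse_gpu_query(sample))
```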
