Closed: xionghuaidong closed this issue 4 months ago
/assign @Presburger
@Presburger
Can you provide the output of the nvidia-smi command? @xionghuaidong
@Presburger Here is the nvidia-smi output.
$nvidia-smi
Wed Apr 10 15:31:14 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... On | 00000000:04:00.0 Off | 0 |
| N/A 29C P0 27W / 250W | 124MiB / 16280MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-PCIE... On | 00000000:84:00.0 Off | 0 |
| N/A 25C P0 24W / 250W | 128MiB / 16280MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
@xionghuaidong Hi, is it possible to upgrade the NVIDIA driver to 535 or newer?
@Presburger It's a bit complicated since the development machine is shared among a few developers.
I tested CPU HNSW and the other GPU indexes (GPU_IVF_PQ, GPU_IVF_FLAT, GPU_BRUTE_FORCE); they all work. For GPU_CAGRA, index building seems OK according to the log, but searching does not work.
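For context, here is a minimal sketch of how a GPU_CAGRA index might be created with pymilvus's MilvusClient; the collection name, field name, and parameter values are assumptions for illustration, not the literal contents of build_milvus_index.py.
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # assumed standalone endpoint

# Declare a GPU_CAGRA index on the vector field (field name is an assumption).
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="vector",
    index_type="GPU_CAGRA",
    metric_type="L2",
    params={
        "intermediate_graph_degree": 64,  # assumed build params for illustration
        "graph_degree": 32,
        "build_algo": "IVF_PQ",           # the value discussed later in this thread
    },
)
client.create_index(collection_name="my_collection", index_params=index_params)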
Hi, can you share the search params you used when trying GPU_CAGRA?
I'm using the default search params for GPU_CAGRA.
query_vectors = [
    [0.041732933] * self._vector_dimensions,
]
result = self._milvus_client.search(
    collection_name=self._milvus_collection,
    data=query_vectors,
    limit=self._test_vector_search_limit,  # set to 10
    output_fields=[self._entity_id_field_name],
)
See the _test_vector_search method in build_milvus_index.py.
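For comparison, here is a sketch of the same call with explicit GPU_CAGRA search params instead of the defaults; the itopk_size and search_width values below are illustrative assumptions, not tuned settings.
# Hypothetical explicit GPU_CAGRA search params (values are illustrative).
search_params = {
    "params": {
        "itopk_size": 128,   # size of the intermediate result buffer kept during search
        "search_width": 4,   # number of parent nodes expanded per iteration
    }
}
result = self._milvus_client.search(
    collection_name=self._milvus_collection,
    data=query_vectors,
    limit=self._test_vector_search_limit,
    search_params=search_params,
    output_fields=[self._entity_id_field_name],
)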
@xionghuaidong Hi, try changing this line
"build_algo": "IVF_PQ",
to
"build_algo": "NN_DESCENT",
On some GPUs, building the graph with IVF_PQ can be very slow.
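A minimal sketch of what that change might look like in the index definition, assuming the MilvusClient prepare_index_params/add_index API; the field name and the other parameter values are assumptions carried over for illustration.
# index_params comes from client.prepare_index_params() as in the earlier sketch;
# only build_algo changes here.
index_params.add_index(
    field_name="vector",                  # assumed vector field name
    index_type="GPU_CAGRA",
    metric_type="L2",
    params={
        "intermediate_graph_degree": 64,  # assumed values, unchanged
        "graph_degree": 32,
        "build_algo": "NN_DESCENT",       # was "IVF_PQ"
    },
)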
@Presburger Hi, thanks for your response.
I executed my script build_milvus_index.py with "build_algo": "NN_DESCENT",
and the Milvus server core dumped.
Here is the log: milvus.log
@Presburger Hi, is there any progress?
@Presburger Hi, I used docker-compose up to launch the official milvusdb/milvus:v2.4.1-gpu docker image and encountered the following error.
milvus-standalone | container_linux.go:251: starting container process caused "process_linux.go:346: running prestart hook 1 caused \"error running hook: exit status 1, stdout: , stderr: exec command: [/bin/nvidia-container-cli --load-kmods configure --device=all --compute --utility --require=cuda>=11.8 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 --pid=53085 /home/docker/overlay2/c01840fc568052fef2908a29b27d7119e275e1c1c500fd35aeca54705d8961f7/merged]\\nnvidia-container-cli: requirement error: unsatisfied condition: brand = titanrtx\\n\""
Error response from daemon: invalid header field value "oci runtime error: container_linux.go:251: starting container process caused \"process_linux.go:346: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/bin/nvidia-container-cli --load-kmods configure --device=all --compute --utility --require=cuda>=11.8 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 --pid=53085 /home/docker/overlay2/c01840fc568052fef2908a29b27d7119e275e1c1c500fd35aeca54705d8961f7/merged]\\\\nnvidia-container-cli: requirement error: unsatisfied condition: brand = titanrtx\\\\n\\\"\"\n"
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
Is there an existing issue for this?
Environment
Current Behavior
The build_milvus_index.py script failed with the exception: RuntimeError: test vector search 10 vectors, returned 0 vectors
Expected Behavior
The build_milvus_index.py script should succeed with the message: test vector search 10 vectors, returned 10 vectors
Steps To Reproduce
Milvus Log
milvus.log
Anything else?
No response