ken2190 opened 7 months ago
Could you try the latest main branch?
I get the exact same error when trying to replicate the following tutorial with version 0.5.0: https://developer.nvidia.com/blog/optimizing-inference-on-llms-with-tensorrt-llm-now-publicly-available/
I tried with the latest main branch, but still got the error [TensorRT-LLM][ERROR] CUDA runtime error in cudaDeviceGetDefaultMemPool(&memPool, device): operation not supported
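In case it helps with triage, here is a minimal standalone probe (my own sketch, not part of TensorRT-LLM) that queries the cudaDevAttrMemoryPoolsSupported device attribute before making the failing call. On stacks where stream-ordered memory pools are unsupported, which can be the case on vGPU, the attribute reads 0 and cudaDeviceGetDefaultMemPool returns the same "operation not supported" error reported above:

```cpp
// check_mempool.cu -- standalone probe; build with: nvcc check_mempool.cu -o check_mempool
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device = 0;

    // Ask the driver whether stream-ordered memory pools (cudaMallocAsync)
    // are supported on this device/driver combination.
    int poolsSupported = 0;
    cudaDeviceGetAttribute(&poolsSupported, cudaDevAttrMemoryPoolsSupported, device);
    std::printf("cudaDevAttrMemoryPoolsSupported = %d\n", poolsSupported);

    // This is the call that fails in the reported error. Where pools are
    // unsupported it returns cudaErrorNotSupported ("operation not supported").
    cudaMemPool_t memPool;
    cudaError_t err = cudaDeviceGetDefaultMemPool(&memPool, device);
    std::printf("cudaDeviceGetDefaultMemPool -> %s\n", cudaGetErrorString(err));
    return 0;
}
```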
Has the issue been resolved?
@tobernat I see in https://github.com/triton-inference-server/tensorrtllm_backend/issues/363 that you were able to resolve this. Is there anything more that needs to be done? @ken2190 Can you please try the fix posted in issue #363, linked above?
In our case (see #363) it was solved by setting vGPU plugin parameters in VMware: https://docs.nvidia.com/grid/13.0/grid-vgpu-user-guide/index.html#setting-vgpu-plugin-parameters-on-vmware-vsphere. See also: https://kb.vmware.com/s/article/2142307
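For anyone who lands here: per the linked guide, vGPU plugin parameters on vSphere are set as advanced configuration entries on the VM. The entry below only illustrates the format; the thread does not say which specific parameter/value pair resolved the error, so treat the names as placeholders:

```
# Advanced VM configuration entry (vSphere), format per the NVIDIA vGPU user guide.
# pciPassthru<n> selects the vGPU device; <parameter> and <value> are placeholders.
pciPassthru0.cfg.<parameter> = "<value>"
```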
@tobernat Which parameters did you need to set to resolve this issue? I have the same issue with an L40S card.
Thanks for the reply!
System Info
Who can help?
No response
Information
Reproduction
The model built without problems.
Expected behavior
Actual behavior
Additional notes
@byshiue @schetlur-nv @juney-nvidia Could you take a look?