mindbeast opened this issue 11 months ago

Does it make sense for ExecuTorch to have a mobile CUDA backend? There are many edge devices in NVIDIA's Jetson lineup that have a CUDA GPU but would benefit from not having to link an enormous libtorch dependency.
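For concreteness, the deployment story without libtorch would look roughly like this: lower the model ahead of time to a `.pte` program, then run it with the slim ExecuTorch runtime. Below is a minimal sketch using the Python bindings, assuming the `portable_lib` pybindings module shown in the ExecuTorch tutorials (`torch` is only used here to build a sample input; the on-device C++ runtime itself does not link libtorch, and module paths may vary by version):

```python
import torch
from executorch.extension.pybindings.portable_lib import _load_for_executorch

# "model.pte" is a placeholder for a program produced by the ExecuTorch
# export flow (see the Vulkan lowering example further down the thread).
program = _load_for_executorch("model.pte")

# Inputs are passed as a sequence matching the exported signature;
# outputs come back as a list of tensors. The input shape here is a
# placeholder assumption.
outputs = program.forward((torch.randn(1, 3, 224, 224),))
print(outputs[0].shape)
```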
+1 to this; it would be nice to get performance comparable to TensorRT without having to export models to ONNX etc. first!
+1
@mindbeast @bionictoucan @hietalajulius
Hi, thanks for the comment.
Yes, that makes sense in general.
Right now we are integrating Vulkan into ExecuTorch, since it is a suitable solution for mobile GPUs. Enabling mobile use cases is our primary goal at the moment.
We will revisit CUDA, but perhaps in the second half of the year. Curious, what are your current product needs?
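For reference, the Vulkan integration mentioned above plugs into the standard ExecuTorch export flow. A minimal sketch, assuming the `VulkanPartitioner` import path and the `to_edge_transform_and_lower` API from the Vulkan backend docs (both may shift between releases):

```python
import torch
from executorch.backends.vulkan.partitioner.vulkan_partitioner import VulkanPartitioner
from executorch.exir import to_edge_transform_and_lower

# Toy model standing in for a real network.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()).eval()
sample_inputs = (torch.randn(1, 64),)

# Capture the graph, then delegate supported subgraphs to the Vulkan backend.
exported = torch.export.export(model, sample_inputs)
et_program = to_edge_transform_and_lower(
    exported,
    partitioner=[VulkanPartitioner()],
).to_executorch()

# Serialize the program for the on-device runtime.
with open("model.pte", "wb") as f:
    f.write(et_program.buffer)
```

The resulting `model.pte` is what the ExecuTorch runtime loads on device, with the Vulkan delegate executing the partitioned subgraphs on the mobile GPU.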
Apologies for opening a similar feature request in #5263.
> Curious, what are your current product needs?
@mergennachin We want to deploy LLMs in cars, but Python-based inference frameworks like vLLM and SGLang are not suitable for edge devices.
> We will revisit CUDA, but perhaps in the second half of the year.
Nearly five months have passed; is there any progress on this?
Thank you for following up @DzAvril.
> We want to deploy LLMs in cars, but Python-based inference frameworks like vLLM and SGLang are not suitable for edge devices.
I guess this is using a platform similar to Jetson?
> Nearly five months have passed; is there any progress on this?
No update on a CUDA backend for ET at the moment. We can get back to you here once we plan something.
> I guess this is using a platform similar to Jetson?
@digantdesai Yes, Jetson Orin for now, and possibly Thor in the future.
Looking forward to your update.