Is support planned for NEFF and its runtime? I am currently writing a Triton DLR backend, and a unified backend entry point that dispatches to the Neuron runtime when INF1 instances are specified would be very nice.
I know Neuron uses a TVM frontend, so I understand it may be best to simply pick one path: either use the raw TVM runtime exposed by DLR, or compile the model via Neo, targeted at INF1 using Neuron. However, Neuron's use of the TVM frontend is something of a black box, and it does not allow passing TVM artifacts (.so, etc.) directly to `neuron-cc`. This rules out some use cases, such as classical ML models compiled to TVM via HummingbirdML.
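To make the "unified backend entry point" idea concrete, here is a rough sketch of the dispatch I have in mind. It is purely illustrative: `Runtime` and `select_runtime` are hypothetical names, not part of DLR, Triton, or Neuron, and the rule of detecting a NEFF by a `.neff` file in the model directory is my assumption.

```python
from enum import Enum
from pathlib import Path


class Runtime(Enum):
    """Hypothetical runtime choices for a unified backend."""
    TVM = "tvm"        # raw TVM runtime exposed by DLR
    NEURON = "neuron"  # Neuron runtime executing a NEFF


def select_runtime(model_dir: str, instance_type: str) -> Runtime:
    """Pick a runtime for a model directory (hypothetical dispatch rule).

    Assumption: a compiled Neuron model is recognizable by a *.neff file
    in the model directory, and INF1 instance types start with "inf1".
    """
    has_neff = any(Path(model_dir).glob("*.neff"))
    on_inf1 = instance_type.lower().startswith("inf1")
    if has_neff and on_inf1:
        return Runtime.NEURON
    if has_neff:
        # A NEFF cannot be served by the plain TVM runtime.
        raise ValueError("NEFF artifact found, but the instance type is not INF1")
    # No NEFF present: fall back to the TVM runtime.
    return Runtime.TVM
```

With something like this, the backend would load the model through whichever runtime `select_runtime` returns, so the same model repository layout could serve both INF1 and non-INF1 deployments.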