microsoft / onnxruntime-inference-examples

Examples for using ONNX Runtime for machine learning inferencing.
MIT License

speed up phi-3 inference? #428

Open CHNtentes opened 4 months ago

CHNtentes commented 4 months ago

Hi, I tried your phi-3 example on Android. I wonder if it can run on the GPU or the Qualcomm HTP to further increase speed? Currently I assume it only uses the CPU.
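
For context, hardware acceleration in ONNX Runtime works by registering an execution provider (EP) on the session options before the session is created; any ops the EP cannot handle fall back to the CPU EP. Below is a minimal Java sketch using the NNAPI EP, which the standard `ai.onnxruntime` Android API exposes directly. Note this is an illustration, not the example app's actual code: the class name and model path are placeholders, and HTP offload specifically would instead go through the QNN EP, which requires an onnxruntime build compiled with QNN support.

```java
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtException;
import ai.onnxruntime.OrtSession;

public final class Phi3Session {
    // Opens a session with the NNAPI execution provider registered;
    // ops NNAPI cannot handle fall back to the CPU EP automatically.
    static OrtSession open(String modelPath) throws OrtException {
        OrtEnvironment env = OrtEnvironment.getEnvironment();
        OrtSession.SessionOptions opts = new OrtSession.SessionOptions();
        opts.addNnapi(); // NNAPI may dispatch to GPU/DSP depending on the device
        return env.createSession(modelPath, opts);
    }
}
```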

salykova commented 4 months ago

I am interested in this too.

CHNtentes commented 4 months ago

I suppose for now they probably cannot do it. As with ExecuTorch, only small models can run on the GPU/HTP.

varunchariArm commented 2 months ago

I am interested too. Also, may I know which execution provider is used in the provided 'libonnxruntime.so'? Thanks.
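
One way to answer this empirically rather than by inspecting the binary: the Java API can report which execution providers were compiled into the native library that was loaded. A minimal sketch, assuming the standard `ai.onnxruntime` package (the class name is just for illustration); a stock mobile build typically reports at least the CPU EP:

```java
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtProvider;
import java.util.EnumSet;

public final class ListProviders {
    public static void main(String[] args) {
        // Lists the execution providers compiled into the loaded native
        // onnxruntime library (i.e. the bundled libonnxruntime.so).
        EnumSet<OrtProvider> providers = OrtEnvironment.getAvailableProviders();
        System.out.println("Available EPs: " + providers);
    }
}
```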