Closed escorciav closed 6 months ago
Hello Victor Escorcia Castillo -
Does QNN support variable input length tensors?
The current QNN release does not support variable input length tensors. We are making one end-to-end application with the Whisper ASR model. It will be published shortly. If you would like to see examples for any other model (or architecture), let us know. We will try our best to make it happen.
It'd be great to have access to all the demos showcased during the Snapdragon summit of 2023 :smile:
Feel free to tag me when you release them. Happy to cross-post in my social media to raise awareness of cool on-device ML resources :blush:
@quic-rneti Hi :)
We are making one End-to-End application with Whisper ASR model. It will be published shortly.
Any ETA on this? I'm quite interested, as I'm trying to convert Whisper to a usable .dlc file. I've tried to convert the ONNX, TFLite & TorchScript models using the SNPE scripts, without success, as they all either lack specific ops or don't support dynamic layer sizes.
Have a great day
We are working on this. You can expect the tutorials to be available sometime in the coming month (2 to 4 weeks lead time).
Hi @quic-rneti, thanks for the response :) Could you by any chance share a Whisper script which can be converted to .dlc before releasing the fully fledged app? Or the .dlc file alone?
Have a great day!
Hello @quic-rneti, Is the Whisper model published on Qualcomm's AI hub models repository the demo you were talking about?
From what I've seen, the provided code can only work on Android (as it's using TfLite, not QNN) and I was wondering:
Have a nice day :)
PS: @escorciav did you find another way to get Whisper working on QNN?
@ZodiacFRA I haven't tested Whisper, but please tag me (here on GH, or Twitter) if you do it.
BTW, if they provide TFLite models & you plan to use QNN, just follow the tutorial :wink: . Preferably, follow the docs of the QNN version that you're using. In my case, it's 2.19.0.x
=> opt/qcom/aistack/qnn/2.19.0.240124/docs/QNN/general/tutorial2.html#id1
Disclaimer: I haven't personally used TFLite as an ML frontend, but some colleagues have. Personal opinions/judgements; take them as food for thought.
@quic-rneti Is there any relevant example from the Qualcomm AI Hub related to the topic of the issue?
@escorciav Thanks for the quick response :)
About the tutorial: it looks like the conversion step still requires fixed input lengths (like all the snpe-[tflite|onnx|pytorch]-to-dlc scripts). This prevents me from converting Whisper, as one of the self-attention matrix dimensions is the number of generated tokens and thus cannot be fixed.
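For readers hitting the same wall: a common workaround when a converter mandates static shapes is to pad the token sequence to a fixed maximum length and mask out the padding in attention, so every inference call presents the same tensor shape. The sketch below is illustrative only (the names and the max length are made up, and nothing here is part of any Qualcomm SDK):

```python
# Sketch: padding a growing decoder sequence to a fixed length so every
# call to the converted graph sees the same static shape.
# MAX_TOKENS and PAD_ID are illustrative values, not SDK constants.

MAX_TOKENS = 8   # fixed length baked into the converted graph
PAD_ID = 0

def pad_to_fixed(tokens, max_len=MAX_TOKENS, pad_id=PAD_ID):
    """Return (padded_tokens, mask): mask is 1 for real tokens, 0 for padding."""
    if len(tokens) > max_len:
        raise ValueError("sequence exceeds the static length the graph was built for")
    padded = tokens + [pad_id] * (max_len - len(tokens))
    mask = [1] * len(tokens) + [0] * (max_len - len(tokens))
    return padded, mask

# The decoder is always invoked with shape [MAX_TOKENS], regardless of how
# many tokens have actually been generated so far.
generated = [50258, 50359, 50363]          # 3 tokens generated so far
padded, mask = pad_to_fixed(generated)
assert len(padded) == MAX_TOKENS and len(mask) == MAX_TOKENS
```

The cost of this trick is wasted compute on padding positions and a hard cap on sequence length, which is why true dynamic-shape support is preferable when the toolchain offers it.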
My understanding is that they published Whisper using the TensorFlow Lite Hexagon delegate to work around the SNPE fixed input length limitation, but that delegate only works on Android.
I really don't have any preference as to which software stack I use; my goal is to run Whisper on an SM8550 (Snapdragon 8 Gen 2) running Linux. From what I've understood, QNN is the only way to do that?
I see. Sorry that I can't help. Did you post your question in the Slack forum associated with Qualcomm AI Hub? Perhaps consider giving the provided Whisper (encoder | decoder) a shot (deploying it) and making a post about it :shrug:
Re-opening issue to address latest comments. Do consider raising a new issue.
Dear team,
After taking a look at the NLP solutions (QA and sentiment analysis), none of them supports variable input lengths.
Does QNN support variable input length tensors? Newer versions do digest graphs with opset >= 9, and opset >= 11 supports variable-length axes. Thus, it's up to your tooling :)
Moreover, inference with recent NLP models, e.g., autoregressive Transformer decoders, often relies on caching keys & values. Could you please provide any guidance on how to better support such an inference pipeline?
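To make the KV-caching request concrete: at each decoding step the new token's key and value are appended to a cache, and attention is computed over the whole cache, so each step costs O(T) instead of recomputing all T×T attention scores. The toy single-head sketch below (pure Python, no projection weights; all names are illustrative and no SDK API is implied) shows the shape of the pipeline an accelerator runtime would need to support:

```python
import math

# Toy single-head autoregressive decoder step with a key/value cache.
# Real models would project x with learned W_k / W_v matrices; here the
# token vector stands in for its own key and value to keep things short.

def attend(q, k_cache, v_cache):
    """Softmax attention of one query vector over all cached keys/values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in k_cache]
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * v[j] for w, v in zip(weights, v_cache)) for j in range(d)]

def decode_step(x, k_cache, v_cache):
    """Append this step's key/value, then attend over the whole cache."""
    k_cache.append(x)   # stand-in for W_k @ x
    v_cache.append(x)   # stand-in for W_v @ x
    return attend(x, k_cache, v_cache)

k_cache, v_cache = [], []
for t, x in enumerate([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]):
    out = decode_step(x, k_cache, v_cache)
    assert len(k_cache) == t + 1            # cache grows by one entry per token
```

The growing cache is exactly the variable-length tensor the thread is asking about, which is why fixed-shape converters struggle with this inference pattern unless the cache is pre-allocated at a static maximum length.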
Thanks in advance & kudos to all the team for providing examples of AI models using Qualcomm hardware :)