bil-ash opened this issue 3 weeks ago
Hi, Evan from Useful Sensors/Moonshine here. Just letting you know this is on our radar. We're working on Transformers support (https://github.com/huggingface/transformers/issues/34474) right now, and we (internally) have our current ONNX models running in the browser with onnxruntime-web. Shouldn't be too difficult to get support added to transformers.js from there.
@evmaki Not related to this issue, but I am also eagerly waiting for the ability to fine-tune the model to support a new language.
@evmaki Great to hear! I have been following the ONNX support and it looks like a great start! One issue is that you currently export two versions of the decoder (with and without past key values), leading to weight duplication (more of a problem when running in the browser, since we load the decoder twice).
We were able to solve this in Optimum by adding an If node to the graph and then choosing which path to take based on whether the past key values are provided. See here for an example. And here is the code used to merge the two decoders.
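To illustrate the idea behind the merge, here is a pure-Python toy of the control flow that the If node implements (all names and values are hypothetical; the real merge operates on ONNX graphs, not Python functions): one entry point, one shared set of "weights", and a branch that selects the cached or uncached path based on whether past key values are provided.

```python
# Toy sketch of merging two decoder variants behind one entry point,
# analogous in spirit to the If-node merge (all names hypothetical).
# The "weights" are shared; only the control flow differs per branch.

WEIGHT = 3  # stand-in for the shared decoder weights


def decode_no_cache(tokens):
    # Full pass over all tokens; also builds the initial "cache".
    cache = [t * WEIGHT for t in tokens]
    return cache[-1], cache


def decode_with_cache(token, cache):
    # Incremental pass over one new token, reusing the cache.
    cache = cache + [token * WEIGHT]
    return cache[-1], cache


def merged_decoder(tokens, cache=None):
    # Single entry point: branch on cache presence, like the If node
    # branches on its boolean "use cache" condition input.
    if cache is None:
        return decode_no_cache(tokens)
    return decode_with_cache(tokens[-1], cache)


# First call has no cache (prefill); later calls reuse the cache.
logits, cache = merged_decoder([1, 2])
logits2, cache2 = merged_decoder([1, 2, 5], cache)
```

The point of the merge is that the shared weights appear only once in the exported file, instead of once per decoder variant.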
I was experimenting with your codebase, passing zero-sized tensors as input, but I get gibberish output.
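For context, a zero-sized tensor along the sequence axis is a legal input for the first decoding step, before any cache exists. The sketch below uses a hypothetical past-KV layout (Moonshine's actual shapes may differ) to show what such a placeholder looks like:

```python
import numpy as np

# Hypothetical past-KV layout: (batch, num_heads, past_seq_len, head_dim).
# Setting past_seq_len = 0 yields a valid, zero-sized placeholder tensor
# that can stand in for "no past key values yet".
empty_past = np.zeros((1, 8, 0, 64), dtype=np.float32)

print(empty_past.shape)  # (1, 8, 0, 64)
print(empty_past.size)   # 0
```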
Either way, once we have transformers support, I expect this to be much easier to convert (since the input/output signatures should be similar to Whisper's).
Model description
Please add support for Moonshine ASR models. A recent commit to the GitHub repo adds ONNX (Python) support, so I guess porting to JS won't take much effort. However, there is no mention of transformers usage yet.
This model is a good fit for in-browser usage since it is quite small and claims to use RAM proportional to the length of the audio.
Prerequisites
Additional information
No response
Your contribution
None