Closed by meirinberg 4 days ago
We provide Swift APIs for KWS; please have a look at https://github.com/k2-fsa/sherpa-onnx/blob/master/swift-api-examples/keyword-spotting-from-file.swift
We assume you can figure out how the C++ code works if you want to reinvent the wheel.
Hello, I'm using sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01.tar.bz2 with https://github.com/microsoft/onnxruntime-swift-package-manager. The encoder step appears to work, since I get non-zero buffers from encoderOutputValue.tensorData(). I then pass that data as the input to my decoder ORTValue(). Unfortunately, my decoder output is always "decoder output: {length = 0, bytes = 0x}". I'm also a bit confused about how the tokens, keywords, etc. are loaded when using the Swift Package Manager version. Any help is appreciated, thank you.

Update a few hours later: I realize that the decoder was returning zero bytes because the buffer shape was not created properly. The model reports these shapes:

Input tensor name: y, Shape: [0, 2]
Output tensor name: decoder_out, Shape: [0, 320]

However, when I set the NSNumber at index 0 of the shape to zero, the buffer was created with zero length, so the decoder output was always empty. I saw this with verbose output on: "2024-09-09 15:51:20.992418 [V:onnxruntime:, bfc_arena.cc:317 AllocateRawInternal] tried to allocate 0 bytes". I realized that a 0 in the reported shape indicates a dynamic dimension.

Unfortunately, setting index 0 of the shape to "encoderOutputValue.tensorTypeAndShapeInfo().shape[0].intValue" produces the following errors:

"2024-09-09 16:23:56.360281 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Gather node. Name:'/decoder/embedding/Gather' Status Message: indices element out of data bounds, idx=4374855830894280704 must be within the inclusive range [-500,499]"

"For ort_value with index: 2, block in memory pattern size is: 20480 but the actual size is: 1280, fall back to default allocation behavior"

I also have other questions: How can I tell which keyword was spoken? When I say the keywords, the output values don't seem to deviate from those for regular noise. An example project for KWS using the Swift Package Manager would be super helpful.
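One plausible reading of the Gather error quoted above: an index as large as 4374855830894280704 is what float32 bytes look like when they are read as int64. That would happen if the encoder output buffer is handed to the decoder as y. In a transducer model the decoder normally takes int64 token IDs (the previous context tokens), and the encoder and decoder outputs are combined later in the joiner. The sketch below uses only Foundation and is not taken from sherpa-onnx. The helper names are hypothetical, and it assumes a blank token ID of 0 and a context size of 2 (from the reported shape [0, 2]). The ORTValue initializer mentioned in the comment should be checked against the installed package version.

```swift
import Foundation

// Hypothetical helper: pack token-ID contexts into a row-major int64 buffer
// with shape [batch, contextSize], the layout the decoder input `y` reports.
func makeDecoderInput(contexts: [[Int64]]) -> (data: Data, shape: [NSNumber]) {
    let contextSize = contexts.first?.count ?? 0
    precondition(contexts.allSatisfy { $0.count == contextSize },
                 "every context row must have the same length")
    let flat = contexts.flatMap { $0 }
    // Copy the raw bytes of the Int64 array (8 bytes per token ID).
    let data = flat.withUnsafeBytes { Data($0) }
    let shape = [NSNumber(value: contexts.count), NSNumber(value: contextSize)]
    return (data, shape)
}

// Illustration of the failure mode: view float32 bytes as int64 values.
func reinterpretAsInt64(_ floats: [Float]) -> [Int64] {
    var out = [Int64](repeating: 0, count: floats.count / 2)
    floats.withUnsafeBytes { src in
        out.withUnsafeMutableBytes { dst in
            // Copy exactly as many bytes as the Int64 array can hold.
            dst.copyMemory(from: UnsafeRawBufferPointer(rebasing: src[0..<dst.count]))
        }
    }
    return out
}

// Assumption: blank ID 0, so the initial context for one stream is [0, 0].
let decoderInput = makeDecoderInput(contexts: [[0, 0]])
// With onnxruntime-objc this buffer would then be wrapped roughly as:
//   try ORTValue(tensorData: NSMutableData(data: decoderInput.data),
//                elementType: .int64, shape: decoderInput.shape)

// Two ordinary float activations, misread as one int64 "token ID".
let misread = reinterpretAsInt64([0.02, 0.02])
let validRange: ClosedRange<Int64> = -500...499
print("decoder input bytes:", decoderInput.data.count)
print("misread index in range:", validRange.contains(misread[0]))
```

If this reading is right, the batch dimension of y should follow the number of streams being decoded, not shape[0] of the encoder output, and the context rows should be updated with the token IDs chosen from the joiner output at each step.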