hotg-ai / rune

Rune provides containers to encapsulate and deploy edgeML pipelines and applications
Apache License 2.0

The microspeech model expects 1.5s but our input files are only 1s long #90

Closed: Michael-F-Bryan closed this issue 3 years ago

Michael-F-Bryan commented 3 years ago

The microspeech Runefile expects to receive 24,000 samples of audio at 16 kHz (1.5 s), but yes_01d22d03_nohash_0.wav and friends are only 16,000 samples long (1 s).

This isn't a problem when you compile the Rune in release mode (rune build Runefile), because the length check is only a debug assertion, but when the Rune is compiled in debug mode (rune build Runefile --debug) we trigger this assertion:

[2021-03-25T18:30:19.671Z INFO ] panicked at 'assertion failed: `(left == right)`
    left: `32000`,
   right: `48000`', /home/michael/Documents/hotg-ai/rune/runic-types/src/wasm32/mod.rs:92:9
Error: Call failed

Caused by:
    0: Unable to call the _call function
    1: RuntimeError: unreachable
           at core::core_arch::wasm32::unreachable::hb8a7ba5af00cd3dd (<module>[1099]:0x3d931)
           at rust_begin_unwind (<module>[1051]:0x3c101)
           at core::panicking::panic_fmt::hfa15f5472ef5e557 (<module>[1428]:0x4aed4)
           at core::panicking::assert_failed_inner::h1ff1547b4e20ab23 (<module>[1442]:0x4baf6)
           at core::panicking::assert_failed::h4a336faee37010c7 (<module>[968]:0x3786c)
           at runic_types::wasm32::copy_capability_data_to_buffer::h3df68d48e80f298c (<module>[225]:0x97da)
           at <runic_types::wasm32::sound::Sound<_> as runic_types::pipelines::Source>::generate::h4d22008f54108f0a (<module>[125]:0x578a)
           at microspeech::_manifest::{{closure}}::h31e7eddd059d1b5c (<module>[107]:0x4300)
           at <alloc::boxed::Box<F,A> as core::ops::function::FnMut<Args>>::call_mut::he6db486207a4a9a2 (<module>[55]:0x26c0)
           at _call (<module>[127]:0x5e9a)
    2: unreachable
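
The two numbers in the assertion line up with the byte sizes of the audio buffers, since each I16 sample takes 2 bytes. Here is a minimal sketch of that arithmetic. The helpers buffer_len_bytes and lengths_match are hypothetical names made up for illustration, not the actual runic-types code, and which side of the assertion is "provided" versus "expected" is an assumption.

```rust
use std::mem::size_of;

/// Hypothetical helper: bytes needed to hold `samples` 16-bit PCM samples.
fn buffer_len_bytes(samples: usize) -> usize {
    samples * size_of::<i16>()
}

/// Hypothetical stand-in for the length comparison that the assertion performs.
fn lengths_match(provided_bytes: usize, expected_bytes: usize) -> bool {
    provided_bytes == expected_bytes
}

fn main() {
    let sample_rate_hz: usize = 16_000;

    // 1 s of audio, as in the test WAV files.
    let provided = buffer_len_bytes(sample_rate_hz);
    // 1.5 s of audio, as requested by the Runefile.
    let expected = buffer_len_bytes(sample_rate_hz * 3 / 2);

    println!("provided: {} bytes, expected: {} bytes", provided, expected);
    println!("lengths match: {}", lengths_match(provided, expected));
}
```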

However, if you update the capability and proc blocks in the Runefile to work with an I16[16000] (so our debug assertion is happy), the model now thinks our "yes" example is "silence" and tests start failing. I'm guessing all those trailing zeroes make a difference once they go through the FFT.
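
To make the "trailing zeroes" guess concrete, here is a rough sketch of what a 1 s clip looks like once it sits in a 1.5 s window. It assumes the unfilled tail of the buffer is left as zeros, which is a guess about the release-mode behaviour rather than something confirmed here. pad_with_zeros is a hypothetical helper, and the clip contents are synthetic.

```rust
/// Hypothetical helper: copy `input` into a window of `window_len` samples,
/// leaving any remaining samples at the end as zeros.
fn pad_with_zeros(input: &[i16], window_len: usize) -> Vec<i16> {
    let mut window = vec![0_i16; window_len];
    let n = input.len().min(window_len);
    window[..n].copy_from_slice(&input[..n]);
    window
}

fn main() {
    // Synthetic stand-in for a 1 s, 16 kHz recording (not real audio data).
    let clip: Vec<i16> = (0..16_000).map(|i| ((i % 200) as i16) - 100).collect();

    // The 1.5 s window the Runefile asks for.
    let window = pad_with_zeros(&clip, 24_000);

    let padding = window.len() - clip.len();
    println!("{} of {} samples are zero padding", padding, window.len());
}
```

Under that assumption, the last third of the window is silence, so switching to I16[16000] hands the later pipeline stages a differently shaped input than the one they were being fed before.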

@meelislootus do you have any idea what's going on here?

Michael-F-Bryan commented 3 years ago

@meelislootus fixed this in #152.