Closed: cameron-erdogan closed this issue 5 years ago
The example app’s purpose is to show developers an example of how to use the library and provide a record of events during the course of a run—hence all output is directed to the debug window.
We are actively working on additional sample projects that will demonstrate more sophisticated integration of Spokestack than this built-in example app. An older but perhaps more helpful GUI example is available at https://github.com/pylon/spokestack-ios-example/tree/master/SpokeStackExample.
On Oct 25, 2019, at 12:06 PM, Cameron Erdogan notifications@github.com wrote:
Is there an explanation for what "SpokeStackFrameworkExample" app is supposed to be demonstrating? I see the four options on the initial landing page, and then start/stop recording buttons on each detail page. It asks for microphone access and sometimes speech access, but otherwise nothing seems to happen. There are some debug messages depending on whether I'm running iOS 12 or 13, but it's usually just "didStart" and "didStop".
I tried running that one too, but it won't compile (on Xcode 10.1). I get an error on "import SpokeStack" that says "Missing required module 'googleapis'".
Anyway, I can still try to use the main project to try to understand how I might use this in an app I'm developing. Should I expect to see more than just "didInit" and "didStart"?
didInit -> didStart -> didActivate (upon wakeword activation) -> didRecognize (upon ASR recognition)
The wakeword depends on which wakeword view controller is chosen: TFLite uses "Marvin"; Apple and CoreML use "Up dog".
Better API documentation is on the way, but for now https://github.com/pylon/react-native-spokestack/#api can provide a rough guide. The example app's controllers contain all the possible events: https://github.com/pylon/spokestack-ios/blob/master/SpokeStackFrameworkExample/WakeWordViewController.swift#L115.
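To make that sequence concrete, here's a minimal sketch of a listener that just logs those events. The type and method signatures are illustrative assumptions based on the event names above, not necessarily the exact spokestack-ios API, so check the example controllers linked above for the real protocol.

```swift
// Illustrative only: a listener that prints the event sequence described above.
// Names and signatures are assumptions, not the exact spokestack-ios protocol.
class LoggingListener {
    func didInit() {
        print("didInit")            // pipeline finished configuring
    }
    func didStart() {
        print("didStart")           // audio capture / wakeword detection running
    }
    func didActivate() {
        print("didActivate")        // wakeword heard, ASR now active
    }
    func didRecognize(_ transcript: String) {
        print("didRecognize: \(transcript)")  // ASR produced a result
    }
}
```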
HTH!
Okay, that makes sense based on what I'm seeing in the code. I'm seeing didStart, but I can't get didActivate to trigger after saying "Up dog". Is it necessary to press the start recording button? Either way it doesn't seem to help.
It’s necessary to press the “start recording” button in the example app. I recommend experimenting with the “Apple Wakeword” option first; the debug console output from that will be more useful for understanding the API.
Okay I can spend some more time with that. Thanks for your help so far.
Honestly I'm most interested in the VAD part of this code, since I haven't found a good iOS option for VAD. Any recommendations for using just that part as a separate module? I can also ask that separately (not on this issue thread).
Interesting that VAD for iOS is what you're after. I was in the same place as you and ended up putting a fair bit of work into porting Google’s WebRTC VAD into Swift and making it buildable via CocoaPods. You can check out the Swift wrapper at https://github.com/pylon/spokestack-ios/blob/master/SpokeStack/WebRTCVAD.swift and the WebRTC audio port at https://github.com/pylon/filter_audio.
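In case it's useful, here's a rough sketch of the kind of frame loop a VAD wrapper ends up driving. isSpeech(frame:sampleRate:) is a hypothetical stand-in for whatever entry point you use (the WebRTCVAD.swift wrapper or the C calls directly); the constraints reflected here follow the usual WebRTC VAD requirements of 16-bit mono PCM at 8/16/32 kHz in 10/20/30 ms frames.

```swift
// Hypothetical stand-in for the real VAD call (WebRTCVAD.swift or WebRtcVad_Process);
// returns true when the frame looks like speech.
func isSpeech(frame: [Int16], sampleRate: Int) -> Bool {
    // ... call into the VAD here ...
    return false
}

// Chop a buffer of 16 kHz mono 16-bit samples into 20 ms frames (320 samples each)
// and run each frame through the VAD. WebRTC's VAD only accepts 10/20/30 ms frames.
func containsSpeech(samples: [Int16], sampleRate: Int = 16_000) -> Bool {
    let frameLength = sampleRate / 50   // 20 ms worth of samples
    var offset = 0
    while offset + frameLength <= samples.count {
        let frame = Array(samples[offset..<(offset + frameLength)])
        if isSpeech(frame: frame, sampleRate: sampleRate) {
            return true
        }
        offset += frameLength
    }
    return false
}
```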
Yep, I saw that. Seems to be exactly what I'm looking for. Of course, I'm having some trouble building.
My caveat is that my current deployment process uses Carthage instead of CocoaPods. So I'm trying to include the relevant code by using the filter_audio framework from the example project's Pods > Frameworks folder and by including the Swift wrapper code directly in my project. Things are working mostly okay, except I get a few Undefined Symbol: _WebRtcVad_Process errors when trying to build. I'm learning about including C projects as I go here, so I'm in slightly over my head.
EDIT: That error only happened when I was building for the simulator. When I build directly to device, the issue disappears.
If I fail at this for a few more hours I may give up on this path and try again using just CocoaPods.
I get a few Undefined Symbol: _WebRtcVad_Process errors when trying to build.

This would indicate that the webrtc_vad.h header isn’t visible to the build path. I don’t know how Carthage works, but the general idea is to make sure that ld and clang are getting a good header search path to find the filter_audio headers, like https://github.com/pylon/filter_audio/blob/cocoapods/filter_audio.podspec#L18
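For example (the path here is an assumption about where filter_audio lives in your checkout, so adjust it), making the headers visible usually comes down to a build setting along these lines:

```
// Illustrative xcconfig fragment; the search path is an assumption about your layout.
HEADER_SEARCH_PATHS = $(inherited) "$(SRCROOT)/filter_audio/**"
```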
I should clarify: I'm not even using Carthage for this; I just have a vanilla project (not workspace) with filter_audio included as a regular framework. I only mentioned Carthage to explain my reluctance to include the framework with CocoaPods.
I seem to have gotten it to build, so I'll update you once I hook up some audio.
After playing around with the example more: in the Apple Wakeword example, it seems the result of WebRtcVad_Process in WebRTCVAD.swift is always either 0 (None) or 1 (Uncertain). The breakdown depends on the mode setting. In the "aggressive" mode, it distinguishes pretty well between noise and silence. It doesn't seem to differentiate between non-vocal noise and vocal noise, though. Is this expected? Or is it unusual for the detector to have such apparently low certainty?
Sorry, I thought I responded earlier! The vocal-vs-nonvocal behavior comes down to how a VAD works: it's purely dependent on frequencies in the vocal range, so non-vocal noise with energy in that range can still register.
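One illustrative mitigation (a generic technique, not anything specific to spokestack-ios) is to smooth the raw per-frame decisions so a single noisy frame doesn't count as speech:

```swift
// Majority vote over a sliding window of raw per-frame VAD results
// (true = the VAD flagged the frame as voiced). A short burst of non-vocal
// noise that trips a frame or two won't clear the threshold.
func smoothedIsSpeech(recentFrames: [Bool], window: Int = 10, threshold: Double = 0.6) -> Bool {
    guard recentFrames.count >= window else { return false }
    let voiced = recentFrames.suffix(window).filter { $0 }.count
    return Double(voiced) / Double(window) >= threshold
}
```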
Closing this issue, but feel free to comment if you have any more questions!