spokestack / spokestack-ios

Spokestack: give your iOS app a voice interface!
https://spokestack.io
Apache License 2.0

Example app confusion #54

Closed cameron-erdogan closed 5 years ago

cameron-erdogan commented 5 years ago

Is there an explanation for what "SpokeStackFrameworkExample" app is supposed to be demonstrating? I see the four options on the initial landing page, and then start/stop recording buttons on each detail page. It asks for microphone access and sometimes speech access, but otherwise nothing seems to happen. There are some debug messages depending on whether I'm running iOS 12 or 13, but it's usually just "didStart" and "didStop".

noelweichbrodt commented 5 years ago

The example app’s purpose is to show developers how to use the library and to provide a record of events over the course of a run; hence, all output is directed to the debug window.

We are actively working on additional sample projects that will demonstrate more sophisticated integration of Spokestack than this built-in example app. An older but perhaps more helpful GUI example is available at https://github.com/pylon/spokestack-ios-example/tree/master/SpokeStackExample.


cameron-erdogan commented 5 years ago

I tried running that one too, but it won't compile (on Xcode 10.1). I get an error on "import SpokeStack" that says "Missing required module 'googleapis'".

Anyway, I can still try to use the main project to try to understand how I might use this in an app I'm developing. Should I expect to see more than just "didInit" and "didStart"?

noelweichbrodt commented 5 years ago

Anyway, I can still try to use the main project to try to understand how I might use this in an app I'm developing. Should I expect to see more than just "didInit" and "didStart"?

didInit -> didStart -> didActivate (upon wakeword activation) -> didRecognize (upon ASR recognition)

The wakeword depends on which wakeword view controller is chosen: TFLite's is "Marvin"; Apple's and CoreML's is "Up dog".

Better API documentation is on the way, but for now https://github.com/pylon/react-native-spokestack/#api can provide a rough guide. The example app's controllers contain all the possible events: https://github.com/pylon/spokestack-ios/blob/master/SpokeStackFrameworkExample/WakeWordViewController.swift#L115.
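For reference, here is a minimal Swift sketch of a listener that just logs that event sequence. The class and method names below are chosen to match the debug messages mentioned in this thread rather than the exact Spokestack delegate protocols, so treat the linked WakeWordViewController as the source of truth for the real signatures.

```swift
import Foundation

// Hypothetical listener mirroring the event order described above:
// didInit -> didStart -> didActivate (wakeword) -> didRecognize (ASR result).
// Names follow the debug output seen in the example app, not necessarily
// the library's actual delegate protocols.
final class LoggingSpeechListener {
    func didInit()     { print("didInit: pipeline initialized") }
    func didStart()    { print("didStart: pipeline listening for wakeword") }
    func didActivate() { print("didActivate: wakeword heard, ASR active") }
    func didRecognize(_ transcript: String) {
        print("didRecognize: \"\(transcript)\"")
    }
    func didStop()     { print("didStop: pipeline stopped") }
}
```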

HTH!

cameron-erdogan commented 5 years ago

Okay, that makes sense based on what I'm seeing in the code. I'm seeing didStart but can't get didActivate to trigger after saying "Up dog". Is it necessary to press the start recording button? Either way it doesn't seem to help.

noelweichbrodt commented 5 years ago

It’s necessary to press the “start recording” button in the example app. I recommend experimenting with the “Apple Wakeword” option first; the debug console output from that will be more useful in understanding the API.


cameron-erdogan commented 5 years ago

Okay I can spend some more time with that. Thanks for your help so far.

Honestly, I'm most interested in the VAD part of this code, since I haven't found a good iOS option for VAD. Any recommendations for using just that part as a separate module? I can also ask that separately (not on this issue thread).

noelweichbrodt commented 5 years ago

Oh, interesting that VAD for iOS is what you're after. I was in the same place as you and ended up putting a fair bit of work into porting Google’s WebRTC VAD into Swift, buildable via CocoaPods. You can check out the Swift wrapper at https://github.com/pylon/spokestack-ios/blob/master/SpokeStack/WebRTCVAD.swift and the WebRTC audio port at https://github.com/pylon/filter_audio.
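If you end up driving the C API directly, a rough sketch of a frame-level check from Swift might look like the following. It assumes the upstream webrtc_vad.h signatures and that the header is exposed to Swift via a bridging header; the filter_audio port may differ slightly, so check its header before copying.

```swift
import Foundation

// Hypothetical sketch: classify one audio frame with the WebRTC VAD.
// Assumes WebRtcVad_Create/Init/set_mode/Process/Free from webrtc_vad.h
// are visible to Swift through a bridging header.
func frameContainsSpeech(_ frame: [Int16], sampleRate: Int32 = 16_000) -> Bool {
    guard let vad = WebRtcVad_Create() else { return false }
    defer { WebRtcVad_Free(vad) }

    WebRtcVad_Init(vad)
    WebRtcVad_set_mode(vad, 3) // 0 = least aggressive ... 3 = most aggressive

    // Frames must be 10, 20, or 30 ms of 16-bit mono PCM at 8/16/32 kHz,
    // e.g. 320 samples for 20 ms at 16 kHz.
    let result = frame.withUnsafeBufferPointer { buffer in
        WebRtcVad_Process(vad, sampleRate, buffer.baseAddress, numericCast(frame.count))
    }
    return result == 1 // 1 = voice detected, 0 = no voice, -1 = error
}
```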


cameron-erdogan commented 5 years ago

Yep I saw that. Seems to be exactly what I'm looking for. Of course, I'm having some trouble building.

My caveat is that my current deployment process uses Carthage instead of CocoaPods. So, I'm trying to include the relevant code by using the filter_audio framework from the example project's Pods > Frameworks folder and by including the relevant Swift wrapper code directly in my project. Things are working mostly okay, except I get a few Undefined Symbol: _WebRtcVad_Process errors when trying to build. I'm learning about including C projects as I go here, so I'm slightly in over my head.

EDIT: That error only happened when I was building for the simulator. When I build directly to device the issue disappears.

If I fail at this for a few more hours I may give up this path and try again just using CocoaPods.

noelweichbrodt commented 5 years ago

I get a few Undefined Symbol: _WebRtcVad_Process errors when trying to build.

This would indicate that the webrtc_vad.h header isn’t visible to the build path. I don’t know how Carthage works, but the general idea is to make sure that ld and clang are getting a good header search path to find the filter_audio headers, like https://github.com/pylon/filter_audio/blob/cocoapods/filter_audio.podspec#L18.

cameron-erdogan commented 5 years ago

I should clarify: I'm not even using Carthage for this; I just have a vanilla project (not a workspace) with filter_audio included as a regular framework. I only mentioned Carthage to explain my reluctance to include the framework with CocoaPods.

I seem to have gotten it to build, so I'll update you once I hook up some audio.

cameron-erdogan commented 5 years ago

After playing around with the example more: in the Apple Wakeword example, it seems the result of the WebRtcVad_Process call in WebRTCVAD.swift is always either 0 (None) or 1 (Uncertain). The breakdown depends on the mode setting; in the "aggressive" mode, it distinguishes pretty well between noise and silence. It doesn't seem to differentiate between non-vocal noise and vocal noise, though. Is this expected? Or is it unusual for the detector to have such apparently low certainty?

noelweichbrodt commented 5 years ago

Sorry, thought I responded earlier! The vocal-vs-nonvocal issue is down to how a VAD works: it depends purely on frequencies in the vocal range, so non-vocal noise with energy in that range can still look like speech to it.

Closing this issue, but feel free to comment if you have any more questions!