mozilla / DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Mozilla Public License 2.0

Swift/iOS wrapper for TFLite libdeepspeech #3061

Open reuben opened 4 years ago

reuben commented 4 years ago

WIP: Build works fine on latest master with some small modifications (DS branch ios-build):

Build for simulator x86_64:

$ bazel build --verbose_failures --config=ios_x86_64 --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" --config=monolithic -c opt //native_client:libdeepspeech.so --define=runtime=tflite --copt=-DTFLITE_WITH_RUY_GEMV
$ cp -f bazel-bin/native_client/libdeepspeech.so ../native_client/swift/libdeepspeech.so

Build for arm64:

$ bazel build --verbose_failures --config=ios_arm64 --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" --config=monolithic -c opt //native_client:libdeepspeech.so --define=runtime=tflite --copt=-DTFLITE_WITH_RUY_GEMV
$ cp -f bazel-bin/native_client/libdeepspeech.so ../native_client/swift/libdeepspeech.so

Scope for 0.8:

Future scope:

reuben commented 4 years ago

It looks like building a fat binary for x86_64 and arm64 is not useful because you still need two separate frameworks for device and simulator, as they use different SDKs.

I've edited the first comment to reflect this.

reuben commented 4 years ago

Some initial tests on device (iPhone Xs), averaged across 3 runs, with the 0.7.4 models:

no scorer, cold cache: RTF 0.60x
no scorer, warm cache: RTF 0.48x
with scorer, cold cache: RTF 0.55x
with scorer, warm cache: RTF 0.24x

reuben commented 4 years ago

@lissyx could you give me a quick overview of what it would take to test changes to macOS workers in an isolated environment? Like, say, adding a macos-heavy-b worker type and spawning tasks with it to test.

reuben commented 4 years ago

My probably incomplete idea:

  1. Make PR against https://github.com/mozilla/community-tc-config/blob/master/config/projects/deepspeech.yml adding new worker instance with -b type.
  2. Wait for it to be landed and deployed.
  3. Make a copy of one of the existing worker images, change the worker type, make other modifications, start VM.
  4. Spawn tasks against new worker type.

lissyx commented 4 years ago

My probably incomplete idea:

1. Make PR against https://github.com/mozilla/community-tc-config/blob/master/config/projects/deepspeech.yml adding new worker instance with -b type.

2. Wait for it to be landed and deployed.

3. Make a copy of one of the existing worker images, change the worker type, make other modifications, start VM.

4. Spawn tasks against new worker type.

You'd have to use the prov.sh script that is on worker #1, and you'd have to update the base image prior to that because I have not done it:

We don't have a nicer way to spin up new workers, mostly because it's not something we needed to do often and because it'd again require much more tooling. Given the current status of our macOS workers ...

Doing that in parallel with running the existing infra is likely to be complicated because of ... resources (CPU / RAM). I thought disk would be an issue but that should be fine.

reuben commented 4 years ago

You'd have to use the prov.sh script that is on worker #1, and you'd have to update the base image prior to that because I have not done it:

* `generic-worker` version

* `taskclusterProxy`

* `generic-worker.json` config update

I assume I can find the appropriate versions and config changes by inspecting the currently running VMs, right? In that case, I just need the IP of worker 1 to get started.

We don't have a nicer way to spin up new workers, mostly because it's not something we needed to do often and because it'd again require much more tooling. Given the current status of our macOS workers ...

Yeah. I thought about making a VM copy of a worker to side-step these provisioning issues but I guess that's also prone to causing problems.

Doing that in parallel with running the existing infra is likely to be complicated because of ... resources (CPU / RAM). I thought disk would be an issue but that should be fine.

I would probably run that worker on my personal machine while I test it, since it's not meant for general availability.

lissyx commented 4 years ago

I assume I can find the appropriate versions and config changes by inspecting the currently running VMs, right? In that case, I just need the IP of worker 1 to get started.

Indeed, you can fetch the JSON config. IPs are on Matrix.

I would probably run that worker on my personal machine while I test it, since it's not meant for general availability.

Be aware that the existing workers, if you copy them to your system, are meant for VMware Fusion Pro.

reuben commented 4 years ago

Be aware that the existing workers, if you copy them to your system, are meant for VMware Fusion Pro.

Ah, yes, that was also one of the complications. OK, thanks.

reuben commented 4 years ago

(Hopefully) finished wrapping the C API, now moving on to CI work. Wrapper is here if anyone wants to take a quick look and provide any suggestions: https://github.com/mozilla/DeepSpeech/blob/ios-build/native_client/swift/deepspeech_ios/DeepSpeech.swift

reuben commented 4 years ago

In particular I'd be very interested in any feedback from Swift developers on how the error handling looks and how the buffer handling around Model.speechToText and Stream.feedAudioContent looks.
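For reference, the intended usage looks roughly like this (a simplified sketch; the exact initializer and stream-creation names may differ from what is currently in the wrapper, only feedAudioContent(buffer:) and its Int16 buffer type are taken from the wrapper source):

import deepspeech_ios

// Rough usage sketch: the initializer and stream-creation names here are
// illustrative and not guaranteed to match the wrapper exactly.
func transcribe(modelPath: String, samples: [Int16]) throws -> String {
    let model = try DeepSpeechModel(modelPath: modelPath)
    let stream = try model.createStream()
    // Feed the Int16 samples inside withUnsafeBufferPointer so the pointer
    // stays valid for the duration of the native call.
    samples.withUnsafeBufferPointer { buffer in
        stream.feedAudioContent(buffer: buffer)
    }
    return stream.finishStream()
}

On the error-handling side, the main open question is whether model and stream creation should throw (as sketched with try above) or surface the C error codes in some other way.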

reuben commented 4 years ago

PR for adding macos-heavy-b and macos-light-b worker type and instances: https://github.com/mozilla/community-tc-config/pull/308

reuben commented 4 years ago

Looks like adding Xcode to the worker brings the free space to under 10 GB, which stops the taskcluster client from picking up any jobs... Resizing the partition does not seem to work. My next step will be to create a worker from scratch with a larger disk image.

reuben commented 4 years ago

@dabinat tagging you because you mentioned interest in these bindings in the CoreML issue, in case you have anything to mention regarding the design of the bindings here.

erksch commented 4 years ago

@reuben super awesome that you are working on this! This is actually perfect timing, as we are looking for offline speech recognition for iOS right now. I know it's not finished yet, but could you provide a small guide on how I could try it out? Is the .so library already available somewhere? Maybe then I could also help with writing an example app, if you wish.

lissyx commented 4 years ago

@reuben super awesome that you are working on this! This is actually perfect timing, as we are looking for offline speech recognition for iOS right now. I know it's not finished yet, but could you provide a small guide on how I could try it out? Is the .so library already available somewhere? Maybe then I could also help with writing an example app, if you wish.

On taskcluster, if you browse to the iOS artifacts section you should get it.

lissyx commented 4 years ago

e.g., https://community-tc.services.mozilla.com/tasks/BO-_Adi2Th2DOswsw9m1Kw#artifacts

erksch commented 4 years ago

Thank you! I'll try it out!

erksch commented 4 years ago

I tried it out with the current 0.7.4 models from the release page and one of the audio files from there.

TensorFlow: v2.2.0-17-g0854bb5188
DeepSpeech: v0.9.0-alpha.2-34-gdd20d35c
2020-07-17 09:48:31.981587-0700 deepspeech_ios_test[9411:91040] Initialized TensorFlow Lite runtime.
/private/var/containers/Bundle/Application/F5F8492E-D9B8-4BC4-AF46-29CEB23FC3A6/deepspeech_ios_test.app/4507-16021-0012.wav
read 8192 samples
(lldb)

And then comes this error

Thread 5: EXC_BAD_ACCESS (code=1, address=0x177000075)

in this line of DeepSpeech.swift

178    public func feedAudioContent(buffer: UnsafeBufferPointer<Int16>) {
180        precondition(streamCtx != nil, "calling method on invalidated Stream")
181        
182        DS_FeedAudioContent(streamCtx, buffer.baseAddress, UInt32(buffer.count)) <<<<< Thread 5: EXC_BAD_ACCESS (code=1, address=0x177000075)
183   }

Just leaving this here, but I don't want to bother you too much before this is even considered finished :D

Update: Ah sorry, I saw the version of DeepSpeech and I guess the models are not compatible.

reuben commented 4 years ago

The models should be compatible. I don't know what's going on there, can't reproduce it locally...

reuben commented 4 years ago

Does the log have any more details for this error?

reuben commented 4 years ago

Also, maybe double check the signing options in Xcode? At some point when writing the bindings I ran into some runtime exceptions due to incorrect signing options.

erksch commented 4 years ago

You're right. Since it crashes when communication with the library happens, it's probably just included incorrectly.

So to go through what I tried:

The error in the previous post came after trying some things in the "Frameworks, Libraries and Embedded Content" section. After doing a fresh start and just adding the library as explained above, I get the following configs:

What do you have there? Maybe some things have to be switched to embed & sign?

erksch commented 4 years ago

After setting deepspeech_ios.framework to Embed & Sign in the deepspeech_ios_test target (which was just a random tryout), the code at least gets as far as DS_FeedAudioContent, and then the error I mentioned in the first post occurs.

Here is the full log I got for that error.

* thread #2, queue = 'com.apple.avfoundation.avasset.completionsQueue', stop reason = EXC_BAD_ACCESS (code=2, address=0x130800076)
    frame #0: 0x0000000103891494 libdeepspeech.so`___lldb_unnamed_symbol5$$libdeepspeech.so + 360
  * frame #1: 0x000000010385df68 deepspeech_ios`DeepSpeechStream.feedAudioContent(buffer=(_position = 0x000000016502d200, count = 8192), self=(streamCtx = 0x0000000162f32f40)) at DeepSpeech.swift:181:9
    frame #2: 0x00000001028b20b0 deepspeech_ios_test`closure #1 in render(samples=Swift.UnsafeRawBufferPointer @ 0x000000016d5e0100, stream=(streamCtx = 0x0000000162f32f40)) at AppDelegate.swift:126:20
    frame #3: 0x00000001028b2138 deepspeech_ios_test`thunk for @callee_guaranteed (@unowned UnsafeRawBufferPointer) -> (@error @owned Error) at <compiler-generated>:0
    frame #4: 0x00000001028b2198 deepspeech_ios_test`partial apply for thunk for @callee_guaranteed (@unowned UnsafeRawBufferPointer) -> (@error @owned Error) at <compiler-generated>:0
    frame #5: 0x00000001b810c348 libswiftFoundation.dylib`Foundation.Data.withUnsafeBytes<A>((Swift.UnsafeRawBufferPointer) throws -> A) throws -> A + 504
    frame #6: 0x00000001028b097c deepspeech_ios_test`render(audioContext=0x000000016760bd10, stream=(streamCtx = 0x0000000162f32f40)) at AppDelegate.swift:124:22
    frame #7: 0x00000001028b30c8 deepspeech_ios_test`closure #1 in test(audioContext=0x000000016760bd10, stream=(streamCtx = 0x0000000162f32f40), audioPath="/private/var/containers/Bundle/Application/D6D001A2-07F7-4BD3-80E9-9DBECCA975E8/deepspeech_ios_test.app/4507-16021-0012.wav", start=2020-07-19 21:19:15 CEST, completion=0x00000001028b83c8 deepspeech_ios_test`partial apply forwarder for closure #1 () -> () in closure #1 () -> () in deepspeech_ios_test.AppDelegate.application(_: __C.UIApplication, didFinishLaunchingWithOptions: Swift.Optional<Swift.Dictionary<__C.UIApplicationLaunchOptionsKey, Any>>) -> Swift.Bool at <compiler-generated>) at AppDelegate.swift:174:9
    frame #8: 0x00000001028acfc0 deepspeech_ios_test`closure #1 in static AudioContext.load(asset=0x000000016365fd40, assetTrack=0x00000001637212f0, audioURL=Foundation.URL @ 0x0000000163674090, completionHandler=0x00000001028b375c deepspeech_ios_test`partial apply forwarder for closure #1 (Swift.Optional<deepspeech_ios_test.AudioContext>) -> () in deepspeech_ios_test.test(model: deepspeech_ios.DeepSpeechModel, audioPath: Swift.String, completion: () -> ()) -> () at <compiler-generated>) at AppDelegate.swift:59:17
    frame #9: 0x00000001028ad9b0 deepspeech_ios_test`thunk for @escaping @callee_guaranteed () -> () at <compiler-generated>:0
    frame #10: 0x0000000102c0fefc libclang_rt.asan_ios_dynamic.dylib`__wrap_dispatch_async_block_invoke + 196
    frame #11: 0x0000000104ec605c libdispatch.dylib`_dispatch_call_block_and_release + 32
    frame #12: 0x0000000104ec74d8 libdispatch.dylib`_dispatch_client_callout + 20
    frame #13: 0x0000000104ecec20 libdispatch.dylib`_dispatch_lane_serial_drain + 720
    frame #14: 0x0000000104ecf834 libdispatch.dylib`_dispatch_lane_invoke + 440
    frame #15: 0x0000000104edb270 libdispatch.dylib`_dispatch_workloop_worker_thread + 1344
    frame #16: 0x00000001816a7718 libsystem_pthread.dylib`_pthread_wqthread + 276

dabinat commented 4 years ago

I tried a simple test app where I loaded a pre-converted file of a few seconds into memory and called DeepSpeechModel.speechToText. There are no crashes or anything, but the resulting text string is empty.

It seemed from the header like all I had to do was initialize a DeepSpeechModel with the .tflite file and then call speechToText with the buffer. Did I miss a step? Do I need to set up a streaming context even if I’m not streaming?

reuben commented 4 years ago

It seemed from the header like all I had to do was initialize a DeepSpeechModel with the .tflite file and then call speechToText with the buffer. Did I miss a step? Do I need to set up a streaming context even if I’m not streaming?

That's correct, you don't need to set up a streaming context.
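Something like this should be all that's needed for the one-shot path (a rough sketch; the initializer name may differ from the current wrapper code, and the buffer argument of speechToText is assumed to mirror feedAudioContent):

import deepspeech_ios

// One-shot transcription without a streaming context (sketch only).
func transcribeOnce(modelPath: String, samples: [Int16]) throws -> String {
    let model = try DeepSpeechModel(modelPath: modelPath)
    return samples.withUnsafeBufferPointer { buffer in
        model.speechToText(buffer: buffer)
    }
}

If the result comes back empty without any error, it's worth double-checking the input: the samples need to be 16 kHz, mono, 16-bit PCM, and a WAV header should not be fed in as if it were audio.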

reuben commented 4 years ago

What do you have there? Maybe some things have to be switched to embed & sign?

I have libdeepspeech.so in "Link Binary With Libraries" for project deepspeech_ios. I don't see a "Frameworks and Libraries" section like in your screenshot.

In deepspeech_ios_test, I have both libdeepspeech.so and deepspeech_ios.framework in "Link Binary With Libraries", and I also have libdeepspeech.so in "Embed Frameworks", with "Code Sign On Copy" enabled.

reuben commented 4 years ago

Oh, OK, I was looking at the "Build Phases" tab, not "General". What I have in "General" matches your screenshots. It's weird that it crashes on DS_FeedAudioContent and not DS_CreateModel, so maybe it's not a linking issue, unless your code has a bug and calls DS_FeedAudioContent first on an invalid model/stream, which would certainly cause a segfault.
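Another thing worth double-checking is the lifetime of the sample buffer: if the UnsafeBufferPointer<Int16> escapes the Data.withUnsafeBytes closure that produced it, the memory can already be invalid by the time DS_FeedAudioContent reads it. A minimal sketch of keeping the rebinding and the feed inside the same closure (assuming `stream` is the wrapper's DeepSpeechStream):

import Foundation
import deepspeech_ios

// Rebind the raw audio bytes to Int16 and feed them without letting the
// pointer escape the closure (sketch only).
func feed(_ audioData: Data, into stream: DeepSpeechStream) {
    audioData.withUnsafeBytes { (raw: UnsafeRawBufferPointer) in
        let samples = raw.bindMemory(to: Int16.self)
        stream.feedAudioContent(buffer: samples)
    }
}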

erksch commented 4 years ago

You're right, DS_CreateModel seems to work fine and I can retrieve the sample rate and beam width. So the issue for me has to be in the sampling and memory allocation. It's weird that that part is not working for me somehow.

erksch commented 4 years ago

Okay, weirdly enough, when I run the app multiple times, every now and then it continues feeding audio content successfully:

read 8192 samples
read 8192 samples
read 8192 samples
read 8192 samples
read 8192 samples
read 2800 samples

But it always fails when calling DS_FinishStream(streamCtx) with an EXC_BAD_ACCESS.

But every other time it fails feeding the first batch of samples.

Update: on DS_FinishStream(streamCtx) I sometimes get:

deepspeech_ios_test(26389,0x16fa6f000) malloc: can't allocate region
:*** mach_vm_map(size=11453251584, flags: 123) failed (error code=3)
deepspeech_ios_test(26389,0x16fa6f000) malloc: *** set a breakpoint in malloc_error_break to debug
libc++abi.dylib: terminating with uncaught exception of type std::bad_alloc: std::bad_alloc

reuben commented 4 years ago

Out of memory maybe? What device are you running this on?

reuben commented 4 years ago

The iOS build has been added to CI in #3150. So we should now have automatically built binaries that can be used with the project. Next step is building and publishing the wrapper itself, but before doing that I want to get more feedback from users, such as these memory issues... (Thanks for testing @erksch !)

erksch commented 4 years ago

@reuben Hey, just tried again from the current master, still the same. I am testing on a brand new iPad Air with, I think, at least 4 GB of RAM.

erksch commented 4 years ago

@reuben Btw, today I implemented microphone streaming on iOS with resampling to 16000 Hz. I tested it with a different speech recognition engine, but I think the code would transfer 100% to DeepSpeech. Do you think we could add this to the demo code?
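Roughly, the capture-and-resample side looks like this (a simplified sketch rather than the exact PR code; AVAudioEngine and AVAudioConverter are standard AVFoundation APIs, and handleSamples is a placeholder for whatever forwards the samples to feedAudioContent):

import AVFoundation

// Tap the microphone and resample to 16 kHz mono Int16, the format the
// DeepSpeech models expect. The caller must keep the returned engine alive.
func startMicrophoneCapture(handleSamples: @escaping (UnsafeBufferPointer<Int16>) -> Void) throws -> AVAudioEngine {
    let engine = AVAudioEngine()
    let input = engine.inputNode
    let inputFormat = input.outputFormat(forBus: 0)
    guard let targetFormat = AVAudioFormat(commonFormat: .pcmFormatInt16,
                                           sampleRate: 16000,
                                           channels: 1,
                                           interleaved: true),
          let converter = AVAudioConverter(from: inputFormat, to: targetFormat) else {
        throw NSError(domain: "MicrophoneCapture", code: -1, userInfo: nil)
    }

    input.installTap(onBus: 0, bufferSize: 4096, format: inputFormat) { buffer, _ in
        let ratio = targetFormat.sampleRate / inputFormat.sampleRate
        let capacity = AVAudioFrameCount(Double(buffer.frameLength) * ratio) + 1
        guard let converted = AVAudioPCMBuffer(pcmFormat: targetFormat, frameCapacity: capacity) else { return }

        // Offer the captured buffer to the converter exactly once per tap callback.
        var consumed = false
        var error: NSError?
        let status = converter.convert(to: converted, error: &error) { _, outStatus in
            if consumed {
                outStatus.pointee = .noDataNow
                return nil
            }
            consumed = true
            outStatus.pointee = .haveData
            return buffer
        }
        guard status != .error, error == nil, let channels = converted.int16ChannelData else { return }

        // Hand the resampled samples to the caller (e.g. stream.feedAudioContent).
        handleSamples(UnsafeBufferPointer(start: channels[0], count: Int(converted.frameLength)))
    }

    engine.prepare()
    try engine.start()
    return engine
}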

reuben commented 4 years ago

@erksch I'm running on an iPhone Xs, which also has 4 GB of RAM, so that's weird.

Microphone streaming would be amazing to have! Can you open a PR?

erksch commented 4 years ago

Yep hopefully tomorrow :)

reuben commented 4 years ago

@erksch have you had a chance to work on the microphone streaming code? I would love to include that in 0.8 if possible :)

erksch commented 4 years ago

@reuben sorry for the absence, I was a little bit occupied. I'll make a PR today!

erksch commented 4 years ago

@reuben I noticed "/deepspeech_ios/libdeepspeech.dylib" in the .gitignore. Maybe that is what is missing for me? Where do I get this file?

reuben commented 4 years ago

@erksch that is a leftover from older code, I'll remove it. You should only need libdeepspeech.so.

erksch commented 4 years ago

Here is the PR. While file transcription still does not work, the microphone streaming actually works for me :D

zaptrem commented 4 years ago

Do you plan on publishing a Cocoapod for the framework portion of this? I'm going to build a React Native module for this but auto-linking requires a pod. Afterward, I can add instructions for users to drag the model file into the correct directory.

reuben commented 4 years ago

It's not just the models, you also need to add the shared library, libdeepspeech.so, as a direct dependency of the app. Do you think it still makes sense to publish the framework on Cocoapods given this requirement? I've been hesitant to do that and then get tons of support requests from people with weird linker errors.

We could try to dlopen the library and give a more descriptive message but I'm not sure how that fits with App Store guidelines.
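The dlopen probe itself would be small, something like this (a sketch only, using the standard Darwin dlopen/dlerror calls; whether failing gracefully like this is acceptable to App Store review is the open question):

import Darwin

// Try to load libdeepspeech.so at runtime and report a readable message
// instead of an opaque dyld failure at launch (sketch only).
func loadDeepSpeechLibrary(at path: String) -> Bool {
    guard dlopen(path, RTLD_NOW) != nil else {
        let message = dlerror().map { String(cString: $0) } ?? "unknown dlopen error"
        print("Failed to load libdeepspeech.so: \(message)")
        return false
    }
    return true
}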

zaptrem commented 4 years ago

I'll look into that. Is it not possible to bundle libdeepspeech.so with the Cocoapod then use something like use_frameworks! ? I'm pretty sure Google does that with all of their pods. Loading dynamic code from the internet isn't allowed on the AppStore.

reuben commented 4 years ago

I'm not talking about loading dynamic code from the internet, I'm talking about having to add a shared library as a dependency of your app in Xcode. It'll then get codesigned and bundled with the application. This is because iOS does not allow linking against a framework that itself links against another shared library. Maybe I should look more seriously into static linking, which would make this way simpler.


zaptrem commented 4 years ago

If that won't require some massive overhaul it might be a good idea. Static frameworks are sometimes better for app load times than dynamic ones. Let me know how this goes, as switching from Apple's wildly unstable built-in transcription API to DeepSpeech for extremely long on-device transcription jobs is on our roadmap (after trying it on Android starting in a few days).

zaptrem commented 4 years ago

@reuben
Any change on this? I'm starting the Android half of this library tonight.

alexmay23 commented 4 years ago

@reuben hey. Where can I find libdeepspeech.so for iOS, or how can I build it by myself? I looked at the links you provided a long time ago, but they send me a 404. Thanks in advance.

lissyx commented 4 years ago

It's provided in the latest 0.8 or 0.9 alpha on the GitHub releases page.

alexmay23 commented 4 years ago

@lissyx The iOS build is only in the 0.9 alpha.

lissyx commented 4 years ago

@lissyx The iOS build is only in the 0.9 alpha.

It's always nice when you trust people that are working on the project.