reuben opened this issue 4 years ago
It looks like building a fat binary for x86_64 and arm64 is not useful because you still need two separate frameworks for device and simulator, as they use different SDKs.
I've edited the first comment to reflect this.
Some initial tests on device (iPhone Xs), averaged across 3 runs, with the 0.7.4 models:
- no scorer, cold cache: RTF 0.60x
- no scorer, warm cache: RTF 0.48x
- with scorer, cold cache: RTF 0.55x
- with scorer, warm cache: RTF 0.24x
@lissyx could you give me a quick overview of what it would take to test changes to macOS workers in an isolated environment? Like, say, adding a macos-heavy-b worker type and spawning tasks with it to test.
My probably incomplete idea:
1. Make a PR against https://github.com/mozilla/community-tc-config/blob/master/config/projects/deepspeech.yml adding a new worker instance with the -b type.
2. Wait for it to be landed and deployed.
3. Make a copy of one of the existing worker images, change the worker type, make other modifications, start the VM.
4. Spawn tasks against the new worker type.
You'd have to use the script that is on the worker #1, prov.sh, and you'd have to update the base image prior to that because I have not done it:
* `generic-worker` version
* `taskclusterProxy`
* `generic-worker.json` config update

We don't have a nicer way to spin new workers mostly because it's not something we needed to do often and because it'd require again much more tooling. Given the current status of our macOS workers ...

Doing that in parallel of running existing infra is likely to be complicated because of ... resources (CPU / RAM). I thought disk would be an issue but that should be fine.
You'd have to use the script that is on the worker #1, prov.sh, and you'd have to update the base image prior to that because I have not done it: `generic-worker` version, `taskclusterProxy`, `generic-worker.json` config update
I assume I can find the appropriate versions and config changes by inspecting the currently running VMs, right? In that case, I just need the IP of worker 1 to get started.
We don't have a nicer way to spin new workers mostly because it's not something we needed to do often and because it'd require again much more tooling. Given the current status of our macOS workers ...
Yeah. I thought about making a VM copy of a worker to side-step these provisioning issues but I guess that's also prone to causing problems.
Doing that in parallel of running existing infra is likely to be complicated because of ... resources (CPU / RAM). I thought disk would be an issue but that should be fine.
I would probably run that worker on my personal machine while I test it, since it's not meant for general availability.
I assume I can find the appropriate versions and config changes by inspecting the currently running VMs, right? In that case, I just need the IP of worker 1 to get started.
Indeed, you can fetch the json config. IPs on matrix.
I would probably run that worker on my personal machine while I test it, since it's not meant for general availability.
Be aware that the existing workers, if you copy them to your system, are meant for VMware Fusion Pro.
Ah, yes, that was also one of the complications. OK, thanks.
(Hopefully) finished wrapping the C API, now moving on to CI work. Wrapper is here if anyone wants to take a quick look and provide any suggestions: https://github.com/mozilla/DeepSpeech/blob/ios-build/native_client/swift/deepspeech_ios/DeepSpeech.swift
In particular I'd be very interested in any feedback from Swift developers on how the error handling looks and how the buffer handling around Model.speechToText and Stream.feedAudioContent looks.
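To make the question concrete, the error handling follows roughly this kind of pattern (a simplified sketch; `DeepSpeechError` and `evaluate` are stand-in names, not necessarily the wrapper's actual declarations):

```swift
import Foundation

// Sketch only: map DeepSpeech C error codes onto Swift's throw mechanism.
enum DeepSpeechError: Error {
    case nativeError(code: Int32)
}

@discardableResult
func evaluate(_ errorCode: Int32) throws -> Int32 {
    // The C API returns 0 on success and a non-zero code on failure.
    guard errorCode == 0 else {
        throw DeepSpeechError.nativeError(code: errorCode)
    }
    return errorCode
}

// Inside a wrapper method this could then look like:
//   try evaluate(DS_SetScorerAlphaBeta(modelCtx, alpha, beta))
```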
PR for adding macos-heavy-b and macos-light-b worker type and instances: https://github.com/mozilla/community-tc-config/pull/308
Looks like adding Xcode to the worker brings the free space to under 10 GB, which stops the taskcluster client from picking up any jobs... Resizing the partition does not seem to work. My next step will be to create a worker from scratch with a larger disk image.
@dabinat tagging you because you mentioned interest in these bindings in the CoreML issue, in case you have anything to mention regarding the design of the bindings here.
@reuben super awesome that you are working on this! This is actually perfect timing as we are looking for an offline speech recognition for iOS right now. I know it's not finished yet, but could you provide a small guide on how I could try it out, is the .so library already available somewhere? Maybe then I could also help with writing an example app, if you wish.
On taskcluster, if you browse to the iOS artifacts section you should get it.
Thank you! I'll try it out!
I tried it out with the current 0.7.4 models from the release page and one of the audio files from there.
TensorFlow: v2.2.0-17-g0854bb5188
DeepSpeech: v0.9.0-alpha.2-34-gdd20d35c
2020-07-17 09:48:31.981587-0700 deepspeech_ios_test[9411:91040] Initialized TensorFlow Lite runtime.
/private/var/containers/Bundle/Application/F5F8492E-D9B8-4BC4-AF46-29CEB23FC3A6/deepspeech_ios_test.app/4507-16021-0012.wav
read 8192 samples
(lldb)
And then comes this error:
Thread 5: EXC_BAD_ACCESS (code=1, address=0x177000075)
in this line of DeepSpeech.swift:
178 public func feedAudioContent(buffer: UnsafeBufferPointer<Int16>) {
180 precondition(streamCtx != nil, "calling method on invalidated Stream")
181
182 DS_FeedAudioContent(streamCtx, buffer.baseAddress, UInt32(buffer.count)) <<<<< Thread 5: EXC_BAD_ACCESS (code=1, address=0x177000075)
183 }
Just leaving this here; I don't want to be too much of a bother before this is even declared finished :D
Update: Ah sorry, I saw the version of DeepSpeech and I guess the models are not compatible.
The models should be compatible. I don't know what's going on there, can't reproduce it locally...
Does the log have any more details for this error?
Also, maybe double check the signing options in Xcode? At some point when writing the bindings I ran into some runtime exceptions due to incorrect signing options.
You're right. Since it crashes when communicating with the library, it's probably just included incorrectly.
So to go through what I tried:

Changed the signing of the deepspeech_ios and deepspeech_ios_test targets to my team and adjusted the bundle identifier to one of mine. Building fails with:

clang: error: no such file or directory: '[...]/DeepSpeech/native_client/swift/libdeepspeech.so'
Command Ld failed with a nonzero exit code

Once it builds, the app crashes at launch with:

dyld: Library not loaded: @rpath/deepspeech_ios.framework/deepspeech_ios
Referenced from: /private/var/containers/Bundle/Application/E9B900F3-5F4C-466D-BB03-E97F5588A768/deepspeech_ios_test.app/deepspeech_ios_test
Reason: image not found
(lldb) bt
The error in the post before is after trying some things in the "Frameworks, Libraries and Embedded Content" section. After doing a fresh start and just adding the library like explained above, I get the following configs:

[screenshot of the deepspeech_ios_test target]
[screenshot of the deepspeech_ios target]

What do you have there? Maybe some things have to be switched to embed & sign?
After setting deepspeech_ios.framework to Embed & Sign in the deepspeech_ios_test target (which is just a random tryout), the code at least passes until DS_FeedAudioContent, and the error occurs that I mentioned in the first post.
Here is the full log I got for that error.
* thread #2, queue = 'com.apple.avfoundation.avasset.completionsQueue', stop reason = EXC_BAD_ACCESS (code=2, address=0x130800076)
frame #0: 0x0000000103891494 libdeepspeech.so`___lldb_unnamed_symbol5$$libdeepspeech.so + 360
* frame #1: 0x000000010385df68 deepspeech_ios`DeepSpeechStream.feedAudioContent(buffer=(_position = 0x000000016502d200, count = 8192), self=(streamCtx = 0x0000000162f32f40)) at DeepSpeech.swift:181:9
frame #2: 0x00000001028b20b0 deepspeech_ios_test`closure #1 in render(samples=Swift.UnsafeRawBufferPointer @ 0x000000016d5e0100, stream=(streamCtx = 0x0000000162f32f40)) at AppDelegate.swift:126:20
frame #3: 0x00000001028b2138 deepspeech_ios_test`thunk for @callee_guaranteed (@unowned UnsafeRawBufferPointer) -> (@error @owned Error) at <compiler-generated>:0
frame #4: 0x00000001028b2198 deepspeech_ios_test`partial apply for thunk for @callee_guaranteed (@unowned UnsafeRawBufferPointer) -> (@error @owned Error) at <compiler-generated>:0
frame #5: 0x00000001b810c348 libswiftFoundation.dylib`Foundation.Data.withUnsafeBytes<A>((Swift.UnsafeRawBufferPointer) throws -> A) throws -> A + 504
frame #6: 0x00000001028b097c deepspeech_ios_test`render(audioContext=0x000000016760bd10, stream=(streamCtx = 0x0000000162f32f40)) at AppDelegate.swift:124:22
frame #7: 0x00000001028b30c8 deepspeech_ios_test`closure #1 in test(audioContext=0x000000016760bd10, stream=(streamCtx = 0x0000000162f32f40), audioPath="/private/var/containers/Bundle/Application/D6D001A2-07F7-4BD3-80E9-9DBECCA975E8/deepspeech_ios_test.app/4507-16021-0012.wav", start=2020-07-19 21:19:15 CEST, completion=0x00000001028b83c8 deepspeech_ios_test`partial apply forwarder for closure #1 () -> () in closure #1 () -> () in deepspeech_ios_test.AppDelegate.application(_: __C.UIApplication, didFinishLaunchingWithOptions: Swift.Optional<Swift.Dictionary<__C.UIApplicationLaunchOptionsKey, Any>>) -> Swift.Bool at <compiler-generated>) at AppDelegate.swift:174:9
frame #8: 0x00000001028acfc0 deepspeech_ios_test`closure #1 in static AudioContext.load(asset=0x000000016365fd40, assetTrack=0x00000001637212f0, audioURL=Foundation.URL @ 0x0000000163674090, completionHandler=0x00000001028b375c deepspeech_ios_test`partial apply forwarder for closure #1 (Swift.Optional<deepspeech_ios_test.AudioContext>) -> () in deepspeech_ios_test.test(model: deepspeech_ios.DeepSpeechModel, audioPath: Swift.String, completion: () -> ()) -> () at <compiler-generated>) at AppDelegate.swift:59:17
frame #9: 0x00000001028ad9b0 deepspeech_ios_test`thunk for @escaping @callee_guaranteed () -> () at <compiler-generated>:0
frame #10: 0x0000000102c0fefc libclang_rt.asan_ios_dynamic.dylib`__wrap_dispatch_async_block_invoke + 196
frame #11: 0x0000000104ec605c libdispatch.dylib`_dispatch_call_block_and_release + 32
frame #12: 0x0000000104ec74d8 libdispatch.dylib`_dispatch_client_callout + 20
frame #13: 0x0000000104ecec20 libdispatch.dylib`_dispatch_lane_serial_drain + 720
frame #14: 0x0000000104ecf834 libdispatch.dylib`_dispatch_lane_invoke + 440
frame #15: 0x0000000104edb270 libdispatch.dylib`_dispatch_workloop_worker_thread + 1344
frame #16: 0x00000001816a7718 libsystem_pthread.dylib`_pthread_wqthread + 276
I tried a simple test app where I loaded a pre-converted file of a few seconds into memory and called DeepSpeechModel.speechToText. There are no crashes or anything, but the resulting text string is empty.
It seemed from the header like all I had to do was initialize a DeepSpeechModel with the .tflite file and then call speechToText with the buffer. Did I miss a step? Do I need to setup a streaming context even if I’m not streaming?
That's correct, you don't need to setup a streaming context.
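Roughly, one-shot usage should look like this sketch (it assumes a DeepSpeechModel(modelPath:) initializer and a speechToText(buffer:) that takes an UnsafeBufferPointer<Int16> like feedAudioContent does; loadPCM16 is a hypothetical helper, and whether the initializer throws is an assumption):

```swift
import Foundation
// import deepspeech_ios  // the wrapper framework from the ios-build branch

// Hypothetical helper: load 16 kHz mono 16-bit PCM samples from a WAV file.
func loadPCM16(path: String) -> [Int16] { /* ... */ return [] }

func quickTest(modelPath: String, wavPath: String) throws {
    let model = try DeepSpeechModel(modelPath: modelPath)
    let samples = loadPCM16(path: wavPath)
    // No streaming context needed for one-shot recognition.
    let text = samples.withUnsafeBufferPointer { buffer in
        model.speechToText(buffer: buffer)
    }
    print("transcription: \(text)")
}
```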
What do you have there? Maybe some things have to be switched to embed & sign?
I have libdeepspeech.so in "Link Binary With Libraries" for the deepspeech_ios project. I don't see a "Frameworks and Libraries" section like in your screenshot.

In deepspeech_ios_test, I have both libdeepspeech.so and deepspeech_ios.framework in "Link Binary With Libraries", and I also have libdeepspeech.so in "Embed Frameworks", with "Code Sign On Copy" enabled.
Oh, OK, I was looking at the "Build Phases" tab, not "General". What I have in "General" matches your screenshots. It's weird that it crashes on DS_FeedAudioContent and not DS_CreateModel; maybe it's not a linking issue, unless your code has a bug and is calling DS_FeedAudioContent first, on an invalid model/stream, which would certainly cause a segfault.
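For reference, the expected call order at the C level is roughly the following sketch (error handling mostly elided; it assumes the usual DeepSpeech C API names and that the symbols are visible from Swift):

```swift
// import deepspeech_ios  // assumes the C symbols are exposed to Swift

func streamingSketch(modelPath: String, samples: [Int16]) {
    var modelCtx: OpaquePointer?
    guard DS_CreateModel(modelPath, &modelCtx) == 0, modelCtx != nil else { return }
    defer { DS_FreeModel(modelCtx) }

    var streamCtx: OpaquePointer?
    guard DS_CreateStream(modelCtx, &streamCtx) == 0, streamCtx != nil else { return }

    // Feed only while the stream is valid; feeding a nil/freed stream segfaults.
    samples.withUnsafeBufferPointer { buffer in
        DS_FeedAudioContent(streamCtx, buffer.baseAddress, UInt32(buffer.count))
    }

    // DS_FinishStream consumes the stream, so nothing must be fed after it.
    if let result = DS_FinishStream(streamCtx) {
        print(String(cString: result))
        DS_FreeString(result)
    }
}
```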
You're right, DS_CreateModel seems to work fine and I can retrieve sample rate and beam width. So the issue for me has to be in sampling and memory allocation. It's weird that that part is not working for me somehow.
Okay weirdly enough, when I run the app multiple times, every now and then it continues feeding audio content successfully
read 8192 samples
read 8192 samples
read 8192 samples
read 8192 samples
read 8192 samples
read 2800 samples
But it always fails when calling DS_FinishStream(streamCtx) with an EXC_BAD_ACCESS.
But every other time it fails feeding the first batch of samples.
Update: on DS_FinishStream(streamCtx) I sometimes get:
deepspeech_ios_test(26389,0x16fa6f000) malloc: can't allocate region
*** mach_vm_map(size=11453251584, flags: 123) failed (error code=3)
deepspeech_ios_test(26389,0x16fa6f000) malloc: *** set a breakpoint in malloc_error_break to debug
libc++abi.dylib: terminating with uncaught exception of type std::bad_alloc: std::bad_alloc
Out of memory maybe? What device are you running this on?
The iOS build has been added to CI in #3150. So we should now have automatically built binaries that can be used with the project. Next step is building and publishing the wrapper itself, but before doing that I want to get more feedback from users, such as these memory issues... (Thanks for testing @erksch !)
@reuben Hey, just tried again from the current master, still the same. I am testing on a brand new iPad Air with, I think, at least 4 GB of RAM.
@reuben Btw, today I implemented microphone streaming on iOS with resampling to 16000 Hz. I tested it with a different speech recognition engine, but I think the code would transfer 100% to DeepSpeech. Do you think we could add this to the demo code?
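The approach is roughly along these lines (a simplified sketch, not the exact code that will go in the PR; the buffer size and the onSamples callback are placeholders):

```swift
import AVFoundation

// Sketch: tap the microphone and resample to 16 kHz mono Int16 before handing
// samples to the recognizer.
final class MicrophoneStreamer {
    private let engine = AVAudioEngine()
    private let targetFormat = AVAudioFormat(commonFormat: .pcmFormatInt16,
                                             sampleRate: 16000,
                                             channels: 1,
                                             interleaved: true)!

    func start(onSamples: @escaping (UnsafeBufferPointer<Int16>) -> Void) throws {
        let input = engine.inputNode
        let inputFormat = input.outputFormat(forBus: 0)
        guard let converter = AVAudioConverter(from: inputFormat, to: targetFormat) else { return }

        input.installTap(onBus: 0, bufferSize: 4096, format: inputFormat) { buffer, _ in
            // Size the output buffer according to the sample-rate ratio.
            let ratio = self.targetFormat.sampleRate / inputFormat.sampleRate
            let capacity = AVAudioFrameCount(Double(buffer.frameLength) * ratio)
            guard let converted = AVAudioPCMBuffer(pcmFormat: self.targetFormat,
                                                   frameCapacity: capacity) else { return }

            var error: NSError?
            converter.convert(to: converted, error: &error) { _, outStatus in
                // Simplification: a production tap should return .noDataNow once
                // this buffer has been consumed.
                outStatus.pointee = .haveData
                return buffer
            }
            guard error == nil, let channel = converted.int16ChannelData else { return }
            onSamples(UnsafeBufferPointer(start: channel[0], count: Int(converted.frameLength)))
        }

        engine.prepare()
        try engine.start()
    }
}
```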
@erksch I'm running on an iPhone Xs which is also 4GB of RAM, so that's weird.
Microphone streaming would be amazing to have! Can you open a PR?
Yep hopefully tomorrow :)
@erksch have you had a chance to work on the microphone streaming code? I would love to include that in 0.8 if possible :)
@reuben sorry for the absence, I was a little bit occupied. I'll make a PR today!
@reuben I noticed "/deepspeech_ios/libdeepspeech.dylib" in the .gitignore. Maybe that is what is missing for me? Where do I get this file?
@erksch that is a leftover from older code, I'll remove it. You should only need libdeepspeech.so.
Here is the PR. While file transcription still does not work, the microphone streaming actually works for me :D
Do you plan on publishing a Cocoapod for the framework portion of this? I'm going to build a React Native module for this but auto-linking requires a pod. Afterward, I can add instructions for users to drag the model file into the correct directory.
It's not just the models, you also need to add the shared library, libdeepspeech.so, as a direct dependency of the app. Do you think it still makes sense to publish the framework on Cocoapods given this requirement? I've been hesitant to do that and then get tons of support requests from people with weird linker errors.
We could try to dlopen the library and give a more descriptive message, but I'm not sure how that fits with App Store guidelines.
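Something along these lines, maybe (just a sketch of the idea; the library path resolution and the message wording are placeholders):

```swift
import Darwin

// Probe for libdeepspeech.so at startup and surface a clearer message than a
// dyld crash would give.
func checkNativeLibrary() -> Bool {
    guard dlopen("libdeepspeech.so", RTLD_NOW) != nil else {
        let message: String
        if let err = dlerror() {
            message = String(cString: err)
        } else {
            message = "unknown dlopen error"
        }
        print("libdeepspeech.so could not be loaded: \(message). " +
              "Make sure it is added to your app target and embedded & signed.")
        return false
    }
    return true
}
```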
I'll look into that. Is it not possible to bundle libdeepspeech.so with the Cocoapod and then use something like use_frameworks!? I'm pretty sure Google does that with all of their pods. Loading dynamic code from the internet isn't allowed on the App Store.
I'm not talking about loading dynamic code from the internet, I'm talking about having to add a shared library as a dependency of your app in Xcode. It'll then get codesigned and bundled with the application. This is because iOS does not allow linking against a framework that itself links against another shared library. Maybe I should look more seriously into static linking, which would make this way simpler.
If that won't require some massive overhaul it might be a good idea. Static frameworks are sometimes better for app load times than dynamic ones. Let me know how this goes, as switching from Apple's wildly unstable built-in transcription API to DeepSpeech for extremely long on-device transcription jobs is on our roadmap (after trying it on Android starting in a few days).
@reuben Any change on this? I'm starting the Android half of this library tonight.
@reuben hey, where can I find libdeepspeech.so for iOS, or how can I build it myself? I looked at the links you provided a long time ago but they send me a 404. Thanks in advance.
It's provided in the latest 0.8 or 0.9 alpha on GitHub releases.
@lissyx iOS build only in 0.9 alpha.
It's always nice when you trust people that are working on the project.
WIP: Build works fine on latest master with some small modifications (DS branch ios-build):
Build for simulator x86_64:
Build for arm64:
Scope for 0.8:
Future scope: