mozilla / DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Mozilla Public License 2.0

Make model convertible by CoreML #642

Open MatthewWaller opened 7 years ago

MatthewWaller commented 7 years ago

It would be wonderful if DeepSpeech models could be converted to CoreML, for offline use in apps. Here is documentation to do just that. https://developer.apple.com/documentation/coreml/converting_trained_models_to_core_ml Thanks!

kdavis-mozilla commented 7 years ago

@MatthewWaller It doesn't appear as if TensorFlow is supported.

MatthewWaller commented 7 years ago

Hmmm, so I would probably need to write a custom conversion tool, like it says at the bottom of the page, I guess.

kdavis-mozilla commented 7 years ago

@MatthewWaller I'd guess so. Which seems like a large outlay in time and knowledge.

MatthewWaller commented 6 years ago

Just as an update, I'm examining this Keras model converter script, which comes from Apple's own Python CoreML tools. Could be a good precedent for defining the needed layers. You're right though @kdavis-mozilla, looks like a large project.

kdavis-mozilla commented 6 years ago

@MatthewWaller Thanks for the update!

lissyx commented 6 years ago

@MatthewWaller Have you been able to make any progress ?

MatthewWaller commented 6 years ago

I think it might be possible to convert with a third party tool. I haven’t written the python conversion scripts myself, but this could be useful (https://github.com/Microsoft/MMdnn/blob/master/README.md). But that’s only half the battle. Then I need to find out how to preprocess the audio, so I’m trying to find out how to get MFCC in Swift. One developer used a C library to do this in iOS, so that might be the way to go.

lissyx commented 6 years ago

@MatthewWaller I lack context here, but we have MFCC computation in C already, can't you leverage that?

MatthewWaller commented 6 years ago

If I want to use any C libraries I have to port them over to Objective-C or Swift to use them in iOS or macOS. That’s something I haven’t done yet, and I would prefer to do the calculations all in Swift, for longevity’s sake.

lissyx commented 6 years ago

@MatthewWaller I came across https://github.com/tf-coreml/tf-coreml while looking at some tensorflow lite stuff, isn't it already addressing what you want to do?

MatthewWaller commented 6 years ago

It does! I hadn’t seen that one. Well, hopefully we just need to get the MFCC one way or another. I’ve got a couple of projects in the hopper before I get back to this one, but that’s exciting!

MatthewWaller commented 6 years ago

@lissyx I managed to feed mfcc into a core data model, but I'm not sure where to go to implement the link you sent to convert to coreml, specifically, I'm not sure where to find a list of output tensor names present in the TF graph, (the README.md gives an example of output_feature_names = ['softmax:0'])

Any ideas? Would welcome your help as well @kdavis-mozilla !
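One way to look for output names is to list the nodes whose results no other node consumes. The following is only a minimal sketch of that idea on a made-up toy graph: the tuples stand in for the `name`, `op` and `input` fields that TensorFlow graph nodes carry, and `node_name`, `candidate_outputs` and `toy_graph` are hypothetical names, not part of DeepSpeech or tf-coreml. With a real model, the node list would presumably come from iterating over the nodes of the loaded graph definition.

```python
# Sketch: find candidate output tensor names in a graph.
# Each node is (name, op_type, inputs). This mirrors the name/op/input
# fields of a TensorFlow graph node, but the nodes below are made up.

def node_name(input_ref):
    """Strip the control-dependency marker '^' and the ':N' output index."""
    ref = input_ref.lstrip("^")
    return ref.split(":")[0]


def candidate_outputs(nodes):
    """Return 'name:0' for every node whose output no other node consumes."""
    consumed = {node_name(ref) for _, _, inputs in nodes for ref in inputs}
    return [name + ":0" for name, _, _ in nodes if name not in consumed]


# Hypothetical toy graph, NOT the real DeepSpeech graph.
toy_graph = [
    ("input_node", "Placeholder", []),
    ("weights", "Const", []),
    ("matmul", "MatMul", ["input_node", "weights"]),
    ("logits", "BiasAdd", ["matmul:0"]),
]

print(candidate_outputs(toy_graph))  # ['logits:0']
```

A real graph may have several unconsumed nodes (for example initializers), so the result is a list of candidates to inspect rather than a definitive answer.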

lissyx commented 6 years ago

Hm I remember documenting that to someone else needing to access some intermediate tensor, on discourse. You should have a look there, I cannot search for it for the moment, I'll try and find it tomorrow if you don't find :-)

lissyx commented 5 years ago

@MatthewWaller Any news on that ? The upcoming #1463 might benefit from such support

MatthewWaller commented 5 years ago

@lissyx unfortunately I haven't been able to convert to CoreML. The converter at https://github.com/tf-coreml/tf-coreml, which Apple also recommends officially, cannot handle cycles. I tried and got the error, and as a limitation it states: "TF graph must be cycle free (cycles are generally created due to control flow ops like if, while, map, etc.)"

Not sure how to get around this at present. You can see my issue here: https://github.com/tf-coreml/tf-coreml/issues/124

The author states: "I think the simplest way to deal with such graphs for now is to abstract the weight matrices and bias vectors from pre-trained TF. And then use them to build a CoreML model directly using the neural network builder API provided by coremltools." But I'm not sure how to practically go about that.
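To make the "cycle free" limitation concrete, here is a rough sketch of how one could check a graph for cycles with a depth-first search. The adjacency dictionaries are simplified, hypothetical graphs: the loop example only borrows the names of TensorFlow's while-loop control flow ops to show where a back edge typically appears, and `has_cycle` is an illustrative helper, not something tf-coreml provides.

```python
def has_cycle(graph):
    """Depth-first search for a back edge in {node: [successors]}."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                return True  # edge back into the current path: a cycle
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(graph))


# Hypothetical feed-forward graph: no cycle.
feed_forward = {"input": ["dense"], "dense": ["logits"], "logits": []}

# Simplified picture of a while loop: next_iteration feeds back into merge.
while_loop = {
    "enter": ["merge"],
    "merge": ["switch"],
    "switch": ["body", "exit"],
    "body": ["next_iteration"],
    "next_iteration": ["merge"],
    "exit": [],
}

print(has_cycle(feed_forward))  # False
print(has_cycle(while_loop))    # True
```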

kdavis-mozilla commented 5 years ago

@MatthewWaller Did you try CoreML on the PR or on master?

I think some, maybe all, cycles should be removed in the PR.

MatthewWaller commented 5 years ago

I tried an earlier version on master. Is there a pre-trained model I could use? I see an alpha in the release from 3 days ago. Would that work?

kdavis-mozilla commented 5 years ago

@reuben Can you give @MatthewWaller a preliminary model for the PR to test CoreML?

lissyx commented 5 years ago

@MatthewWaller The alpha release is only for the inference binaries, so far, it does not bundle any model change.

reuben commented 5 years ago

@MatthewWaller a preliminary model can be found here: https://github.com/reuben/DeepSpeech/releases/tag/v0.0.1-alpha

wshamp commented 5 years ago

@MatthewWaller Were you ever able to find the output_feature_names?

MatthewWaller commented 5 years ago

@wshamp I was. I found them to be 'logits:0'. As an update overall, I got the model, but I'm stumped at FailedPreconditionError. Here is the issue I filed with tf-coreml. The full stack trace and my full code for converting is there so far. I haven't heard back yet, but anyone else can troubleshoot as well :)

wshamp commented 5 years ago

Hmm, my quick Google search on that error seems to indicate an issue with the graph initializing variables, not the converter. I hit the same error.

kdavis-mozilla commented 5 years ago

@MatthewWaller The branch stores the decoder state in the graph in the variables previous_state_c and previous_state_h. It's a convenient place to store this state info.

As far as I understand, @reuben correct me if I'm mistaken, in exporting[1] the graph the previous_state_c and previous_state_h should be removed[2] or at least not included.

Maybe the model @reuben provided mistakenly included previous_state_c and previous_state_h?

reuben commented 5 years ago

That blacklist doesn't remove previous_state_{c,h}, but rather makes the freezing process ignore them, since I want them to be variables (not constants) in the final exported graph.

The idea is that before you start feeding audio features and fetching the logits tensor, you have to run the initialize_state op (see the create_inference_graph function in DeepSpeech.py[1]).

In our C++ code we do it inside DS_SetupStream (deepspeech.cc[2]).

[1] https://github.com/mozilla/DeepSpeech/blob/7b873365f8bfffe2ea84dcd34058b537e9095765/DeepSpeech.py#L1718-L1756 [2] https://github.com/mozilla/DeepSpeech/blob/7b873365f8bfffe2ea84dcd34058b537e9095765/native_client/deepspeech.cc#L567

reuben commented 5 years ago

Some other notes: the graph in that URL uses LSTMBlockFusedCell, which is probably not supported by tf-coreml, but the weights are compatible with a normal LSTMCell, so with a bit of massaging on the saver when importing, you can use a static_rnn + LSTMCell.

If you can't work around the previous_state_{c,h} thing, an alternative is fetching the state and feeding it back every time, eliminating the need for the variable.

static_rnn uses tf.cond OPs when you specify the sequence lengths. If tf.cond OPs are not supported by CoreML, you could try not passing sequence lengths to the RNN. It'll degrade the accuracy, but maybe only by a bit.

Let me know if you run into any other issues.
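The "fetch the state and feed it back" alternative can be pictured with a toy example. This is only a sketch under loud assumptions: `step` is a made-up scalar recurrence standing in for the LSTM, and `run_chunk` stands in for one inference call that takes the previous state as an input and returns the new state as an output. Nothing here is DeepSpeech or TensorFlow code.

```python
def step(x, state):
    """Toy recurrent cell (stand-in for an LSTM): returns (output, new_state)."""
    new_state = 0.5 * state + x
    return 2.0 * new_state, new_state


def run_chunk(chunk, state):
    """One 'inference call': feed the old state in, fetch the new state out."""
    outputs = []
    for x in chunk:
        y, state = step(x, state)
        outputs.append(y)
    return outputs, state


features = [1.0, 2.0, 3.0, 4.0]  # hypothetical audio features

# Reference: process everything in one call, starting from a zero state.
whole, _ = run_chunk(features, 0.0)

# Streaming: the caller owns the state and passes it between calls,
# so the model itself needs no stored variable.
state = 0.0
streamed = []
for chunk in (features[:2], features[2:]):
    out, state = run_chunk(chunk, state)
    streamed.extend(out)

print(streamed == whole)  # True
```

The point of the design is that the model becomes a pure function of its inputs, which tends to be easier for converters that do not support variables.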

lissyx commented 5 years ago

@MatthewWaller We now have TF Lite support, can it be helpful?

MatthewWaller commented 5 years ago

For sure @lissyx ! Here is the official Google page about being able to convert Tensorflow Lite to CoreML.

kdavis-mozilla commented 5 years ago

@MatthewWaller I think you forgot to add the link.

MatthewWaller commented 5 years ago

Oops, Yep. Here it is @lissyx and @kdavis-mozilla https://developers.googleblog.com/2017/12/announcing-core-ml-support.html

wshamp commented 5 years ago

Have any of you attempted a CoreML conversion yet?

kdavis-mozilla commented 5 years ago

I've not. Maybe @lissyx has?

lissyx commented 5 years ago

Let's try?

lissyx commented 5 years ago

Well, except I have no iOS device to test that after :)

fotiDim commented 5 years ago

I can beta test for you :)

lissyx commented 5 years ago

E Unsupported Ops of type: Unpack

:'(

lissyx commented 5 years ago

Might be similar to the requirements there are on the Android NNAPI

lissyx commented 5 years ago

So, contrary to Android, we can use StridedSlice, but then it fails:

[...]
131/402: Analysing op name: previous_state_h ( type:  Placeholder )
Skipping name of placeholder
132/402: Analysing op name: previous_state_c ( type:  Placeholder )
Skipping name of placeholder
133/402: Analysing op name: input_node ( type:  Placeholder )
Skipping name of placeholder
134/402: Analysing op name: transpose ( type:  Transpose )
Traceback (most recent call last):
  File "DeepSpeech.py", line 971, in <module>
    tf.app.run(main)
  File "/home/alexandre/Documents/codaz/Mozilla/DeepSpeech/tf-venv/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "DeepSpeech.py", line 964, in main
    export()
  File "DeepSpeech.py", line 855, in export
    'previous_state_h:0': [1,2048],
  File "/home/alexandre/Documents/codaz/Mozilla/DeepSpeech/tf-venv/lib/python3.5/site-packages/tfcoreml/_tf_coreml_converter.py", line 586, in convert
    custom_conversion_functions=custom_conversion_functions)
  File "/home/alexandre/Documents/codaz/Mozilla/DeepSpeech/tf-venv/lib/python3.5/site-packages/tfcoreml/_tf_coreml_converter.py", line 337, in _convert_pb_to_mlmodel
    convert_ops_to_layers(context)
  File "/home/alexandre/Documents/codaz/Mozilla/DeepSpeech/tf-venv/lib/python3.5/site-packages/tfcoreml/_ops_to_layers.py", line 178, in convert_ops_to_layers
    translator(op, context)
  File "/home/alexandre/Documents/codaz/Mozilla/DeepSpeech/tf-venv/lib/python3.5/site-packages/tfcoreml/_layers.py", line 992, in transpose
    assert axes[0] == 0, "only works for 4D tensor without batch axis"
AssertionError: only works for 4D tensor without batch axis

lissyx commented 5 years ago

tentative_coreml.txt

lissyx commented 5 years ago

Removing transpose and using input_reshaped:0 as input node yields:

AssertionError: Strided Slice case not handled. Input shape = [16, 1, 2048], output shape = [1, 2048]

MatthewWaller commented 5 years ago

Hi @lissyx, I've started using a beta version of the Tensorflow to CoreML Converter that was announced today. Is there a way to get ahold of the TensorFlow Lite version of the .pb file? They have a ton of new layers and such that could help.

lissyx commented 5 years ago

Yes, just --export_dir path/to/export --export_tflite

fotiDim commented 5 years ago

"DeepSpeech" was spotted on one of the slides in the WWDC 2019 - Platforms State of the Union. I believe there are no blockers anymore.

kdavis-mozilla commented 5 years ago

@fotiDim Do you have a link or screen shot?

fotiDim commented 5 years ago

@kdavis-mozilla Yep! Correction, was in the Platforms State of the Union presentation (at 1:21:55). [Two screenshots attached, taken 2019-06-05]

iOS 13 also does offline speech recognition so perhaps they are using DeepSpeech under the hood now. Otherwise why put it on the screen?

MatthewWaller commented 5 years ago

@fotiDim @kdavis-mozilla I tried using their software (there is a new tfconverter) to convert DeepSpeech from the 0.4.1 release and it failed, so maybe Apple ran their own version of Baidu’s architecture. Haven’t tried converting tflite though.

MatthewWaller commented 5 years ago

@lissyx I'm getting word that "ds_ctcdecoder-0.4.1-cp27-cp27mu-macosx_10_10_x86_64.whl is not a supported wheel on this platform." when trying to get DeepSpeech running. Any thoughts? Or alternatively, I could accept the already exported TFLite model and try to convert it. Would be great to get DeepSpeech up and running on this laptop though.

lissyx commented 5 years ago

Can you share more verbose pip install steps? Can you make sure your pip is recent enough ?

lissyx commented 5 years ago

@MatthewWaller In case it's a bug in selecting matching package, you can try others from https://tools.taskcluster.net/index/project.deepspeech.deepspeech.native_client.v0.4.1/osx-ctc

MatthewWaller commented 5 years ago

@lissyx getting closer. I used ds_ctcdecoder-0.4.1-cp27-cp27m-macosx_10_10_x86_64.whl and this seems to work.
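For anyone hitting the same "not a supported wheel" message, it can help to compare the tags encoded in the two filenames. Below is a small sketch that splits a wheel filename into its fields, assuming the standard layout where the last three dash-separated fields are the Python tag, ABI tag and platform tag. `wheel_tags` is a hypothetical helper, not a pip API. As far as I understand, the trailing `u` in `cp27mu` marks a wide-unicode Python 2 build, which would explain why that wheel was rejected on this interpreter.

```python
def wheel_tags(filename):
    """Split a wheel filename into its name, version and compatibility tags."""
    stem = filename[: -len(".whl")]
    parts = stem.split("-")
    # The last three fields are the python tag, ABI tag and platform tag;
    # an optional build tag may sit between the version and the python tag.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


rejected = wheel_tags("ds_ctcdecoder-0.4.1-cp27-cp27mu-macosx_10_10_x86_64.whl")
accepted = wheel_tags("ds_ctcdecoder-0.4.1-cp27-cp27m-macosx_10_10_x86_64.whl")

# Which fields differ between the wheel pip refused and the one that worked?
differing = [key for key in rejected if rejected[key] != accepted[key]]
print(differing)  # ['abi']
```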

I'm working with the 0.4.1 release. I downloaded the checkpoint and the source code for that release.

To export, I use ./DeepSpeech.py --checkpoint_dir deepspeech-0.4.1-checkpoint/ --nouse_seq_length --export_tflite --export_dir ./

But this fails at def preprocess(csv_files, batch_size, numcep, numcontext, alphabet, hdf5_cache_path=None): in the preprocess.py because my csv_files are blank. That comes from FLAGS.train_cached_features_path being blank for line 388 of DeepSpeech.py