mozilla / DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Mozilla Public License 2.0
25.34k stars · 3.97k forks

Feature Request: tensorflow.js support #2233

Open beriberikix opened 5 years ago

beriberikix commented 5 years ago

Has the team looked into supporting TensorFlow.js? There is a tool and documentation for converting existing models to the format needed by tfjs. I have a use case that would benefit from running the wav transcription example in the browser.

Note: I filed a similar feature request on tfjs project.

lissyx commented 5 years ago

Nothing stops you from experimenting. But we have not had time to try that.

lissyx commented 5 years ago

To be honest, I fear we won't get decent performance with that, given the experiments we were able to run without vectorization.

beriberikix commented 5 years ago

Understood, I hope to experiment and report back!

I watched a recent talk by two of the leads and I'm optimistic, at least about future performance. Today they're using WebGL to get better speed than vanilla JS, and in the future they'll be looking into WebAssembly with threads and SIMD.

lissyx commented 5 years ago

Yeah, threads and SIMD are still a long way off, according to a colleague working on that and on wasm. Still curious about what you can get.

lissyx commented 5 years ago

FTR @beriberikix I don't know if the TF.js converter has the same constraints as the one for leveraging EdgeTPU, but we cannot (yet) convert our model for running on EdgeTPU. It's not impossible they may share constraints.

beriberikix commented 5 years ago

I'm trying to convert deepspeech-0.5.1-models.tar.gz using tensorflow/tfjs-converter but I'm running into an issue (probably because I've never used the tool before!)

SavedModel file does not exist at: 
./deepspeech-0.5.1-models/output_graph.pb/{saved_model.pbtxt|saved_model.pb}

Do you know which, if any, signatures and/or tags were used in generating the SavedModel? Per the help output:

--signature_name SIGNATURE_NAME
                        Signature of the SavedModel Graph or TF-Hub module to
                        load. Applicable only if input format is "tf_hub" or
                        "tf_saved_model".
  --saved_model_tags SAVED_MODEL_TAGS
                        Tags of the MetaGraphDef to load, in comma separated
                        string format. Defaults to "serve". Applicable only if
                        input format is "tf_saved_model".
lissyx commented 5 years ago

You should have a look at the export function; maybe some parameters need to be adjusted?

aptlin commented 5 years ago

@beriberikix, it seems like the converter looks for saved_model.pb, but the graph is saved as output_graph.pb instead, so you might just need to rename the file. Other than that, here is a working example of how to convert a SavedModel. The options

  --signature_name=serving_default \
  --saved_model_tags=serve

are merely default settings, so you might skip setting them if the model was saved without altering them.

beriberikix commented 5 years ago

Ah, I tried something else. Looking at the export function I noticed it was saved as a frozen model, which the latest converter no longer supports:

Note: Session bundle and Frozen model formats have been deprecated in TensorFlow.js 1.0. Please use the TensorFlow.js 0.15.x backend to convert these formats, available in tfjs-converter 0.8.6.

I downgraded to 0.8.6 and got further before another error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: NodeDef mentions attr 'feature_win_len' not in Op<name=NoOp; signature= -> >; NodeDef: {{node model_metadata}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).

That appears to be related to the version of TF originally used to generate the model. I'll try reverting to the latest converter and renaming the file before investigating the TF version path.

lissyx commented 5 years ago

tensorflow.python.framework.errors_impl.InvalidArgumentError: NodeDef mentions attr 'feature_win_len' not in Op<name=NoOp; signature= -> >; NodeDef: {{node model_metadata}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).

That's just from some metadata that we add to the exported model. Try disabling that and re-exporting?

lissyx commented 5 years ago

@beriberikix Have you been able to make any progress?

beriberikix commented 5 years ago

Unfortunately not. I ran into a few issues and work has been very busy lately. I hope to come back to it, but not in the near term.

lissyx commented 5 years ago

Are those issues actionable items we could help with?

alexcannan commented 4 years ago

@beriberikix Have you made any progress? I'm interested in this feature, wanted to check before I get started.

beriberikix commented 4 years ago

@alexcannan unfortunately I moved on. My project went in a different direction and I was a bit in over my head to begin with. Would love to follow along with your progress!

alexcannan commented 4 years ago

For the sake of documentation, I was able to set up the tfjs-converter package to attempt to process the output_graph.pb model, using the following command:

tensorflowjs_converter --input_format=tf_frozen_model --output_format=tensorflowjs --output_node_names="logits,new_state_c,new_state_h,mfccs,metadata_version,metadata_sample_rate,metadata_feature_win_len,metadata_feature_win_step,metadata_alphabet" output_graph.pb output_graph.tfjs

I found the output_node_names by inspecting the export() function in DeepSpeech.py.

Unfortunately, tfjs does not support certain operations needed to properly convert the existing model.

ValueError: Unsupported Ops in the model before optimization BlockLSTM, AudioSpectrogram, Mfcc

So until tfjs implements these ops, it looks like a simple tfjs conversion won't be possible. There has been movement recently to set up audio-related ops like stft, but it will take some development to get this working. If anyone is interested in contributing, check out this ticket to get an idea of the op development process.

lissyx commented 4 years ago

Unfortunately, tfjs does not support certain operations needed to properly convert the existing model.

Thanks, sadly this is aligned with our experience on several other tools, EdgeTPU included :/

reuben commented 4 years ago

You could try exporting the TFLite model instead. It does not use BlockLSTM. You'll have to comment out the feature computation sub-graph (and figure out an alternative for computing MFCCs in JS), but maybe it's enough to make some progress.

timpulver commented 4 years ago

Meyda can be used to compute MFCCs in JavaScript.
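For what it's worth, here is a minimal sketch of what that could look like, assuming Meyda's static extraction API and DeepSpeech's default audio settings (16 kHz input, 32 ms / 512-sample windows, 26 coefficients per frame); whether Meyda's MFCCs numerically match TensorFlow's Mfcc op is a separate question:

// Hedged sketch: one MFCC frame with Meyda from a 32 ms window of 16 kHz audio.
// The option names follow Meyda's static config; the numbers mirror the
// DeepSpeech defaults discussed in this thread and are assumptions here.
import Meyda from 'meyda';

Meyda.sampleRate = 16000;             // the model expects 16 kHz audio
Meyda.bufferSize = 512;               // 32 ms window at 16 kHz
Meyda.numberOfMFCCCoefficients = 26;  // matches the model's 26 features per frame

function mfccForWindow(samples) {
  // samples: Float32Array of length 512 (one analysis window)
  return Meyda.extract('mfcc', samples);  // -> array of 26 coefficients
}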

reuben commented 4 years ago

That seems to work, using the static_rnn RNN impl (like we do for TFLite exports), but without converting to TFLite. https://gist.github.com/reuben/4330b69db52112982c63aa8f98912c9f

Then:

tensorflowjs_converter --input_format=tf_frozen_model --output_format=tfjs_graph_model --output_node_names="logits,new_state_c,new_state_h,metadata_version,metadata_sample_rate,metadata_feature_win_len,metadata_feature_win_step,metadata_alphabet" ../tfjs_test/output_graph.pb output_graph.tfjs
2020-04-20 23:17:51.520207: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:814] Optimization results for grappler item: graph_to_optimize
2020-04-20 23:17:51.520231: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   debug_stripper: debug_stripper did nothing. time = 0.09ms.
2020-04-20 23:17:51.520236: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   model_pruner: Graph size after: 980 nodes (-46), 1671 edges (-44), time = 235.555ms.
2020-04-20 23:17:51.520240: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   constant_folding: Graph size after: 980 nodes (0), 1671 edges (0), time = 1595.54199ms.
2020-04-20 23:17:51.520244: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   arithmetic_optimizer: Graph size after: 828 nodes (-152), 1494 edges (-177), time = 686.765ms.
2020-04-20 23:17:51.520248: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   dependency_optimizer: Graph size after: 793 nodes (-35), 1441 edges (-53), time = 54.915ms.
2020-04-20 23:17:51.520337: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   model_pruner: Graph size after: 793 nodes (0), 1441 edges (0), time = 49.553ms.
2020-04-20 23:17:51.520349: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   constant_folding: Graph size after: 793 nodes (0), 1441 edges (0), time = 755.521ms.
2020-04-20 23:17:51.520354: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   arithmetic_optimizer: Graph size after: 793 nodes (0), 1441 edges (0), time = 661.083ms.
2020-04-20 23:17:51.520359: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   dependency_optimizer: Graph size after: 793 nodes (0), 1441 edges (0), time = 54.496ms.
2020-04-20 23:17:51.520362: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   debug_stripper: debug_stripper did nothing. time = 13.755ms.
2020-04-20 23:17:51.520366: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   model_pruner: Graph size after: 793 nodes (0), 1441 edges (0), time = 36.591ms.
2020-04-20 23:17:51.520528: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   constant_folding: Graph size after: 793 nodes (0), 1441 edges (0), time = 762.896ms.
2020-04-20 23:17:51.520540: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   arithmetic_optimizer: Graph size after: 793 nodes (0), 1441 edges (0), time = 662.148ms.
2020-04-20 23:17:51.520544: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   dependency_optimizer: Graph size after: 793 nodes (0), 1441 edges (0), time = 55.646ms.
2020-04-20 23:17:51.520548: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   model_pruner: Graph size after: 793 nodes (0), 1441 edges (0), time = 50.742ms.
2020-04-20 23:17:51.520552: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   constant_folding: Graph size after: 793 nodes (0), 1441 edges (0), time = 754.065ms.
2020-04-20 23:17:51.520556: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   arithmetic_optimizer: Graph size after: 793 nodes (0), 1441 edges (0), time = 671.946ms.
2020-04-20 23:17:51.520559: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   dependency_optimizer: Graph size after: 793 nodes (0), 1441 edges (0), time = 54.968ms.
2020-04-20 23:17:55.483821: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:814] Optimization results for grappler item: graph_to_optimize
2020-04-20 23:17:55.483844: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   remapper: Graph size after: 768 nodes (-25), 1416 edges (-25), time = 82.316ms.
2020-04-20 23:17:55.483849: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   constant_folding: Graph size after: 768 nodes (0), 1416 edges (0), time = 791.112ms.
2020-04-20 23:17:55.483853: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   arithmetic_optimizer: Graph size after: 768 nodes (0), 1416 edges (0), time = 715.423ms.
2020-04-20 23:17:55.483857: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   dependency_optimizer: Graph size after: 768 nodes (0), 1416 edges (0), time = 56.041ms.
2020-04-20 23:17:55.483861: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   remapper: Graph size after: 768 nodes (0), 1416 edges (0), time = 96.295ms.
2020-04-20 23:17:55.483951: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   constant_folding: Graph size after: 768 nodes (0), 1416 edges (0), time = 771.628ms.
2020-04-20 23:17:55.483964: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   arithmetic_optimizer: Graph size after: 768 nodes (0), 1416 edges (0), time = 712.75ms.
2020-04-20 23:17:55.483968: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   dependency_optimizer: Graph size after: 768 nodes (0), 1416 edges (0), time = 55.314ms.
Writing weight file output_graph.tfjs/model.json...
alexcannan commented 4 years ago

@reuben What version of tensorflowjs are you using? It doesn't look like the recommended 0.8.6 version, otherwise it wouldn't let you use --output_format=tfjs_graph_model.

I was able to build a basic output_graph.pb using reuben's diff applied to the current master, but upon running the conversion via the following command:

tensorflowjs_converter --input_format=tf_frozen_model --output_format=tensorflowjs --output_node_names="logits,new_state_c,new_state_h,metadata_version,metadata_sample_rate,metadata_feature_win_len,metadata_feature_win_step,metadata_alphabet" ./exports/output_graph.pb ./exports/output_graph.tfjs

I got the following error:

Using TensorFlow backend.
Traceback (most recent call last):
  File "/home/alex/miniconda3/envs/tfjs-conv/bin/tensorflowjs_converter", line 8, in <module>
    sys.exit(main())
  File "/home/alex/miniconda3/envs/tfjs-conv/lib/python3.6/site-packages/tensorflowjs/converters/converter.py", line 352, in main
    strip_debug_ops=FLAGS.strip_debug_ops)
  File "/home/alex/miniconda3/envs/tfjs-conv/lib/python3.6/site-packages/tensorflowjs/converters/tf_saved_model_conversion_pb.py", line 331, in convert_tf_frozen_model
    skip_op_check=skip_op_check, strip_debug_ops=strip_debug_ops)
  File "/home/alex/miniconda3/envs/tfjs-conv/lib/python3.6/site-packages/tensorflowjs/converters/tf_saved_model_conversion_pb.py", line 117, in optimize_graph
    ', '.join(unsupported))
ValueError: Unsupported Ops in the model before optimization
AddV2

I added a --skip_op_check=SKIP_OP_CHECK flag to proceed past that ValueError and I was able to get to:

tensorflowjs_converter --input_format=tf_frozen_model --output_format=tensorflowjs --output_node_names="logits,new_state_c,new_state_h,metadata_version,metadata_sample_rate,metadata_feature_win_len,metadata_feature_win_step,metadata_alphabet" ./exports/output_graph.pb ./exports/output_graph.tfjs --skip_op_check=SKIP_OP_CHECK
Using TensorFlow backend.
2020-04-22 18:03:42.796417: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:581] Optimization results for grappler item: graph_to_optimize
2020-04-22 18:03:42.796448: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   debug_stripper: Graph size after: 1026 nodes (0), 1715 edges (0), time = 1.045ms.
2020-04-22 18:03:42.796453: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   model_pruner: Graph size after: 980 nodes (-46), 1671 edges (-44), time = 5.506ms.
2020-04-22 18:03:42.796458: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   constant folding: Graph size after: 975 nodes (-5), 1666 edges (-5), time = 1793.35901ms.
2020-04-22 18:03:42.796463: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   arithmetic_optimizer: Graph size after: 825 nodes (-150), 1489 edges (-177), time = 1143.177ms.
2020-04-22 18:03:42.796468: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   dependency_optimizer: Graph size after: 793 nodes (-32), 1441 edges (-48), time = 8.19ms.
2020-04-22 18:03:42.796473: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   model_pruner: Graph size after: 793 nodes (0), 1441 edges (0), time = 2.886ms.
2020-04-22 18:03:42.796494: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   remapper: Graph size after: 793 nodes (0), 1441 edges (0), time = 2.436ms.
2020-04-22 18:03:42.796498: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   constant folding: Graph size after: 793 nodes (0), 1441 edges (0), time = 887.803ms.
2020-04-22 18:03:42.796512: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   arithmetic_optimizer: Graph size after: 793 nodes (0), 1441 edges (0), time = 1011.50897ms.
2020-04-22 18:03:42.796516: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   dependency_optimizer: Graph size after: 793 nodes (0), 1441 edges (0), time = 7.577ms.
2020-04-22 18:03:42.796521: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   debug_stripper: Graph size after: 793 nodes (0), 1441 edges (0), time = 1.091ms.
2020-04-22 18:03:42.796525: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   model_pruner: Graph size after: 793 nodes (0), 1441 edges (0), time = 2.682ms.
2020-04-22 18:03:42.796529: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   constant folding: Graph size after: 793 nodes (0), 1441 edges (0), time = 889.931ms.
2020-04-22 18:03:42.796534: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   arithmetic_optimizer: Graph size after: 793 nodes (0), 1441 edges (0), time = 1015.008ms.
2020-04-22 18:03:42.796538: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   dependency_optimizer: Graph size after: 793 nodes (0), 1441 edges (0), time = 7.938ms.
2020-04-22 18:03:42.796542: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   model_pruner: Graph size after: 793 nodes (0), 1441 edges (0), time = 2.935ms.
2020-04-22 18:03:42.796546: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   remapper: Graph size after: 793 nodes (0), 1441 edges (0), time = 2.492ms.
2020-04-22 18:03:42.796561: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   constant folding: Graph size after: 793 nodes (0), 1441 edges (0), time = 904.186ms.
2020-04-22 18:03:42.796565: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   arithmetic_optimizer: Graph size after: 793 nodes (0), 1441 edges (0), time = 1013.05298ms.
2020-04-22 18:03:42.796570: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   dependency_optimizer: Graph size after: 793 nodes (0), 1441 edges (0), time = 7.999ms.
Writing weight file ./exports/output_graph.tfjs/tensorflowjs_model.pb...
2020-04-22 18:03:43.029490: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-04-22 18:03:43.051779: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3497900000 Hz
2020-04-22 18:03:43.052009: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5580c09fc300 executing computations on platform Host. Devices:
2020-04-22 18:03:43.052033: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
Traceback (most recent call last):
  File "/home/alex/miniconda3/envs/tfjs-conv/bin/tensorflowjs_converter", line 8, in <module>
    sys.exit(main())
  File "/home/alex/miniconda3/envs/tfjs-conv/lib/python3.6/site-packages/tensorflowjs/converters/converter.py", line 352, in main
    strip_debug_ops=FLAGS.strip_debug_ops)
  File "/home/alex/miniconda3/envs/tfjs-conv/lib/python3.6/site-packages/tensorflowjs/converters/tf_saved_model_conversion_pb.py", line 331, in convert_tf_frozen_model
    skip_op_check=skip_op_check, strip_debug_ops=strip_debug_ops)
  File "/home/alex/miniconda3/envs/tfjs-conv/lib/python3.6/site-packages/tensorflowjs/converters/tf_saved_model_conversion_pb.py", line 139, in optimize_graph
    extract_weights(optimized_graph, output_graph, quantization_dtype)
  File "/home/alex/miniconda3/envs/tfjs-conv/lib/python3.6/site-packages/tensorflowjs/converters/tf_saved_model_conversion_pb.py", line 183, in extract_weights
    [const_manifest], path, quantization_dtype=quantization_dtype)
  File "/home/alex/miniconda3/envs/tfjs-conv/lib/python3.6/site-packages/tensorflowjs/write_weights.py", line 119, in write_weights
    group_bytes, total_bytes, _ = _stack_group_bytes(group)
  File "/home/alex/miniconda3/envs/tfjs-conv/lib/python3.6/site-packages/tensorflowjs/write_weights.py", line 196, in _stack_group_bytes
    _assert_valid_weight_entry(entry)
  File "/home/alex/miniconda3/envs/tfjs-conv/lib/python3.6/site-packages/tensorflowjs/write_weights.py", line 305, in _assert_valid_weight_entry
    data.dtype.name + ' not supported.')
ValueError: Error dumping weight metadata_alphabet, dtype object not supported.

Anyone know why my model has AddV2 ops? I tried to use the latest release (v0.7.0a3) but only the current master had the files applicable for the diff. Perhaps I can try to restructure the v0.6.1 release according to reuben's diff and go from there.

reuben commented 4 years ago

@alexcannan I just did pip install tensorflowjs in a separate virtual environment, so the latest version. We're releasing 0.7.0 in the next few days so you should be able to more easily reproduce this.

alexcannan commented 4 years ago

Upgrading to the most recent version seemed to work! I'll put together a client to test performance.

alexcannan commented 4 years ago

So, I naïvely put together a small app to transcribe audio thinking that model.predict(inputBuffer) would give me transcribed audio, but it looks like model.predict() calls the session_->Run() function found in the tfmodelstate.cc file under infer(), since running console.log(model.executor.inputs) outputs:

[{
    "name": "input_node",
    "shape": [ 1, 16, 19, 26 ],
    "dtype": "float32"
  },
  {
    "name": "input_lengths",
    "shape": [ 1 ],
    "dtype": "int32"
  },
  {
    "name": "previous_state_c",
    "shape": [ 1, 2048 ],
    "dtype": "float32"
  },
  {
    "name": "previous_state_h",
    "shape": [ 1, 2048 ],
    "dtype": "float32"
  }]

Getting this to work looks like it'll require porting a lot of the native_client C-level API to TypeScript, or maybe writing a WASM module that links up with the model.predict call. I'll keep fudging around with this, but if anyone has any quick-and-dirty ideas I'd love to hear them.
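For anyone picking this up, here is a rough sketch of what a single forward pass could look like against the converted graph model, using the input names and shapes printed above; the feature window is left as zeros, and the MFCC framing plus CTC decoding are exactly the client logic that still has to be ported:

// Rough sketch of one forward pass, using the input names/shapes from
// model.executor.inputs above. executeAsync is used because graphs with
// control-flow ops may not run through predict().
import * as tf from '@tensorflow/tfjs';

async function runOneStep(model, prevC, prevH) {
  const feeds = {
    input_node: tf.zeros([1, 16, 19, 26]),    // placeholder feature windows
    input_lengths: tf.tensor1d([16], 'int32'),
    previous_state_c: prevC,                   // [1, 2048] LSTM cell state
    previous_state_h: prevH,                   // [1, 2048] LSTM hidden state
  };
  const [logits, newC, newH] = await model.executeAsync(
      feeds, ['logits', 'new_state_c', 'new_state_h']);
  return { logits, newC, newH };
}

// Initial states for the first window would be tf.zeros([1, 2048]).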

reuben commented 4 years ago

You're right, you'd need to reimplement the whole client logic. Some options for getting initial results faster:

MittalShruti commented 4 years ago

Hi @alexcannan, how did you load the model.json in the browser? I am new to JS. I have kept the model.json file that you get from tensorflowjs_converter in the src folder. I am building a React app.

This is how I load the model.json, in case it helps:

import React from 'react';
import * as tf from '@tensorflow/tfjs';
import {loadGraphModel} from '@tensorflow/tfjs-converter';

const MODEL_URL = "./model.json";

class App extends React.Component {
  componentDidMount() {
    // Load the converted graph model once the component has mounted
    (async () => {
      const model = await loadGraphModel(MODEL_URL);
      console.log('tns', model);
    })();
  }

  render() {
    return null;
  }
}

Error

tf-core.esm.js:17 Uncaught (in promise) Error: Failed to parse model JSON of response from model.json. Please make sure the server is serving valid JSON for this request.
    at t.<anonymous> (tf-core.esm.js:17)
    at tf-core.esm.js:17
    at Object.throw (tf-core.esm.js:17)
    at s (tf-core.esm.js:17)
alexcannan commented 4 years ago

@MittalShruti I would try just using the tf.loadGraphModel() method from the main tfjs package instead of the tfjs-converter import. I'm able to import the model using TFJS 1.7.3.
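Something like this (an untested sketch; it assumes model.json and the weight shards are served over HTTP from the given URL):

import * as tf from '@tensorflow/tfjs';

async function loadModel() {
  // Assumes the converter output is served over HTTP, e.g. via http-server.
  const model = await tf.loadGraphModel('http://localhost:8080/model.json');
  console.log(model.executor.inputs);
  return model;
}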

MittalShruti commented 4 years ago

I was not using a server; http-server resolved the issue.

MittalShruti commented 4 years ago

@alexcannan I am a bit confused as to why you are using model.predict to get the transcription. I looked at the evaluate_tflite.py code; it uses ds.stt(audio) to transcribe audio, and similarly the client.js file. How did you figure out that you need to call model.predict, and what should its parameters be?

console.log(model.executor.inputs) outputs 4 parameters that can be found in the tfmodelstate.cc file, but I am unable to figure out the connection between tfmodelstate and model.predict.

reuben commented 4 years ago

@MittalShruti TensorFlow.js is an alternative backend, similar to TFModelState and TFLiteModelState in the native client code. This means one has to reimplement the entire inference logic to be able to use the TF.js-converted model. You're not going to magically get the DeepSpeech API from just converting the model; it needs to be implemented in JS (or compiled/transpiled, I guess).

MittalShruti commented 4 years ago

Hi @alexcannan, I was looking at the tfjs model.predict code, and I couldn't understand where model.predict learns that it has to look into the tfModelState file.

deepspeech.cc calls TFModelState, and the BUILD file includes deepspeech.cc and tfmodelstate as well. So which part of the code above tells model.predict to point to session_->Run() in TFModelState?

Could you point me to some resource that will help me understand the inference code flow, i.e. the sequence of code blocks that are called from start to end during inference? Thanks

reuben commented 4 years ago

model.predict is not literally calling that code, it's equivalent to it. Calling model.predict is equivalent to calling Session::Run() with the C++ TensorFlow API.

MittalShruti commented 4 years ago

Where in the code did we set that calling model.predict should be equivalent to calling Session::Run()?

reuben commented 4 years ago

There's nothing set anywhere, they're not connected in any way other than semantically, because they perform equivalent actions (running a TensorFlow graph).

MittalShruti commented 4 years ago

OK, client.cc calls DS_CreateModel, which initialises a model with a TFModelState. So, while running the model in tfjs, the input (say, X) to model.predict should be of the form defined in TFModelState. Is that correct?

If yes, I have another question: I was passing an audio file through model.predict, which obviously is not of the form X. Why does that mean I need to convert the TFModelState to tfjs (as you stated earlier in the thread)? Shouldn't it just mean that the audio file I am passing should be converted to the form X?

train.py (create_inference_graph) has already defined what the X format means. Why was my input not directly converted into that format? create_inference_graph is not converted to tfjs through tfjs_converter, is it? So basically I need to write code in tfjs to convert this input to the form X?

And how would I convey in the tfjs code that model.predict is the tfjs equivalent of Session::Run()? PS: I can take this to the Discourse forum if this is not the right place to discuss it.

EvanBialo commented 4 years ago

I was not using a server. http-server resolved the issue

I'm having the same issue. Could you elaborate on how that fixed it?

fyuvb commented 2 years ago

Hi, thank you @reuben for the great comments. They helped me a lot in understanding what I would need to do to make tensorflowjs work. I was trying to write the preprocessing, including MFCC computation, buffering, etc., and to construct the input from a streaming audio waveform. However, when I checked out the latest code I found that the converted model now takes 5 parameters.

[
    { "name": "input_samples", "shape": [ 512 ], "dtype": "float32" },
    { "name": "input_node",  "shape": [1, 4, 19, 26],"dtype": "float32" },
    { "name": "input_lengths", "shape": [ 1 ], "dtype": "int32" },
    { "name": "previous_state_c", "shape": [ 1, 512 ], "dtype": "float32" },
    { "name": "previous_state_h", "shape": [ 1, 512 ], "dtype": "float32" }
]

This is different from the result above, which only needs 4 parameters; "input_samples" is the extra parameter. I tried to find out what this is and found a hint here: https://github.com/mozilla/DeepSpeech/blob/aa1d28530d531d0d92289bf5f11a49fe516fdc86/training/deepspeech_training/train.py#L736

Here are my questions:

  1. I went through the code, but this "input_samples" is not passed into the main TensorFlow graph to do any processing for logits / new_state_c / new_state_h. It is merely used to create a separate mfcc output. In that case, is it safe to just ignore this input? Why do we have it in the first place? I tried to find the commit / PR that added this change but failed. Also, the dimension of 512, which comes from Config.audio_window_samples, did not match anything (I was using the default settings for feature_win_len, feature_win_step, and sample_rate; those numbers also cannot be multiplied into 512 in terms of actual audio samples). I did not find any clues about feeding it into the actual pipeline. Could you advise if there is anything I missed?
  2. I read the code carefully and found that input_node's second dimension is actually the same as input_lengths, which comes from Config.n_steps. Why do we have an extra input called "input_lengths" instead of using input_node's second dimension as the input_lengths? I think I might be missing something here as well.

Many thanks!

fyuvb commented 2 years ago

Oh, during implementation I found the answer to my question 1: input_samples has no impact on logits / new_state_c / new_state_h. MFCC computation is done inside the graph as well, instead of being computed with another library. The dimension of 512 comes from sample_rate × window_length => 16000 Hz × 32 ms = 512 samples. This actually makes the implementation much easier.
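Spelled out, with the default audio settings as an assumption:

// Window size in samples, using the default 16 kHz / 32 ms settings above.
const sampleRate = 16000;        // Hz
const featureWinLenMs = 32;      // feature_win_len, in milliseconds
const audioWindowSamples = sampleRate * (featureWinLenMs / 1000);  // = 512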

fyuvb commented 2 years ago

I could finally do inference with a DeepSpeech-trained model in TensorFlow.js. My previous two comments were going in the wrong direction, because tensorflowjs_converter outputs BlockLSTM- and Mfcc-related ops that are not supported in tensorflowjs. If you want to proceed, please follow @reuben's earlier post https://github.com/mozilla/DeepSpeech/issues/2233#issuecomment-616815988 to change the model to use static_rnn and remove the mfcc-related parts.

I tried feeding all-zero tensors to the network and it gave high probabilities for "blank" across all n_steps. This result seems correct to me.
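For reference, a sketch of that sanity check against the 5-input export listed earlier in this thread (the names and shapes come from that listing; everything else here is an assumption):

// Sketch of the all-zeros sanity check. A correctly converted model should put
// most probability mass on the CTC "blank" symbol at every time step.
import * as tf from '@tensorflow/tfjs';

async function blankSanityCheck(model) {
  const feeds = {
    input_samples: tf.zeros([512]),
    input_node: tf.zeros([1, 4, 19, 26]),
    input_lengths: tf.tensor1d([4], 'int32'),
    previous_state_c: tf.zeros([1, 512]),
    previous_state_h: tf.zeros([1, 512]),
  };
  const [logits] = await model.executeAsync(feeds, ['logits']);
  logits.print();  // expect the blank class to dominate at each step
}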

reuben commented 2 years ago

Very nice! I would be more than happy to help you get this merged onto Coqui STT (not sure if you're aware but this repo is no longer maintained). Then we can have the JS client tested in CI and kept working well with changes.

fyuvb commented 2 years ago

Very nice! I would be more than happy to help you get this merged onto Coqui STT (not sure if you're aware but this repo is no longer maintained). Then we can have the JS client tested in CI and kept working well with changes.

I am happy to do so, but I am now struggling with the MFCC generation from the Web Audio API. I will try to compute the same set of MFCCs with the web code and the TensorFlow library first and see how it goes.
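In case it helps with the comparison, here is a sketch of getting 16 kHz mono samples out of the Web Audio API so the same raw samples can be fed to both pipelines; browser support for a custom sampleRate on AudioContext (and its resampling behaviour in decodeAudioData) varies, so treat this as an assumption:

// Sketch: decode an audio file to 16 kHz mono Float32 samples in the browser.
async function decodeTo16kMono(arrayBuffer) {
  const ctx = new AudioContext({ sampleRate: 16000 });
  const audioBuffer = await ctx.decodeAudioData(arrayBuffer);
  return audioBuffer.getChannelData(0);  // Float32Array of mono samples
}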

reuben commented 2 years ago

You might be able to extract just the MFCC computation code and then compile it to WebAssembly. Take a look at https://github.com/coqui-ai/inference-engine/blob/main/CMakeLists.txt for example, and https://github.com/coqui-ai/inference-engine/tree/main/third_party/tensorflow

fyuvb commented 2 years ago

I was trying to use Meyda, as suggested before, for MFCC extraction and it did not work out well. I submitted an issue to them: https://github.com/meyda/meyda/issues/1099.

Also, I found that during inference, if I directly call model.predict, it would throw an error:

Uncaught Error: This execution contains the node 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cond_3/Merge_2', which has the dynamic op 'Merge'. Please use model.executeAsync() instead. Alternatively, to avoid the dynamic ops, specify the inputs [new_state_h]

It requires me to call model.executeAsync(). I could indeed run inference through executeAsync, but it gave me a lot of trouble maintaining the hidden state of previous_state_c and previous_state_h asynchronously. Are there any ideas about how to get around this? [new_state_h] seemed to be an output to me, yet TensorFlow.js indicates that it is an input, which is quite confusing.
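For what it's worth, one way to keep the state consistent is to process windows strictly one at a time, awaiting each executeAsync before issuing the next and swapping the state tensors explicitly. A sketch, using the feed/fetch names from the exports discussed above; featureWindows is assumed to be an array of per-window feeds (input_node, input_lengths, input_samples):

// Sketch: serialise the executeAsync calls so previous_state_c/h always hold
// the result of the last completed step.
import * as tf from '@tensorflow/tfjs';

async function streamingInfer(model, featureWindows, stateSize) {
  let stateC = tf.zeros([1, stateSize]);
  let stateH = tf.zeros([1, stateSize]);
  const allLogits = [];

  for (const windowFeeds of featureWindows) {
    const [logits, newC, newH] = await model.executeAsync(
        { ...windowFeeds, previous_state_c: stateC, previous_state_h: stateH },
        ['logits', 'new_state_c', 'new_state_h']);
    tf.dispose([stateC, stateH]);  // free the old state tensors
    stateC = newC;
    stateH = newH;
    allLogits.push(logits);
  }
  return { logits: allLogits, stateC, stateH };
}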

Thank you @reuben for your advice on WebAssembly! Now I think WebAssembly should be the right solution, as MFCC extraction on the web seems too complex to debug. I will look into WebAssembly next.