mozilla / DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Mozilla Public License 2.0

Arbitrary number of LSTMs #1933

Closed bernardohenz closed 4 years ago

bernardohenz commented 5 years ago

I've read some recent papers on speech recognition [1,2], and I noticed that they all tend to use more than a single LSTM layer.

[1] Park et al., Fully Neural Network Based Speech Recognition on Mobile and Embedded Devices. NeurIPS 2018.
[2] He et al., Streaming End-to-End Speech Recognition for Mobile Devices. arXiv:1811.06621, 2018.

I tried to implement it myself, and it seems to be working for training/evaluation/exporting. Unfortunately, I need some help with what changes I should make to the binaries.

The following patch holds the implementation of an arbitrary number of LSTMs (I haven't created a PR because the binaries were not touched): patch_more_LSTMs.patch.txt
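
For readers without the attachment, here is a rough sketch (not the patch itself) of how a configurable stack of LSTM layers is typically wired up in a TF 1.x graph; the names `stacked_lstm`, `n_lstm_layers`, and `n_cell_dim` are illustrative, not DeepSpeech flags:

```python
# Rough illustration only (not the attached patch): stacking a configurable
# number of LSTM layers in a TF 1.x graph. Names such as `n_lstm_layers`
# and `n_cell_dim` are placeholders, not actual DeepSpeech flags.
import tensorflow as tf

def stacked_lstm(inputs, seq_length, n_cell_dim=2048, n_lstm_layers=2):
    """inputs: [batch, time, features]; seq_length: [batch]."""
    cells = [tf.nn.rnn_cell.LSTMCell(n_cell_dim) for _ in range(n_lstm_layers)]
    multi_cell = tf.nn.rnn_cell.MultiRNNCell(cells)
    # dynamic_rnn unrolls the whole stack over time in one call
    outputs, final_state = tf.nn.dynamic_rnn(
        multi_cell, inputs, sequence_length=seq_length, dtype=tf.float32)
    return outputs, final_state
```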

kdavis-mozilla commented 5 years ago

We have made a conscious choice to use a single LSTM layer; we want the model to be as lightweight as possible so that we can target as many devices as possible, some of which are quite resource constrained.

That said, if you want to discuss your multi-layer LSTM experiments, we encourage and invite your input on our Discourse forum, as we'd be very interested in your results.

cahuja1992 commented 5 years ago

> I've read some recent papers on speech recognition [1,2], and I noticed that they all tend to use more than a single LSTM layer.
>
> I tried to implement it myself, and it seems to be working for training/evaluation/exporting. Unfortunately, I need some help with what changes I should make to the binaries.
>
> The following patch holds the implementation of an arbitrary number of LSTMs (I haven't created a PR because the binaries were not touched): patch_more_LSTMs.patch.txt

I am trying to apply this patch. Can you please tell me which tag this patch is for?

bernardohenz commented 5 years ago

@cahuja1992 I strongly recommend taking a look at the cudnnrnn branch. It uses CudnnLSTM/GRU, which, besides being faster, lets you easily set the number of RNN layers (see the docs).
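
As a minimal sketch of what that API looks like (TF 1.x contrib, assuming the branch builds on tf.contrib.cudnn_rnn), the layer count is just a constructor argument; the names and sizes below are illustrative, not taken from the branch:

```python
# Minimal CudnnLSTM sketch (TF 1.x contrib); the number of stacked layers
# is a single constructor argument. Values here are illustrative only.
import tensorflow as tf

n_layers = 2            # try more layers by changing one number
n_cell_dim = 2048       # hidden units per layer (example value)
n_input_features = 494  # example feature width, not necessarily DeepSpeech's

lstm = tf.contrib.cudnn_rnn.CudnnLSTM(
    num_layers=n_layers, num_units=n_cell_dim, direction='unidirectional')

# CudnnLSTM expects time-major input: [max_time, batch_size, input_size]
inputs = tf.placeholder(tf.float32, [None, None, n_input_features])
outputs, output_states = lstm(inputs, training=True)
```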

My patch is quite old, and I couldn't get any improvement with more layers (maybe I was implementing it incorrectly).

reuben commented 5 years ago

Now that we've merged TensorFlow 1.14 support, I plan to merge the CuDNN RNN support into master.

kdavis-mozilla commented 4 years ago

CuDNN RNN support is in master.

lock[bot] commented 4 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.