mozilla / DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Mozilla Public License 2.0
25.36k stars 3.97k forks

Intuitive explanation of stride #514

Closed ghost closed 7 years ago

ghost commented 7 years ago

Can someone give me an intuitive explanation of the stride concept? If stride = 2, will the RNN skip one step at a time through the features generated from MFCC while unrolling?

reuben commented 7 years ago

Yep. Every other time step is ignored when doing the convolutions. You can read more about convolutional layers here: http://cs231n.github.io/convolutional-networks/#conv
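To make the convolution analogy concrete, here is a minimal sketch of a naive 1-D convolution (cross-correlation) with a stride parameter. This is an illustrative toy, not DeepSpeech code; the function name `conv1d` and the example inputs are made up for the demonstration:

```python
import numpy as np

def conv1d(x, kernel, stride=1):
    """Naive 1-D cross-correlation: slide the kernel over x,
    advancing `stride` positions per output."""
    k = len(kernel)
    out_len = (len(x) - k) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + k], kernel)
                     for i in range(out_len)])

x = np.arange(8, dtype=float)        # input signal: 0..7
kernel = np.array([1.0, 1.0, 1.0])   # simple box filter

print(conv1d(x, kernel, stride=1))   # 6 outputs: [ 3.  6.  9. 12. 15. 18.]
print(conv1d(x, kernel, stride=2))   # 3 outputs: [ 3.  9. 15.]
```

With stride 2 the filter jumps two positions per step, so the output is roughly half as long; that halving of the time dimension is the effect being discussed in this thread.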

ghost commented 7 years ago

Ok, that part of the code is implemented in audio.py. What is tough to digest is the analogy to convolution here, since in a convolution the stride is how far we shift the filter across space at each step.

reuben commented 7 years ago

Sorry, I'm currently working on the Deep Speech 2 model and just assumed that's what you were talking about. On current master we implement something like a convolution stride by simply dropping every other time step from the input, which is the code you saw in audio.py. In DS2 we have actual convolutional layers.
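The "drop every other time step" trick described above can be sketched in a couple of lines of NumPy. This is an illustrative example, not the actual audio.py code; the shape `(100, 26)` is a hypothetical MFCC feature matrix (100 time steps, 26 coefficients):

```python
import numpy as np

# Hypothetical MFCC features: 100 time steps x 26 coefficients.
features = np.arange(100 * 26, dtype=np.float32).reshape(100, 26)

# "Stride 2" without a convolutional layer: keep every other
# time step before feeding the features to the RNN.
stride = 2
strided = features[::stride]

print(strided.shape)  # (50, 26)
```

This halves the sequence length the RNN has to unroll over, which is the main practical benefit of the stride in DS1; DS2 gets the same effect (plus learned filters) from genuine strided convolutional layers.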

ghost commented 7 years ago

Is there an active GitHub link to the DS2 code, or is the DS1 code being slowly transformed into DS2?

reuben commented 7 years ago

The WIP branch is here, but I make no stability or functionality guarantees: https://github.com/mozilla/DeepSpeech/tree/ds2-v2

ghost commented 7 years ago

Great. Would love to see fast progress there. Meanwhile, I will implement the convolution part and report the stats back to you :)

reuben commented 7 years ago

I'm gonna close this issue as the question seems to have been answered.

lock[bot] commented 5 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.