ryanleary / mlperf-rnnt-ref


DISCLAIMER

This codebase is a work in progress. The implementation has known and unknown bugs and has not been optimized in any way.

MLPerf has neither finalized a decision to add a speech recognition benchmark nor adopted this implementation/architecture as a reference implementation.

1. Problem

Speech recognition accepts raw audio samples and produces a corresponding text transcription.

2. Directions

See https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/SpeechRecognition/Jasper/README.md. This implementation shares significant code with that repository.

3. Dataset/Environment

Publication/Attribution

"OpenSLR LibriSpeech Corpus" provides over 1000 hours of speech data in the form of raw audio.

Data preprocessing

What preprocessing is done to the dataset?

Training and test data separation

How is the test set extracted?

Training data order

In what order is the training data traversed?

Test data order

In what order is the test data traversed?

Simulation environment (RL models only)

Describe simulation environment briefly, if applicable.

4. Model

Publication/Attribution

Cite paper describing model plus any additional attribution requested by code authors

List of layers

Brief summary of structure of model

Weight and bias initialization

How are weights and biases initialized?

Loss function

Transducer Loss
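
For orientation, the transducer loss is usually written as the negative log-likelihood of the target transcript, summed over all alignments between audio frames and output labels. The block below is a sketch of that standard formulation (Graves, 2012), not a description of this repository's code. The symbols are assumptions introduced here: x is the input of T frames, y the target of U labels, B^{-1}(y) the set of blank-augmented alignments that collapse to y, ∅(t, u) the blank probability at lattice node (t, u), and y(t, u) the probability of emitting label u+1 at that node.

```latex
\mathcal{L} = -\ln P(\mathbf{y} \mid \mathbf{x}),
\qquad
P(\mathbf{y} \mid \mathbf{x}) = \sum_{\mathbf{a} \in \mathcal{B}^{-1}(\mathbf{y})} P(\mathbf{a} \mid \mathbf{x})

% Forward variable over the T x (U+1) lattice, with \alpha(1, 0) = 1:
\alpha(t, u) = \alpha(t-1, u)\,\varnothing(t-1, u) + \alpha(t, u-1)\,y(t, u-1)

% Total likelihood: reach the final node, then emit a closing blank.
P(\mathbf{y} \mid \mathbf{x}) = \alpha(T, U)\,\varnothing(T, U)
```

The recursion avoids enumerating alignments explicitly, which is why the loss can be computed in O(T·U) time per utterance.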

Optimizer

TBD, currently Adam

5. Quality

Quality metric

Word Error Rate (WER) across all words in the output text of all samples in the validation set.
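
The metric above can be sketched as follows. This is a minimal illustration, not the evaluation code used in this repository: it assumes whitespace tokenization, no text normalization, and that errors are pooled over the whole validation set (total word-level edits divided by total reference words). The function names `word_edit_distance` and `corpus_wer` are hypothetical.

```python
from typing import List, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two word sequences.

    Counts the minimum number of substitutions, insertions, and
    deletions needed to turn `hyp` into `ref`.
    """
    # prev[j] is the distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1]


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """WER pooled across all samples (assumed aggregation scheme)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have equal length")
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()  # assumption: whitespace tokenization
        hyp_words = hyp.split()
        total_edits += word_edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("reference set contains no words")
    return total_edits / total_words


if __name__ == "__main__":
    refs = ["the cat sat", "hello world"]
    hyps = ["the cat sat", "hello there world"]
    print(corpus_wer(refs, hyps))  # one insertion over five reference words
```

Pooling edits over the whole set weights long utterances more heavily than averaging per-utterance WER would; which of the two this benchmark intends is not stated here beyond "across all words ... of all samples".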

Quality target

What is the numeric quality target?

Evaluation frequency

TBD

Evaluation thoroughness

TBD