nvidia-riva / riva-asrlib-decoder

Standalone implementation of the CUDA-accelerated WFST Decoder available in Riva

WIP Throughput + WER table #19

Open galv opened 1 year ago

galv commented 1 year ago

Just using this GitHub issue as a markdown scratchpad to create a table of results to show later.

The entry in each table cell is "WER, RTFx".
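For reference, WER is the word error rate (word-level edit distance divided by the number of reference words) and RTFx is the inverse real-time factor (seconds of audio decoded per second of wall-clock time). A minimal, self-contained sketch of both metrics — illustrative only, not the scoring code used to produce these tables:

```python
def word_error_rate(refs, hyps):
    """Corpus-level WER: total word-level edit distance over total
    reference word count, across all (reference, hypothesis) pairs."""
    total_errors = 0
    total_words = 0
    for ref, hyp in zip(refs, hyps):
        r, h = ref.split(), hyp.split()
        # Levenshtein distance over words, single-row dynamic programming.
        d = list(range(len(h) + 1))
        for i in range(1, len(r) + 1):
            prev, d[0] = d[0], i
            for j in range(1, len(h) + 1):
                cur = d[j]
                d[j] = min(d[j] + 1,          # deletion
                           d[j - 1] + 1,      # insertion
                           prev + (r[i - 1] != h[j - 1]))  # substitution
                prev = cur
        total_errors += d[len(h)]
        total_words += len(r)
    return total_errors / total_words


def rtfx(total_audio_seconds, wall_clock_seconds):
    """Inverse real-time factor: >1 means faster than real time."""
    return total_audio_seconds / wall_clock_seconds
```

So "3.56%, 4405" means 3.56% of reference words were wrong, and the decoder processed audio about 4405x faster than real time.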

"Small" models refers to Conformer CTC Small. "Medium" model refers to Conformer CTC Medium. "Large" model refers to Conformer CTC Large.

Used the ARPA LM from https://www.openslr.org/resources/11/3-gram.pruned.3e-7.arpa.gz

All models were run in half precision.

| Topology | Model  | Test-Clean  | Test-Other  | Dev-Clean   | Dev-Other   |
|----------|--------|-------------|-------------|-------------|-------------|
| Vanilla  | Small  | 3.56%, 4405 | 7.12%, 4449 | 3.44%, 3729 | 6.99%, 4713 |
| Vanilla  | Medium | 3.20%, 3537 | 5.73%, 3695 | 2.80%, 3436 | 5.46%, 4057 |
| Vanilla  | Large  | 2.74%, 2319 | 4.53%, 2399 | 2.49%, 2173 | 4.31%, 2595 |
| Compact  | Small  | 3.17%, 4190 | 6.83%, 4049 | 2.97%, 4196 | 6.68%, 3987 |
| Compact  | Medium | 2.65%, 3362 | 5.32%, 3624 | 2.27%, 3283 | 5.03%, 3827 |
| Compact  | Large  | 2.21%, 2320 | 4.10%, 2421 | 1.87%, 2126 | 3.90%, 2560 |

Relevant hyperparameters:

    import multiprocessing

    # Import path per this repo's Python bindings.
    from riva.asrlib.decoder.python_decoder import BatchedMappedDecoderCudaConfig

    def create_decoder_config():
        config = BatchedMappedDecoderCudaConfig()
        config.n_input_per_chunk = 50
        config.online_opts.decoder_opts.default_beam = 17.0
        config.online_opts.decoder_opts.lattice_beam = 8.0
        config.online_opts.decoder_opts.max_active = 10_000
        config.online_opts.determinize_lattice = True
        config.online_opts.max_batch_size = 200
        config.online_opts.num_channels = config.online_opts.max_batch_size * 2
        config.online_opts.frame_shift_seconds = 0.04
        config.online_opts.lattice_postprocessor_opts.acoustic_scale = 1.0
        config.online_opts.lattice_postprocessor_opts.lm_scale = 1.0
        config.online_opts.lattice_postprocessor_opts.word_ins_penalty = 0.0
        config.online_opts.lattice_postprocessor_opts.nbest = 1
        config.online_opts.num_decoder_copy_threads = 2
        config.online_opts.num_post_processing_worker_threads = (
            multiprocessing.cpu_count() - config.online_opts.num_decoder_copy_threads
        )
        return config

Additional settings, overriding the corresponding values above where they differ:

    config.online_opts.decoder_opts.length_penalty = -0.5
    config.online_opts.lattice_postprocessor_opts.lm_scale = 0.8
    # Log likelihoods output by the acoustic model are multiplied by this
    # factor before being passed into the decoder.
    acoustic_scale = 1 / 0.55
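To make the `acoustic_scale` comment concrete, here is an illustrative sketch (the function name and plain-list representation are stand-ins, not this repo's API) of applying a flat scale to the acoustic model's log likelihoods before they enter WFST search, so that acoustic scores are weighted against LM scores:

```python
def apply_acoustic_scale(log_likelihoods, acoustic_scale=1 / 0.55):
    """Multiply frame-level token log likelihoods by a flat scale.
    Pure-Python stand-in for what would be a tensor multiply in
    the real pipeline."""
    return [[ll * acoustic_scale for ll in frame] for frame in log_likelihoods]

# One frame with two token log likelihoods.
scaled = apply_acoustic_scale([[-1.1, -2.2]])
```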

All results were obtained with an A100-80GB GPU on a 16-core CPU server.

galv commented 1 year ago

Version 0.4.0:

| Topology | Model  | Test-Clean       | Test-Other       | Dev-Clean        | Dev-Other        |
|----------|--------|------------------|------------------|------------------|------------------|
| Vanilla  | Small  | 3.56%, 4464.1221 | 7.12%, 4527.6245 | 3.44%, 3998.9165 | 6.99%, 4528.6743 |
| Vanilla  | Medium | 3.20%, 3721.2207 | 5.73%, 4082.0947 | 2.80%, 3726.6550 | 5.46%, 4099.2676 |
| Vanilla  | Large  | 2.74%, 2321.9045 | 4.53%, 2189.0354 | 2.49%, 2132.1375 | 4.31%, 2574.2717 |
| Compact  | Small  | 3.17%, 4320.1294 | 6.83%, 4110.2827 | 2.97%, 3721.0525 | 6.68%, 4458.4355 |
| Compact  | Medium | 2.65%, 3726.7332 | 5.32%, 3909.8123 | 2.27%, 3621.8477 | 5.03%, 3800.0156 |
| Compact  | Large  | 2.21%, 2261.0801 | 4.10%, 2205.2964 | 1.87%, 2196.7666 | 3.90%, 2530.3594 |
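For comparing the two runs, a quick sketch of the relative RTFx change per cell (the example numbers are taken from the tables above; WER is unchanged between runs for every configuration):

```python
def rel_change(old, new):
    """Percent change from old to new; positive means the new run
    is faster (higher RTFx)."""
    return (new / old - 1) * 100

# Vanilla Small, Test-Clean: 4405 in the first run, 4464.1221 in v0.4.0.
print(round(rel_change(4405, 4464.1221), 2))  # → 1.34
```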