nvidia-riva / riva-asrlib-decoder

Standalone implementation of the CUDA-accelerated WFST Decoder available in Riva

WIP Throughput + WER table #19

Open galv opened 1 year ago

galv commented 1 year ago

Just using this GitHub issue as a markdown scratchpad to create a table of results to show later.

The entry in each table cell is "WER, RTFx".
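For reference, WER is the word error rate (word-level edit distance divided by the number of reference words) and RTFx is the inverse real-time factor (seconds of audio decoded per second of wall-clock time). A minimal, self-contained sketch of both metrics — illustrative only, not the scoring code used to produce these tables:

```python
def word_error_rate(refs, hyps):
    """Corpus-level WER: total word-level edit distance over total
    reference word count, across all (reference, hypothesis) pairs."""
    total_errors = 0
    total_words = 0
    for ref, hyp in zip(refs, hyps):
        r, h = ref.split(), hyp.split()
        # Levenshtein distance over words, single-row dynamic programming.
        d = list(range(len(h) + 1))
        for i in range(1, len(r) + 1):
            prev, d[0] = d[0], i
            for j in range(1, len(h) + 1):
                cur = d[j]
                d[j] = min(d[j] + 1,          # deletion
                           d[j - 1] + 1,      # insertion
                           prev + (r[i - 1] != h[j - 1]))  # substitution
                prev = cur
        total_errors += d[len(h)]
        total_words += len(r)
    return total_errors / total_words


def rtfx(total_audio_seconds, wall_clock_seconds):
    """Inverse real-time factor: >1 means faster than real time."""
    return total_audio_seconds / wall_clock_seconds
```

So "3.56%, 4405" means 3.56% of reference words were wrong, and the decoder processed audio about 4405x faster than real time.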

"Small" models refers to Conformer CTC Small. "Medium" model refers to Conformer CTC Medium. "Large" model refers to Conformer CTC Large.

Used the ARPA LM from https://www.openslr.org/resources/11/3-gram.pruned.3e-7.arpa.gz

All models were run in half precision.

| Topology | Model  | Test-Clean  | Test-Other  | Dev-Clean   | Dev-Other   |
|----------|--------|-------------|-------------|-------------|-------------|
| Vanilla  | Small  | 3.56%, 4405 | 7.12%, 4449 | 3.44%, 3729 | 6.99%, 4713 |
| Vanilla  | Medium | 3.20%, 3537 | 5.73%, 3695 | 2.80%, 3436 | 5.46%, 4057 |
| Vanilla  | Large  | 2.74%, 2319 | 4.53%, 2399 | 2.49%, 2173 | 4.31%, 2595 |
| Compact  | Small  | 3.17%, 4190 | 6.83%, 4049 | 2.97%, 4196 | 6.68%, 3987 |
| Compact  | Medium | 2.65%, 3362 | 5.32%, 3624 | 2.27%, 3283 | 5.03%, 3827 |
| Compact  | Large  | 2.21%, 2320 | 4.10%, 2421 | 1.87%, 2126 | 3.90%, 2560 |

Relevant hyperparameters:

    import multiprocessing

    # Import path per this repo's Python bindings.
    from riva.asrlib.decoder.python_decoder import BatchedMappedDecoderCudaConfig

    def create_decoder_config():
        config = BatchedMappedDecoderCudaConfig()
        config.n_input_per_chunk = 50
        config.online_opts.decoder_opts.default_beam = 17.0
        config.online_opts.decoder_opts.lattice_beam = 8.0
        config.online_opts.decoder_opts.max_active = 10_000
        config.online_opts.determinize_lattice = True
        config.online_opts.max_batch_size = 200
        config.online_opts.num_channels = config.online_opts.max_batch_size * 2
        config.online_opts.frame_shift_seconds = 0.04
        config.online_opts.lattice_postprocessor_opts.acoustic_scale = 1.0
        config.online_opts.lattice_postprocessor_opts.lm_scale = 1.0
        config.online_opts.lattice_postprocessor_opts.word_ins_penalty = 0.0
        config.online_opts.lattice_postprocessor_opts.nbest = 1
        config.online_opts.num_decoder_copy_threads = 2
        config.online_opts.num_post_processing_worker_threads = (
            multiprocessing.cpu_count() - config.online_opts.num_decoder_copy_threads
        )
        return config

Additional settings, overriding the corresponding values above where they differ:

    config.online_opts.decoder_opts.length_penalty = -0.5
    config.online_opts.lattice_postprocessor_opts.lm_scale = 0.8
    # Log likelihoods output by the acoustic model are multiplied by this
    # factor before being passed into the decoder.
    acoustic_scale = 1 / 0.55
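To make the `acoustic_scale` comment concrete, here is an illustrative sketch (the function name and plain-list representation are stand-ins, not this repo's API) of applying a flat scale to the acoustic model's log likelihoods before they enter WFST search, so that acoustic scores are weighted against LM scores:

```python
def apply_acoustic_scale(log_likelihoods, acoustic_scale=1 / 0.55):
    """Multiply frame-level token log likelihoods by a flat scale.
    Pure-Python stand-in for what would be a tensor multiply in
    the real pipeline."""
    return [[ll * acoustic_scale for ll in frame] for frame in log_likelihoods]

# One frame with two token log likelihoods.
scaled = apply_acoustic_scale([[-1.1, -2.2]])
```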

All results were obtained with an A100-80GB GPU on a 16-core CPU server.

galv commented 1 year ago

Version 0.4.0:

| Topology | Model  | Test-Clean       | Test-Other       | Dev-Clean        | Dev-Other        |
|----------|--------|------------------|------------------|------------------|------------------|
| Vanilla  | Small  | 3.56%, 4464.1221 | 7.12%, 4527.6245 | 3.44%, 3998.9165 | 6.99%, 4528.6743 |
| Vanilla  | Medium | 3.20%, 3721.2207 | 5.73%, 4082.0947 | 2.80%, 3726.6550 | 5.46%, 4099.2676 |
| Vanilla  | Large  | 2.74%, 2321.9045 | 4.53%, 2189.0354 | 2.49%, 2132.1375 | 4.31%, 2574.2717 |
| Compact  | Small  | 3.17%, 4320.1294 | 6.83%, 4110.2827 | 2.97%, 3721.0525 | 6.68%, 4458.4355 |
| Compact  | Medium | 2.65%, 3726.7332 | 5.32%, 3909.8123 | 2.27%, 3621.8477 | 5.03%, 3800.0156 |
| Compact  | Large  | 2.21%, 2261.0801 | 4.10%, 2205.2964 | 1.87%, 2196.7666 | 3.90%, 2530.3594 |
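For comparing the two runs, a quick sketch of the relative RTFx change per cell (the example numbers are taken from the tables above; WER is unchanged between runs for every configuration):

```python
def rel_change(old, new):
    """Percent change from old to new; positive means the new run
    is faster (higher RTFx)."""
    return (new / old - 1) * 100

# Vanilla Small, Test-Clean: 4405 in the first run, 4464.1221 in v0.4.0.
print(round(rel_change(4405, 4464.1221), 2))  # → 1.34
```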