k2-fsa / k2

FSA/FST algorithms, differentiable, with PyTorch compatibility.
https://k2-fsa.github.io/k2
Apache License 2.0

CPU decoding #929

Open jiangj-dc opened 2 years ago

jiangj-dc commented 2 years ago

Does the CPU decoding use Intel MKL? Is there an option to do parallel decoding?

csukuangfj commented 2 years ago

Does the CPU decoding use Intel MKL?

For the neural network part, it uses PyTorch for computation, so it depends on whether your PyTorch build is using MKL. I think there are options in PyTorch to control parallel computation on CPU.
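Concretely, PyTorch's intra-op CPU parallelism can be tuned through environment variables (which must be set before PyTorch initializes) or via `torch.set_num_threads`. A minimal sketch; the thread count of 4 is an arbitrary example, and the `torch` calls are shown as comments so the snippet stays torch-free:

```python
import os

# These env vars control the OpenMP/MKL thread pools PyTorch uses for
# CPU ops; they only take effect if set before PyTorch starts up.
os.environ["OMP_NUM_THREADS"] = "4"
os.environ["MKL_NUM_THREADS"] = "4"

# From inside a running process you would instead use the PyTorch API:
# import torch
# torch.set_num_threads(4)
# print(torch.__config__.parallel_info())  # reports whether MKL/OpenMP are in use

print(os.environ["OMP_NUM_THREADS"])
```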

For the FSA decoding part, there are no linear algebra operations and MKL does not play a role here. Also, it processes each utterance in a batch sequentially on CPU.
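Since the FSA search itself runs sequentially over the batch on CPU, one generic workaround (not a k2 API, just a standard pattern) is to shard utterances across worker processes, each running the full per-utterance pipeline. A sketch with a hypothetical `decode_one` stand-in for the real feature-extraction / network / FSA-decoding chain:

```python
from concurrent.futures import ProcessPoolExecutor


def decode_one(utt_id: str) -> tuple[str, str]:
    # Hypothetical stand-in for the real per-utterance pipeline
    # (feature extraction, neural-net forward, k2 FSA decoding).
    return utt_id, f"transcript-of-{utt_id}"


def decode_parallel(utt_ids, num_workers=4):
    # Each worker decodes its own utterances end to end; the k2 search
    # stays sequential inside each process.
    with ProcessPoolExecutor(max_workers=num_workers) as pool:
        return dict(pool.map(decode_one, utt_ids))


if __name__ == "__main__":
    print(decode_parallel([f"utt{i}" for i in range(8)], num_workers=2))
```

The trade-off is the usual one for process pools: model loading and memory are duplicated per worker, so this pays off mainly when many utterances are decoded with a small model.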

jiangj-dc commented 2 years ago

[image: table comparing decoding speed (RTF) on CPU; its first and second rows are referenced below]

Here is a speed comparison on CPU. Any suggestions to improve decoding speed? Thanks.

danpovey commented 2 years ago

The time taken in k2 would most likely be affected most strongly by the beam and max_active_states; those would be the first things to tune.
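To illustrate what these two knobs do (this is a toy sketch of per-frame pruning, not k2's actual implementation, where they are passed to the pruned intersection routine): `search_beam` keeps only hypotheses whose score is within the beam of the frame's best score, and `max_active_states` caps how many survive regardless of the beam.

```python
def prune_active_states(scores, search_beam, max_active_states):
    """Toy per-frame pruning.

    `scores` maps state -> log-score (higher is better).  Keep states
    within `search_beam` of the best score, then cap the survivors at
    `max_active_states`.
    """
    if not scores:
        return {}
    best = max(scores.values())
    kept = {s: v for s, v in scores.items() if v >= best - search_beam}
    # Of the states inside the beam, keep only the top max_active_states.
    top = sorted(kept.items(), key=lambda kv: kv[1], reverse=True)
    return dict(top[:max_active_states])


frame = {0: -1.0, 1: -3.5, 2: -9.0, 3: -2.0}
print(prune_active_states(frame, search_beam=5.0, max_active_states=2))
# -> {0: -1.0, 3: -2.0}: state 2 falls outside the beam, state 1 is cut by the cap
```

Tightening either knob shrinks the search space (faster decoding) at the risk of pruning away the best path (higher WER), which is why they are tuned jointly against WER.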

jiangj-dc commented 2 years ago

The speed is now comparable with search_beam = 10 and max_active_states = 1000, with no WER degradation. Alternatively, reducing the BPE vocab_size from 1000 to 500 also gives good speed. Thanks!

danpovey commented 2 years ago

Great! But the speed is now comparable to what?

jiangj-dc commented 2 years ago

The speed was compared to the example LibriSpeech (960 hours) experiment, where the k2 RTF was about 0.07 (the first row in the table above). For a different dataset, the k2 RTF was about 0.38 (the second row) and is now 0.07.
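For reference, the real-time factor (RTF) quoted throughout the thread is decoding time divided by audio duration; values below 1 mean faster than real time. A worked example (the 100 s duration is illustrative, not a figure from the thread):

```python
def real_time_factor(decode_seconds: float, audio_seconds: float) -> float:
    # RTF = processing time / audio duration; RTF < 1 is faster than real time.
    return decode_seconds / audio_seconds


# Decoding 100 s of audio in 7 s gives RTF 0.07;
# taking 38 s for the same audio gives RTF 0.38.
print(real_time_factor(7.0, 100.0))   # -> 0.07
print(real_time_factor(38.0, 100.0))  # -> 0.38
```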