k2-fsa / snowfall

Moved to https://github.com/k2-fsa/icefall
Apache License 2.0

Run k2.connect on CUDA. #231

Closed csukuangfj closed 3 years ago

csukuangfj commented 3 years ago

https://github.com/k2-fsa/k2/pull/771 added support for CUDA to k2.connect.

This pull request updates the code in snowfall to use that feature, which makes the rescoring process much faster.

The following are some decoding logs from one of my models on test-other. You can see that when k2.connect runs on CUDA, the decoding time is reduced from about 27 minutes to about 1.5 minutes (n-best list rescoring with max_duration=200).

k2.connect() on CPU

2021-07-10 20:06:38,929 INFO [mmi_att_transformer_decode_2nd.py:540] * DECODING: test-other
2021-07-10 20:06:59,215 INFO [mmi_att_transformer_decode_2nd.py:229] batch 0, cuts processed until now is 0/2939 (0.000000%)
2021-07-10 20:08:36,257 INFO [mmi_att_transformer_decode_2nd.py:229] batch 10, cuts processed until now is 299/2939 (10.173528%)
2021-07-10 20:12:01,184 INFO [mmi_att_transformer_decode_2nd.py:229] batch 20, cuts processed until now is 583/2939 (19.836679%)
2021-07-10 20:14:05,966 INFO [mmi_att_transformer_decode_2nd.py:229] batch 30, cuts processed until now is 887/2939 (30.180333%)
2021-07-10 20:16:06,025 INFO [mmi_att_transformer_decode_2nd.py:229] batch 40, cuts processed until now is 1178/2939 (40.081660%)
2021-07-10 20:19:32,606 INFO [mmi_att_transformer_decode_2nd.py:229] batch 50, cuts processed until now is 1474/2939 (50.153113%)
2021-07-10 20:22:17,564 INFO [mmi_att_transformer_decode_2nd.py:229] batch 60, cuts processed until now is 1768/2939 (60.156516%)
2021-07-10 20:25:38,010 INFO [mmi_att_transformer_decode_2nd.py:229] batch 70, cuts processed until now is 2067/2939 (70.330044%)
2021-07-10 20:29:43,738 INFO [mmi_att_transformer_decode_2nd.py:229] batch 80, cuts processed until now is 2380/2939 (80.979925%)
2021-07-10 20:33:35,987 INFO [mmi_att_transformer_decode_2nd.py:229] batch 90, cuts processed until now is 2685/2939 (91.357605%)

k2.connect() on CUDA

2021-07-10 20:42:44,318 INFO [mmi_att_transformer_decode_2nd.py:541] * DECODING: test-other
2021-07-10 20:42:46,479 INFO [mmi_att_transformer_decode_2nd.py:229] batch 0, cuts processed until now is 0/2939 (0.000000%)
2021-07-10 20:42:56,037 INFO [mmi_att_transformer_decode_2nd.py:229] batch 10, cuts processed until now is 299/2939 (10.173528%)
2021-07-10 20:43:08,377 INFO [mmi_att_transformer_decode_2nd.py:229] batch 20, cuts processed until now is 583/2939 (19.836679%)
2021-07-10 20:43:19,244 INFO [mmi_att_transformer_decode_2nd.py:229] batch 30, cuts processed until now is 887/2939 (30.180333%)
2021-07-10 20:43:30,034 INFO [mmi_att_transformer_decode_2nd.py:229] batch 40, cuts processed until now is 1178/2939 (40.081660%)
2021-07-10 20:43:41,222 INFO [mmi_att_transformer_decode_2nd.py:229] batch 50, cuts processed until now is 1474/2939 (50.153113%)
2021-07-10 20:43:52,937 INFO [mmi_att_transformer_decode_2nd.py:229] batch 60, cuts processed until now is 1768/2939 (60.156516%)
2021-07-10 20:44:04,186 INFO [mmi_att_transformer_decode_2nd.py:229] batch 70, cuts processed until now is 2067/2939 (70.330044%)
2021-07-10 20:44:16,865 INFO [mmi_att_transformer_decode_2nd.py:229] batch 80, cuts processed until now is 2380/2939 (80.979925%)
2021-07-10 20:44:29,053 INFO [mmi_att_transformer_decode_2nd.py:229] batch 90, cuts processed until now is 2685/2939 (91.357605%)
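The speedup can be checked directly from the timestamps above (a small sketch using only the first and last log lines of each run; the exact figures depend on which lines you pick, since the logs stop at batch 90 rather than at the end of decoding):

```python
from datetime import datetime

def elapsed_minutes(start: str, end: str) -> float:
    """Minutes between two log timestamps in the format used above."""
    fmt = "%Y-%m-%d %H:%M:%S,%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds() / 60.0

# First and last timestamps from the CPU and CUDA logs above.
cpu = elapsed_minutes("2021-07-10 20:06:38,929", "2021-07-10 20:33:35,987")
cuda = elapsed_minutes("2021-07-10 20:42:44,318", "2021-07-10 20:44:29,053")

print(f"CPU:  {cpu:.1f} min")          # ~27 minutes
print(f"CUDA: {cuda:.1f} min")         # under 2 minutes
print(f"speedup: {cpu / cuda:.0f}x")
```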