Mode 'parallel' for EncSALayer to speed up infer on ONNX

The 'parallel' transformer layout is used in GPT-J-6B and has been reported to have the same effect as the traditional (serial) transformer. It can be simplified as:

    x = x + Attn(LN(x)) + FFN(LN(x))

This saves one skip connection and one LayerNorm, which brings a slight improvement in training speed on DiffSinger.
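To make the difference between the two layouts concrete, here is a minimal pure-Python sketch. It is not the actual EncSALayer code: it assumes a pre-norm serial layer and the GPT-J-style parallel formulation, and `layer_norm`, `series_block`, `parallel_block`, `counting`, `toy_attn` and `toy_ffn` are hypothetical names used only for illustration.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a list of floats to zero mean, unit variance (no learned scale/shift)."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]

def series_block(x, attn, ffn, norm=layer_norm):
    # Assumed traditional pre-norm layout: two LayerNorms, two residual
    # additions, and the FFN has to wait for the attention output.
    x = [xi + ai for xi, ai in zip(x, attn(norm(x)))]
    x = [xi + fi for xi, fi in zip(x, ffn(norm(x)))]
    return x

def parallel_block(x, attn, ffn, norm=layer_norm):
    # GPT-J-style layout: one shared LayerNorm, both branches read the same
    # normalized input, and a single residual step merges their outputs.
    h = norm(x)
    return [xi + ai + fi for xi, ai, fi in zip(x, attn(h), ffn(h))]

def counting(fn):
    """Wrap fn so the number of calls can be inspected afterwards."""
    calls = {"n": 0}
    def wrapped(x):
        calls["n"] += 1
        return fn(x)
    return wrapped, calls

# Toy stand-ins for the attention and feed-forward sublayers (illustration only).
toy_attn = lambda h: [0.5 * v for v in h]
toy_ffn = lambda h: [max(0.0, v) for v in h]

x = [1.0, 2.0, 3.0, 4.0]
norm_s, calls_s = counting(layer_norm)
norm_p, calls_p = counting(layer_norm)
y_series = series_block(x, toy_attn, toy_ffn, norm_s)
y_parallel = parallel_block(x, toy_attn, toy_ffn, norm_p)
print(calls_s["n"], calls_p["n"])  # LayerNorm calls: 2 for series, 1 for parallel
```

The two layouts are not numerically identical for the same weights, so a model has to be trained in the mode it will be run in.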

In experiments, this modification shows a more significant improvement under ONNX inference. The experimental setup and results follow. The benchmark was run with infer_acoustic.py on a model with the lynxnet backbone, without shallow diffusion.

run_parallel
...20/20 [00:20<00:00,  1.00s/it]
run_series
...20/20 [00:23<00:00,  1.15s/it]

run_parallel
...20/20 [00:20<00:00,  1.01s/it]
run_series
...20/20 [00:22<00:00,  1.13s/it]

run_parallel
...20/20 [00:21<00:00,  1.07s/it]
run_series
...20/20 [00:23<00:00,  1.17s/it]

run_parallel
...20/20 [00:20<00:00,  1.03s/it]
run_series
...20/20 [00:22<00:00,  1.13s/it]

run_parallel
...20/20 [00:20<00:00,  1.04s/it]
run_series
...20/20 [00:23<00:00,  1.16s/it]

On average, inference speed increased by 8%. This change has already been applied to yousaV1.42ReFlow, and no issues have been reported so far.
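As a rough cross-check, the seconds-per-iteration readouts from the five runs above can be averaged as in the sketch below. The `summarize` helper is hypothetical, and the inputs are the rounded progress-bar values rather than raw timings. Those rounded values give a time reduction of roughly 10%, so the figure quoted above presumably comes from a different measurement or averaging method (an assumption).

```python
from statistics import mean

# Seconds per iteration, copied from the rounded progress-bar readouts above.
parallel = [1.00, 1.01, 1.07, 1.03, 1.04]
series = [1.15, 1.13, 1.17, 1.13, 1.16]

def summarize(parallel_s_per_it, series_s_per_it):
    """Compare mean step times of the two modes (hypothetical helper)."""
    p = mean(parallel_s_per_it)
    s = mean(series_s_per_it)
    return {
        "parallel_mean_s_per_it": p,
        "series_mean_s_per_it": s,
        "time_reduction": 1.0 - p / s,   # fraction of time saved per step
        "throughput_gain": s / p - 1.0,  # fraction of extra steps per second
    }

summary = summarize(parallel, series)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Time reduction and throughput gain are different quantities, so it helps to state which one a percentage refers to.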

openvpi / DiffSinger #191