modelscope / KAN-TTS

KAN-TTS is a speech-synthesis training framework, please try the demos we have posted at https://modelscope.cn/models?page=1&tasks=text-to-speech
MIT License
496 stars 80 forks source link

Add function of am streaming inference #92

Closed EricFuma closed 11 months ago

EricFuma commented 11 months ago
  1. Adding chunk_forward function for FsmnEncoderV2 and MemoryBlockV2 module, which is based on cache and implement streaming inference chunk by chunk;

  2. Reconstruct the forward function of KanTtsSAMBERT, extract the common part into the pre_forward function, and use it as a common pre-module for the forward and forward_chunk functions to reduce the amount of redundant code; among them, chunk_forward implements The frame-level streaming inference function, which can control the mel length of each inference by changing the mel_chunk_size parameter;

  3. In the infer_sambert.py script, add the --inference_type and --mel_chunk_size parameters. Among them, --inference_type controls am's inference method, --mel_chunk_size specifies the chunk size of streaming inference (need to specify --inference_type == "streaming" at the same time)

  4. This update is an incremental update, and existing training and inference scripts and commands can run normally; the results of streaming inference and non-streaming inference have passed the consistency test, and the code has passed the pre-commit check.

Tips:

  1. The functionality of the streaming vocoder is currently under testing and is expected to be submitted soon.
  2. Meanwhile, I am attempting to construct the am and vocoder into an asynchronous pipeline.