modelscope / KAN-TTS

KAN-TTS is a speech-synthesis training framework. Please try the demos we have posted at https://modelscope.cn/models?page=1&tasks=text-to-speech
MIT License

Implemented the streaming inference function of AM #83

Closed: EricFuma closed this 1 year ago

EricFuma commented 1 year ago
  1. Add a chunk_forward function to the FsmnEncoderV2 and MemoryBlockV2 modules; it is cache-based and performs streaming inference chunk by chunk (a minimal sketch follows this list).

  2. Restructure the forward function of KanTtsSAMBERT: the common part is extracted into a pre_forward function that serves as a shared front end for the forward and forward_chunk functions, reducing redundant code. forward_chunk implements frame-level streaming inference, and the number of mel frames produced per step is controlled by the mel_chunk_size parameter (see the toy model after this list).

  3. Add the --inference_type and --mel_chunk_size parameters to the infer_sambert.py script. --inference_type selects the AM's inference mode, and --mel_chunk_size specifies the chunk size for streaming inference (it only takes effect when --inference_type is set to "streaming"); a sketch of the flag wiring follows this list.

  4. This is an incremental update: existing training and inference scripts and commands still run normally, the streaming and non-streaming inference results pass a consistency check, and the code passes the pre-commit checks.
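
As a rough illustration of the cache mechanism in item 1, here is a minimal, self-contained sketch of an FSMN-style memory block with a chunk_forward method. It is not KAN-TTS's actual FsmnEncoderV2/MemoryBlockV2 code; the layer layout, shapes, and the name MemoryBlockSketch are assumptions made for the example. The assertion at the end mirrors the kind of streaming/non-streaming consistency check mentioned in item 4.

```python
# Illustrative only: a depthwise causal conv plays the role of the FSMN
# "memory", and chunk_forward carries the left context in a cache so that
# chunk-by-chunk results match the offline forward pass.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MemoryBlockSketch(nn.Module):
    def __init__(self, dim: int, lorder: int):
        super().__init__()
        self.lorder = lorder  # how many past frames the memory looks back on
        self.conv = nn.Conv1d(dim, dim, kernel_size=lorder + 1, groups=dim, bias=False)

    def forward(self, x):                      # x: (batch, time, dim), offline path
        y = self.conv(F.pad(x.transpose(1, 2), (self.lorder, 0)))
        return y.transpose(1, 2)

    def chunk_forward(self, x, cache=None):    # x: (batch, chunk_time, dim)
        x_t = x.transpose(1, 2)
        if cache is None:                      # first chunk: zero left context
            cache = x_t.new_zeros(x_t.size(0), x_t.size(1), self.lorder)
        x_t = torch.cat([cache, x_t], dim=2)   # prepend cached frames
        new_cache = x_t[:, :, -self.lorder:]   # keep only the most recent frames
        return self.conv(x_t).transpose(1, 2), new_cache


block = MemoryBlockSketch(dim=8, lorder=4).eval()
x = torch.randn(1, 32, 8)
with torch.no_grad():
    full = block(x)
    cache, outs = None, []
    for s in range(0, 32, 8):                  # stream in chunks of 8 frames
        y, cache = block.chunk_forward(x[:, s:s + 8], cache)
        outs.append(y)
assert torch.allclose(full, torch.cat(outs, dim=1), atol=1e-6)
```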
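
Item 2's refactor could look roughly like the toy model below: a shared pre_forward, an offline forward, and a forward_chunk that emits mel frames mel_chunk_size at a time while carrying a small context cache. Every layer here is a stand-in (a Linear "encoder" and a causal Conv1d "decoder"); the real KanTtsSAMBERT is far richer, so treat everything other than the names pre_forward, forward, forward_chunk, and mel_chunk_size as invented for the sketch.

```python
# Toy stand-in for the pre_forward / forward / forward_chunk split.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SambertLikeSketch(nn.Module):
    def __init__(self, in_dim=8, hid=16, mel_dim=80, lorder=4):
        super().__init__()
        self.lorder = lorder
        self.enc = nn.Linear(in_dim, hid)                           # stand-in encoder stack
        self.dec = nn.Conv1d(hid, mel_dim, kernel_size=lorder + 1)  # stand-in causal decoder

    def pre_forward(self, x):
        # Work shared by both paths (encoding, duration prediction and
        # length regulation in the real model) is computed once here.
        return torch.relu(self.enc(x)).transpose(1, 2)              # (batch, hid, time)

    def forward(self, x):
        h = F.pad(self.pre_forward(x), (self.lorder, 0))            # full left context at once
        return self.dec(h).transpose(1, 2)                          # (batch, time, mel_dim)

    def forward_chunk(self, x, mel_chunk_size=16):
        h = self.pre_forward(x)
        cache = h.new_zeros(h.size(0), h.size(1), self.lorder)
        outs = []
        for s in range(0, h.size(2), mel_chunk_size):               # frame-level streaming
            window = torch.cat([cache, h[:, :, s:s + mel_chunk_size]], dim=2)
            cache = window[:, :, -self.lorder:]                     # keep only recent context
            outs.append(self.dec(window).transpose(1, 2))
        return torch.cat(outs, dim=1)
```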
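
And item 3's script change might be wired along these lines. The flag names and the "streaming" value come from the description above; the choices, defaults, and the run_am helper are assumptions for illustration only.

```python
# Hypothetical wiring for the two new infer_sambert.py flags.
import argparse


def build_arg_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--inference_type", choices=["non_streaming", "streaming"],
                        default="non_streaming",
                        help="selects the AM inference path")
    parser.add_argument("--mel_chunk_size", type=int, default=48,
                        help="mel frames produced per streaming step; only "
                             "meaningful together with --inference_type streaming")
    return parser


def run_am(model, inputs, args):
    # Dispatch between the offline and the chunked path based on the flags,
    # e.g. an invocation like: infer_sambert.py ... --inference_type streaming --mel_chunk_size 48
    if args.inference_type == "streaming":
        return model.forward_chunk(inputs, mel_chunk_size=args.mel_chunk_size)
    return model.forward(inputs)
```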

CLAassistant commented 1 year ago

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


伏鹳 seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.