lenML / Speech-AI-Forge

🍦 Speech-AI-Forge is a project built around TTS generation models, providing an API server and a Gradio-based WebUI.
https://huggingface.co/spaces/lenML/ChatTTS-Forge
GNU Affero General Public License v3.0

Optimize `audio_data_to_segment` Function to Reduce Processing Time by ~2000ms #57

Closed · IrisSally closed this issue 3 months ago

IrisSally commented 3 months ago

Read README.md and dependencies.md

Searched existing issues and discussions

Checked the Forge version

Your issue

Issue Description:

Summary:

The current implementation of the audio_data_to_segment function in code/ChatTTS-Forge/modules/SynthesizeSegments.py is inefficient and adds significant processing time. By optimizing the function, we can reduce the processing time by approximately 2000 milliseconds.

Current Implementation:

The current function writes the audio data to an in-memory WAV byte stream and then reads it back into an AudioSegment object, which is time-consuming.

import io
from pydub import AudioSegment
from scipy.io.wavfile import write

def audio_data_to_segment(audio_data, sr):
    # Write the samples to an in-memory WAV file
    byte_io = io.BytesIO()
    write(byte_io, rate=sr, data=audio_data)
    byte_io.seek(0)

    # Parse the WAV container back into an AudioSegment
    return AudioSegment.from_file(byte_io, format="wav")

Proposed Optimization:

The optimized function converts the float32 samples to int16 and constructs the AudioSegment directly from the raw PCM bytes, avoiding the WAV round-trip and significantly reducing processing time.

import numpy as np
from pydub import AudioSegment

def audio_data_to_segment(audio_data, sr):
    # Convert float32 samples in [-1, 1] to 16-bit PCM
    audio_data = (audio_data * 32767).astype(np.int16)
    # Build the AudioSegment directly from the raw PCM bytes
    audio_segment = AudioSegment(
        audio_data.tobytes(),
        frame_rate=sr,
        sample_width=audio_data.dtype.itemsize,  # 2 bytes for int16
        channels=1  # assuming mono audio
    )
    return audio_segment
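
For illustration, typical usage would look something like the following. This is a sketch, assuming a mono float32 waveform in the range [-1, 1] (such as TTS model output) and an assumed 24 kHz sample rate.

import numpy as np

sr = 24000  # assumed sample rate for this example
waveform = np.sin(2 * np.pi * 220 * np.arange(sr) / sr).astype(np.float32)  # 1 s test tone

segment = audio_data_to_segment(waveform, sr)
print(len(segment), "ms")  # pydub reports an AudioSegment's length in milliseconds
segment.export("example.wav", format="wav")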

Performance Improvement:

Testing has shown that the optimized function can reduce the processing time by nearly 2000 milliseconds, making the system more efficient and responsive.
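
For reference, a timing comparison along the following lines can be used to check the savings on a given machine. This is a minimal sketch, assuming numpy, scipy, and pydub (with its ffmpeg backend) are installed; audio_data_to_segment_old and audio_data_to_segment_new are hypothetical names for the current and proposed versions, and the measured difference will vary with clip length and hardware.

import io
import time

import numpy as np
from pydub import AudioSegment
from scipy.io.wavfile import write

def audio_data_to_segment_old(audio_data, sr):
    # Current version: round-trip through an in-memory WAV file
    byte_io = io.BytesIO()
    write(byte_io, rate=sr, data=audio_data)
    byte_io.seek(0)
    return AudioSegment.from_file(byte_io, format="wav")

def audio_data_to_segment_new(audio_data, sr):
    # Proposed version: build the AudioSegment directly from raw 16-bit PCM bytes
    pcm = (audio_data * 32767).astype(np.int16)
    return AudioSegment(pcm.tobytes(), frame_rate=sr, sample_width=pcm.dtype.itemsize, channels=1)

sr = 24000
audio = np.random.uniform(-1, 1, sr * 30).astype(np.float32)  # 30 seconds of mono noise

for name, fn in [("old", audio_data_to_segment_old), ("new", audio_data_to_segment_new)]:
    start = time.perf_counter()
    fn(audio, sr)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{name}: {elapsed_ms:.1f} ms")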

Action Required:

Please review the proposed changes and consider integrating the optimized function into the project to improve performance.

Thank you for your attention to this matter.

zhzLuke96 commented 3 months ago

Thank you for your optimization proposal. I have created a Colab script to reproduce this issue.

Based on my tests, it appears that the optimization reduces the execution time by 0.2 seconds rather than 2 seconds.

Nevertheless, this optimization does improve performance. I will merge your code after thorough testing. Thank you for your issue report!
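
For reference, a quick correctness check along these lines could be part of that testing, confirming both code paths decode to essentially the same audio. This is a sketch, assuming the two versions are in scope under the hypothetical names audio_data_to_segment_old and audio_data_to_segment_new (as in the timing sketch above) and that the input is mono float32 in [-1, 1].

import numpy as np

sr = 24000
audio = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)  # 1 s test tone

old_seg = audio_data_to_segment_old(audio, sr)
new_seg = audio_data_to_segment_new(audio, sr)

print("old:", old_seg.frame_rate, old_seg.channels, old_seg.sample_width)
print("new:", new_seg.frame_rate, new_seg.channels, new_seg.sample_width)

old_pcm = np.array(old_seg.get_array_of_samples(), dtype=np.int32)
new_pcm = np.array(new_seg.get_array_of_samples(), dtype=np.int32)

# A difference of a couple of LSBs is expected: the two paths round float -> int16 differently
n = min(len(old_pcm), len(new_pcm))
print("max sample difference:", np.max(np.abs(old_pcm[:n] - new_pcm[:n])))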

zhzLuke96 commented 3 months ago

merged d33809c60a3ac76a01f71de4fd26b315d066c8d3