zsanjin-p commented 5 months ago

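The audio clips were generated through the API. Not every generated clip has it, but many clips start with a pop or a click, like an electrical crackle. What causes this? Has anyone else run into it?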

syq163 commented 5 months ago

Could you please provide more details about this issue, such as the specific text, speaker ID, and audio samples?

lh7343 commented 3 months ago

I'm hitting this too, and it happens no matter which speaker ID I use. Please help look into it. Audio examples are attached: response.zip

syq163 commented 3 months ago

When using the webpage-based demo by running streamlit run demo_page.py, the generated audio contains no noise. However, I do notice noise at the beginning of the sample audio. Can you please provide more details about this issue?

lh7343 commented 3 months ago

I'm using the API. Here is my docker run command: docker run --gpus "device=3" -d --name EmotiVoice -p 28021:8000 -v /raid/liuhao/EmotiVoice:/workspace/EmotiVoice -w /workspace/EmotiVoice/EmotiVoice emoti-voice:v1 env LANG=C.UTF-8 sh -c "uvicorn openaiapi:app --reload --host 0.0.0.0 --port 8000 >> log/all.log 2>&1"


zsanjin-p commented 2 months ago

import logging
import os

from pydub import AudioSegment

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')


def remove_or_silence_noise_from_audio_files(directory, noise_duration_ms, mode):
    """Trim (mode 1) or silence (mode 2) the first noise_duration_ms of every audio file in directory."""
    if mode not in (1, 2):
        raise ValueError("Invalid mode, must be 1 or 2")

    # Determine the output folder for processed audio files
    output_folder = os.path.join(directory, "Processed_Audio")
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)
        logging.info(f"Folder created: {output_folder}")

    # Get all audio files (extension match is case-insensitive)
    audio_files = [file for file in os.listdir(directory) if file.lower().endswith(('.mp3', '.wav'))]
    logging.info(f"Found {len(audio_files)} audio files.")

    # Initialize statistics variables
    success_count = 0
    fail_count = 0
    failed_files = []

    # Process each file
    for file in audio_files:
        file_path = os.path.join(directory, file)
        try:
            # Load the audio
            audio = AudioSegment.from_file(file_path)
            logging.info(f"Processing audio file: {file_path}")

            if mode == 1:
                # Remove the first noise_duration_ms milliseconds of the audio
                processed_audio = audio[noise_duration_ms:]
            else:
                # Replace the first noise_duration_ms milliseconds with silence
                # at the same frame rate, so the total duration is unchanged
                silence = AudioSegment.silent(duration=noise_duration_ms, frame_rate=audio.frame_rate)
                processed_audio = silence + audio[noise_duration_ms:]

            # Save the new audio file, using the file extension as the export format
            new_file_path = os.path.join(output_folder, file)
            export_format = os.path.splitext(file)[1][1:].lower()
            processed_audio.export(new_file_path, format=export_format)
            logging.info(f"Processed audio file saved to: {new_file_path}")
            success_count += 1
        except Exception as e:
            logging.error(f"Error processing audio file {file_path}: {e}")
            fail_count += 1
            failed_files.append((file_path, str(e)))

    # Log the results
    logging.info(f"Processing complete. Success: {success_count}, Failures: {fail_count}")
    if fail_count > 0:
        logging.info("Failed files and reasons:")
        for file, error in failed_files:
            logging.info(f"File: {file}, Error: {error}")


if __name__ == "__main__":
    # User inputs the processing time, default is 100ms
    try:
        noise_duration_ms = int(input("Enter the noise processing time (ms, default 100ms): ") or "100")
    except ValueError:
        print("Invalid input, using default value of 100ms")
        noise_duration_ms = 100

    # User chooses the processing mode; keep asking until the answer is 1 or 2
    mode = None
    while mode not in (1, 2):
        try:
            mode = int(input("Choose the mode (1: Remove beginning noise, 2: Replace beginning noise with silence): "))
        except ValueError:
            mode = None
        if mode not in (1, 2):
            print("Invalid mode, must be 1 or 2")

    # Call the function to process audio files in the current directory
    remove_or_silence_noise_from_audio_files(os.getcwd(), noise_duration_ms, mode)
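
For anyone who wants to see what the two modes above do at the sample level, here is a minimal sketch in plain Python with no pydub dependency. It assumes the audio is already decoded into a list of integer PCM samples for one channel. The function names, the sample rate and the sample values in the usage lines are hypothetical and chosen only for illustration. The third function, a linear fade-in, is not part of the script above; it is a common alternative to a hard cut that may be worth trying if trimming or silencing leaves an audible edge.

```python
def ms_to_samples(duration_ms, sample_rate):
    """Convert a duration in milliseconds to a whole number of samples."""
    return duration_ms * sample_rate // 1000


def trim_head(samples, sample_rate, duration_ms):
    """Mode 1 equivalent: drop the first duration_ms of audio."""
    n = min(ms_to_samples(duration_ms, sample_rate), len(samples))
    return samples[n:]


def silence_head(samples, sample_rate, duration_ms):
    """Mode 2 equivalent: zero the first duration_ms, keeping the total length."""
    n = min(ms_to_samples(duration_ms, sample_rate), len(samples))
    return [0] * n + samples[n:]


def fade_in_head(samples, sample_rate, duration_ms):
    """Alternative (not in the script above): ramp the first duration_ms up from zero."""
    n = min(ms_to_samples(duration_ms, sample_rate), len(samples))
    faded = [int(s * i / n) for i, s in enumerate(samples[:n])]
    return faded + samples[n:]


# Hypothetical input: 8 samples at a made-up rate of 1000 Hz, so 4 ms is 4 samples.
demo = [1000] * 8
trimmed = trim_head(demo, 1000, 4)
silenced = silence_head(demo, 1000, 4)
faded = fade_in_head(demo, 1000, 4)
```

Mode 1 shortens the clip, while mode 2 and the fade-in keep its length. That matters if the audio has to stay aligned with subtitles or other tracks.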

netease-youdao / EmotiVoice

Popping sound at the beginning of generated speech #119
