open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License
4.45k stars, 379 forks

[Help]: Speed Up Emilia Pipeline #255

Closed: acul3 closed this issue 3 weeks ago

acul3 commented 1 month ago

Thanks for creating the Emilia pipeline.

I am using it now for my language, and so far so good.

Is there a way to speed it up? Right now, 1 hour of audio takes about ~1.5 minutes to process; with 27k hours, that would take a long time.

Maybe using multiprocessing or batching? (I know it might be difficult because of the ONNX- and CUDA-related parts.)

yuantuo666 commented 1 month ago

Hi. We tried various methods to speed up our pipeline; some effective ones are listed here for checking and reference:

  1. Make sure ONNX and PyTorch are running on CUDA. When you execute the pipeline, there should be no yellow warning logs, and you should see the message Using CUDA: ['CUDAExecutionProvider', ...], confirming that ONNX is utilizing CUDA. If you're experiencing issues with the ONNX configuration, please check: https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements
  2. Configure --batch_size and --threads when running the pipeline; this can speed up the ASR processing part.
  3. Multiprocessing: run multiple main.py processes on a single GPU, if you have enough GPU memory.
  4. Asynchronous processing: since the preprocessing and result-writing parts are CPU-bound and IO-bound, you can make them asynchronous to speed up the pipeline (see the sketch after this list).
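
For point 4, here is a minimal sketch of what asynchronous result writing could look like, using a background thread pool so the IO overlaps with the GPU work. The names process_on_gpu and write_results are placeholders, not the pipeline's actual functions:

# Minimal sketch: overlap IO-bound result writing with GPU-bound processing.
# process_on_gpu and write_results are placeholder names, not the pipeline's API.
import json
from concurrent.futures import ThreadPoolExecutor


def write_results(out_path, results):
    # IO-bound: runs in a worker thread while the GPU stays busy.
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(results, f, ensure_ascii=False)


def run(audio_files, process_on_gpu):
    with ThreadPoolExecutor(max_workers=2) as writer:
        futures = []
        for path in audio_files:
            results = process_on_gpu(path)  # GPU-bound step
            futures.append(writer.submit(write_results, path + ".json", results))
        for fut in futures:
            fut.result()  # surface any write errors
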
HarryHe11 commented 1 month ago


In our experiments, Emilia-Pipe can process 2.5 hours of raw speech data in one minute on a dedicated server equipped with eight NVIDIA RTX 4090 GPUs. We've implemented various technical optimizations, such as multiprocessing and batching, to achieve this speed, which we believe meets industry standards. It's important to note that processing speed can be influenced by the hardware environment. If the speed you mentioned ("1 hour of audio in about 1.5 minutes") was achieved with a single GPU, it seems like an acceptable time. In addition to @yuantuo666's suggestions, you might consider using more GPUs or trying a smaller ASR model like Whisper-tiny (though this may involve trade-offs).
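
For the smaller-ASR idea, an illustrative snippet with the openai-whisper package is below; this is not necessarily the ASR backend the pipeline itself uses, and the input file name is a placeholder:

# Illustrative only: loading a smaller ASR model with the openai-whisper package.
# The Emilia pipeline's actual ASR backend and configuration may differ.
import whisper

model = whisper.load_model("tiny")        # much faster than larger models, lower accuracy
result = model.transcribe("example.wav")  # placeholder input file
print(result["text"])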

acul3 commented 1 month ago

Thank you for your answer, @yuantuo666 @HarryHe11.

I managed to speed it up by using more GPUs:

  1. First, split the data list (subsets) across the number of available GPUs.
  2. Load the models onto every available GPU.
  3. Process each subset separately on its own GPU.

My code is messy, but roughly it does what the sketch below shows.

Maybe, if you guys have time, you could add a function for that and expose it via an argument (--num_gpus)?
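
A cleaned-up sketch of the approach: one worker process per GPU, each loading its own models and handling its own slice of the file list. load_models() and process_file() are placeholders for the pipeline's model loading and per-file processing, not Amphion's actual API:

# Rough sketch: split the file list across GPUs and run one worker process per GPU.
import os
import multiprocessing as mp


def load_models():
    # Placeholder: load the pipeline's models (separation, VAD, ASR, ...) on this GPU.
    return None


def process_file(models, path):
    # Placeholder: run the full pipeline on one audio file.
    print(f"processing {path}")


def worker(gpu_id, files):
    # Bind this process to a single GPU before any CUDA context is created.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    models = load_models()
    for path in files:
        process_file(models, path)


def run_on_gpus(all_files, num_gpus):
    ctx = mp.get_context("spawn")  # fresh interpreters, safer with CUDA
    chunks = [all_files[i::num_gpus] for i in range(num_gpus)]  # round-robin split
    procs = [ctx.Process(target=worker, args=(gpu, chunk))
             for gpu, chunk in enumerate(chunks)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()


if __name__ == "__main__":
    run_on_gpus(["a.wav", "b.wav", "c.wav", "d.wav"], num_gpus=2)
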

nichousha6 commented 4 weeks ago

Hi, I've hit the same speed limit. I followed the instructions in README.md and set export CUDA_VISIBLE_DEVICES=0,1,2,3, which I expected to run on multiple GPUs. However, it only uses GPU 0. Do you have any ideas about that? Thanks in advance!

yuantuo666 commented 4 weeks ago


Hi, we didn't implement a multi-GPU running feature in the main.py file. Instead, you need to run it manually or write a simple script to schedule the multi-GPU tasks. E.g.

tmux new -s gpu0
conda activate AudioPipeline
CUDA_VISIBLE_DEVICES=0 python main.py --input_folder_path=/path/to/task_folder_0

tmux new -s gpu1
conda activate AudioPipeline
CUDA_VISIBLE_DEVICES=1 python main.py --input_folder_path=/path/to/task_folder_1
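
If you prefer a single script over separate tmux sessions, a minimal scheduling sketch could look like the following; the task-folder layout mirrors the example above, and paths and GPU count are placeholders for your setup:

# Sketch: launch one main.py per GPU, each on its own task folder, and wait for all.
# Run this from the pipeline directory inside the AudioPipeline environment.
import os
import subprocess

NUM_GPUS = 2  # adjust to the number of GPUs you want to use
procs = []
for gpu in range(NUM_GPUS):
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    cmd = ["python", "main.py",
           f"--input_folder_path=/path/to/task_folder_{gpu}"]
    procs.append(subprocess.Popen(cmd, env=env))

for p in procs:
    p.wait()  # block until every per-GPU run has finished
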
nichousha6 commented 3 weeks ago

Got it! Thank you very much.