Closed — acul3 closed this issue 3 weeks ago
Hi. We tried various methods to speed up our pipeline; some effective ways are listed here for checking and reference:

- Make sure ONNX and PyTorch are running on CUDA. When you execute the pipeline, there should be no yellow warning logs, and you should see the message `Using CUDA: ['CUDAExecutionProvider', ...]`, confirming that ONNX is utilizing CUDA. If you're experiencing issues with the ONNX configuration, please check: https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements
- Configure `--batch_size` and `--threads` when running the pipeline; these options can speed up the ASR processing part.
- Multiprocessing: run multiple `main.py` processes on a single GPU, if you have enough GPU memory.
- Asynchronous processing: the preprocessing and result-writing parts are CPU-bound and IO-bound tasks, so you can make them asynchronous to speed up the pipeline.
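The asynchronous-processing idea above can be sketched with Python's standard library. This is only an illustration, not Emilia-Pipe's actual code: `preprocess` and `write_result` are hypothetical stand-ins for the real CPU-bound and IO-bound stages.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def preprocess(item):
    # Hypothetical CPU-bound stage (e.g. audio preprocessing).
    return item * 2

async def write_result(result, sink):
    # Hypothetical IO-bound stage; the sleep stands in for real file/network IO.
    await asyncio.sleep(0)
    sink.append(result)

async def run_pipeline(items):
    loop = asyncio.get_running_loop()
    sink = []
    with ThreadPoolExecutor() as pool:
        # Offload the CPU-bound work to a pool so it doesn't block the event loop ...
        results = await asyncio.gather(
            *(loop.run_in_executor(pool, preprocess, x) for x in items)
        )
    # ... and overlap the IO-bound result writes.
    await asyncio.gather(*(write_result(r, sink) for r in results))
    return sink

print(sorted(asyncio.run(run_pipeline([1, 2, 3]))))  # → [2, 4, 6]
```

The same pattern works with `ProcessPoolExecutor` if the preprocessing step holds the GIL.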
In our experiments, Emilia-Pipe can process 2.5 hours of raw speech data in one minute on a dedicated server equipped with eight NVIDIA RTX 4090 GPUs. We've implemented various technical optimizations, such as multiprocessing and batching, to achieve this speed, which we believe meets industry standards. It's important to note that processing speed can be influenced by the hardware environment. If the speed you mentioned ("1 hour of audio in about 1.5 minutes") was achieved with a single GPU, it seems like an acceptable time. In addition to @yuantuo666's suggestions, you might consider using more GPUs or trying a smaller ASR model like Whisper-tiny (though this may involve trade-offs).
Thank you for your answer @yuantuo666 @HarryHe11.
I managed to speed it up using more GPUs, but my code is messy.
Maybe, if you guys have time, you could add a function for that and expose an argument (e.g. `--num_gpus`).
Hi, I've hit the same speed limit. I followed the instructions in README.md and set `export CUDA_VISIBLE_DEVICES=0,1,2,3`, which I expected to run on multiple GPUs. However, it only uses GPU 0; do you have any idea why? Thanks in advance!
Hi, we didn't implement a multi-GPU running feature in the `main.py` file; instead, you need to run it manually or write a simple script to schedule the multi-GPU tasks.
E.g.

```bash
# Terminal 1 (GPU 0)
tmux new -s gpu0
conda activate AudioPipeline
CUDA_VISIBLE_DEVICES=0 python main.py --input_folder_path=/path/to/task_folder_0
```

```bash
# Terminal 2 (GPU 1)
tmux new -s gpu1
conda activate AudioPipeline
CUDA_VISIBLE_DEVICES=1 python main.py --input_folder_path=/path/to/task_folder_1
```
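A small scheduler script could automate the manual launches above. This is only a sketch, assuming the same flag names and folder layout as in the commands shown; actually starting the processes is left to the commented-out `launch` call.

```python
import os
import subprocess

def build_commands(gpu_ids, task_folders):
    """Pair each GPU id with a task folder and build one launch command per GPU."""
    return [
        {
            "env": {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu)},
            "cmd": ["python", "main.py", f"--input_folder_path={folder}"],
        }
        for gpu, folder in zip(gpu_ids, task_folders)
    ]

def launch(jobs):
    """Start all jobs in parallel and wait for every one to finish."""
    procs = [subprocess.Popen(j["cmd"], env=j["env"]) for j in jobs]
    return [p.wait() for p in procs]

jobs = build_commands([0, 1], ["/path/to/task_folder_0", "/path/to/task_folder_1"])
print([j["cmd"] for j in jobs])
# launch(jobs)  # uncomment to actually start one main.py per GPU
```

Splitting the input data into one folder per GPU beforehand is still the caller's responsibility, just as with the tmux approach.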
Got it! Thank you very much.
Thanks for creating the Emilia pipeline.
I am using it now for my language, and so far so good.
Is there a way to speed it up? Right now, 1 hour of audio takes about ~1.5 minutes to process; with 27k hours it would take a long time.
Maybe using multiprocessing or batching? (I know it might be difficult due to ONNX- and CUDA-related constraints.)