nttcslab-sp / mamba-diarization

Official repository for Mamba-based Segmentation Model for Speaker Diarization

High CPU Usage During Inference Reduces GPU Utilization #9

Closed sipercai closed 2 days ago

sipercai commented 4 days ago

Description:

When running the speaker diarization pipeline, I notice that CPU usage is extremely high, which seems to prevent the GPU utilization from increasing significantly. This results in a bottleneck where the pipeline does not fully leverage the GPU for computation.

I suspect the issue might stem from the use of AgglomerativeClustering in the pipeline, as it operates on the CPU. It appears that the clustering process is consuming significant CPU resources, overshadowing the benefits of using the GPU for segmentation and embedding extraction.


Code Example:

Here is the relevant code snippet where I set up the pipeline:

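import torch
import torchaudio

# `SpeakerDiarizationPipeline`, `model`, `pipeline_params`, `file`, and
# `target_sample_rate` are defined or imported elsewhere in the script.
pipeline = SpeakerDiarizationPipeline(
    segmentation=model,
    embedding="speechbrain/spkrec-ecapa-voxceleb",
    embedding_exclude_overlap=True,
    clustering="AgglomerativeClustering",
)
pipeline.instantiate(pipeline_params)
pipeline.to(torch.device("cuda:1"))

# Load the audio into memory. The resampling step is assumed here (it is not in
# the original snippet) so that the reported sample rate matches the waveform.
waveform, original_sample_rate = torchaudio.load(file["audio"])
if original_sample_rate != target_sample_rate:
    waveform = torchaudio.functional.resample(waveform, original_sample_rate, target_sample_rate)
hypothesis = pipeline({"waveform": waveform, "sample_rate": target_sample_rate})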

Observations:

  1. High CPU Consumption: During inference, the CPU usage spikes significantly, while the GPU remains underutilized.
  2. Potential Cause: AgglomerativeClustering might be using the CPU for its calculations, which could explain the bottleneck.
  3. Impact: This limits the overall performance and throughput of the pipeline, especially when processing large datasets.

Suggestions/Questions:

  1. Is AgglomerativeClustering CPU-bound? If so, are there any GPU-accelerated alternatives for clustering in this context?

Any guidance or recommendations on addressing this issue would be greatly appreciated!
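
Before swapping out the clustering step, it may help to confirm where the CPU time actually goes. The following is a minimal sketch using only the standard library's cProfile and pstats modules. `profile_call` and `fake_clustering` are hypothetical names introduced for illustration; `fake_clustering` is a stand-in for a CPU-bound step, not part of pyannote or this repository.

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, top=15, **kwargs):
    """Run fn(*args, **kwargs) under cProfile and return (result, report_text).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def fake_clustering(n):
    # Hypothetical stand-in for a CPU-bound step (quadratic pairwise loop).
    total = 0
    for i in range(n):
        for j in range(i):
            total += (i - j) % 7
    return total


if __name__ == "__main__":
    _, report = profile_call(fake_clustering, 300)
    print(report)
```

With the pipeline from the snippet above, the same helper could be called as `hypothesis, report = profile_call(pipeline, {"waveform": waveform, "sample_rate": target_sample_rate})`. If clustering-related functions dominate the cumulative-time column, the suspicion is probably right. If audio loading or data preparation dominates, the bottleneck is elsewhere. Note that CUDA kernels run asynchronously, so GPU time tends to show up at synchronization points and not at the call that launched the work.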

FrenchKrab commented 3 days ago

Low GPU usage / slow inference seems to be a fairly common issue on the pyannote repository; see https://github.com/pyannote/pyannote-audio/issues/1652, https://github.com/pyannote/pyannote-audio/issues/1566, and https://github.com/pyannote/pyannote-audio/issues/1702 (and probably others). I have not made significant efforts to optimize the pipeline, but the main fix I think you should try is:

(Pass a dict to the pipeline instead of a ProtocolFile)

sipercai commented 2 days ago

Thank you very much for finding these materials for me. They are very helpful to me. I will carefully refer to the content in these links and hope to find solutions to the problem from them. If there are any new discoveries or questions during the research process, I will communicate with you in a timely manner. Thank you again for your help!