juanmc2005 / diart

A python package to build AI-powered real-time audio applications
https://diart.readthedocs.io
MIT License

Add a caching mechanism for benchmark and tuning #196

Open juanmc2005 opened 8 months ago

juanmc2005 commented 8 months ago

Problem

Tuning and evaluating diarization pipelines with different models, or combinations of models, is becoming increasingly difficult, even with a GPU.

Idea

Implement a caching mechanism that saves segmentation and embedding outputs to disk. For example, we could use ~/.diart/cache by default and let users change it with --cache. This could be implemented as an additional parameter of SpeakerDiarization:

pipeline = SpeakerDiarization(config, cache="default")

Here the parameter would be typed as cache: str | Path | None. Using cache=None would disable caching, cache="default" would use ~/.diart/cache, and cache=Path("/some/dir") or cache="/some/dir" would dump/load the cache to/from that directory.
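To make the three cases concrete, here is a minimal sketch of how the cache argument could be resolved to a directory. The names resolve_cache_dir and DEFAULT_CACHE_DIR are hypothetical and do not exist in diart; this only illustrates the proposed semantics.

```python
from pathlib import Path
from typing import Optional, Union

# Assumed default location, taken from the proposal above.
DEFAULT_CACHE_DIR = Path.home() / ".diart" / "cache"


def resolve_cache_dir(cache: Union[str, Path, None]) -> Optional[Path]:
    """Hypothetical helper: map the `cache` argument to a directory.

    Returns None when caching is disabled.
    """
    if cache is None:
        # cache=None: no caching at all
        return None
    if isinstance(cache, str) and cache == "default":
        # cache="default": use ~/.diart/cache
        return DEFAULT_CACHE_DIR
    # Any other str or Path is treated as a user-provided directory
    return Path(cache).expanduser()
```

One consequence of this sketch is that the string "default" is reserved: a user who really wants a relative directory named default would have to pass Path("default") instead.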

The caching logic could even be implemented as a wrapper of SegmentationModel and EmbeddingModel.
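A rough sketch of what such a wrapper might look like, assuming the wrapped model is any callable. CachedModel is a hypothetical name, not part of diart, and hashing the pickled input is only a stand-in: a real implementation would probably hash the raw waveform bytes together with a model identifier, and would need a tensor-aware serialization format.

```python
import hashlib
import pickle
from pathlib import Path
from typing import Any, Callable, Optional


class CachedModel:
    """Hypothetical wrapper that memoizes a model's outputs on disk."""

    def __init__(self, model: Callable[[Any], Any], cache_dir: Optional[Path], name: str):
        self.model = model
        # One sub-directory per model so that segmentation and embedding
        # outputs never collide. None disables caching entirely.
        self.cache_dir = None if cache_dir is None else Path(cache_dir) / name
        if self.cache_dir is not None:
            self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _key(self, inputs: Any) -> str:
        # Assumption: the pickled input is a stable fingerprint of the data.
        return hashlib.sha256(pickle.dumps(inputs)).hexdigest()

    def __call__(self, inputs: Any) -> Any:
        if self.cache_dir is None:
            return self.model(inputs)
        path = self.cache_dir / f"{self._key(inputs)}.pkl"
        if path.exists():
            # Cache hit: load the stored output instead of running the model
            with path.open("rb") as f:
                return pickle.load(f)
        # Cache miss: run the model and store its output for next time
        output = self.model(inputs)
        with path.open("wb") as f:
            pickle.dump(output, f)
        return output
```

With this shape, the pipeline could wrap its segmentation and embedding models once at construction time and leave the rest of the code unchanged, since the wrapper is called exactly like the model it wraps.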