Open ukalla1 opened 1 month ago
Can we use better (or even paid) STT models to improve speech detection, and maybe even separate out individual speakers? Ideally, we want to solve the "cocktail party" problem.