sensein / senselab

SenseLab is a Python package that simplifies building pipelines for biometric (e.g., speech, voice, video) analysis.
https://github.com/sensein/senselab
Apache License 2.0

Add forced alignment #14

Open ibevers opened 2 months ago

ibevers commented 2 months ago

Description

Create a forced alignment task in the audio folder.

satra commented 1 month ago

btw, stan pointed to VoiceCraft, which seems to have a pipeline that can do transcription and alignment. we may want to check the differences between the current model and that one. it also does text to speech, so perhaps we can avoid the older dependencies that TTS brings to the b2aiprep package.

fabiocat93 commented 1 month ago

VoiceCraft is cool, I attended one of their talks recently. But as far as I know, they don't have a package on PyPI yet.

satra commented 1 month ago

but they have models on Hugging Face that we could use, right? in fact, i played with their Spaces, which means all the code for that is also on Hugging Face. so technically we should be able to create that pipeline.

fabiocat93 commented 1 month ago

Do you want to include their source code in our repo?

satra commented 1 month ago

for things that do not have releases but have git repos and we plan to use their code directly, we can include the repo as a git submodule in an externals directory under source.

however, if it's just a matter of copying a script or a workflow that uses Hugging Face models, we should create the workflow ourselves. it depends on how complex their implementation is.
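
For the submodule route mentioned above, here is a minimal sketch of what vendoring an unreleased repo could look like when scripted from Python. The repo URL is a placeholder, and `src/externals` is only a guess at what "an externals directory under source" means for this project. `submodule_add_command` and `add_external` are hypothetical helper names, not existing senselab functions.

```python
import subprocess
from pathlib import Path

# Placeholder values for illustration only; neither is confirmed in this thread.
EXAMPLE_REPO_URL = "https://example.com/org/voicecraft.git"
EXTERNALS_DIR = Path("src/externals")


def submodule_add_command(repo_url: str, externals_dir: Path) -> list[str]:
    """Build the `git submodule add` command that vendors repo_url under externals_dir."""
    # Use the last URL segment, minus a trailing ".git", as the checkout folder name.
    name = repo_url.rstrip("/").rsplit("/", 1)[-1].removesuffix(".git")
    return ["git", "submodule", "add", repo_url, (externals_dir / name).as_posix()]


def add_external(repo_url: str, externals_dir: Path = EXTERNALS_DIR) -> None:
    """Run the command; must be called from the root of the git repository."""
    subprocess.run(submodule_add_command(repo_url, externals_dir), check=True)


if __name__ == "__main__":
    # Only print the command here, so running this file does not modify a repo.
    print(" ".join(submodule_add_command(EXAMPLE_REPO_URL, EXTERNALS_DIR)))
```

Keeping command construction separate from execution makes the path convention easy to check before anything touches the repo. Pinning the submodule to a specific commit would still be a manual follow-up.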