sensein / b2aiprep

Apache License 2.0

Add scripts for speech-to-text using Whisper and STT + forced alignment with WhisperX #13

Closed (900miles closed 5 months ago)

900miles commented 5 months ago

Adds two functions that transcribe an audio file using Whisper or WhisperX. When WhisperX is used, speaker diarization and forced alignment of the text output can also be performed.
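For readers unfamiliar with Whisper, here is a minimal sketch of what a transcription helper could look like, assuming the `openai-whisper` package. The function name `transcribe_audio` and its parameters are hypothetical and are not the functions added in this PR; only `whisper.load_model` and `model.transcribe` are calls from the library itself.

```python
from typing import Any, Optional


def transcribe_audio(path: str, model: Optional[Any] = None, model_name: str = "base") -> str:
    """Return the transcript of the audio file at `path`.

    Hypothetical helper for illustration only. If no model is passed in,
    a Whisper model is loaded lazily, which requires `pip install openai-whisper`.
    """
    if model is None:
        import whisper  # imported lazily so the module loads without the dependency

        model = whisper.load_model(model_name)

    # Whisper's transcribe() returns a dict; the full transcript is under "text".
    result = model.transcribe(path)
    return result["text"].strip()
```

Accepting an optional preloaded model avoids reloading the weights for every file and makes the helper easy to exercise with a stub object.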

Rahul-Brito commented 5 months ago

Hey @900miles, this looks great so far. Could you try having the functions take the Audio class as input? It's a nice way to keep the signal and sampling rate zipped together throughout the functions.

See here for where it is output, and two lines down from there for where it is an input: https://github.com/sensein/b2aiprep/blob/b5b342fcc5e94e16318a195241388b2000752426/src/b2aiprep/process.py#L51
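To illustrate the idea of carrying the signal and sampling rate together, here is a rough sketch. The `Audio` dataclass below is a hypothetical stand-in using plain Python lists, not the actual class in `b2aiprep`, and `duration_seconds` and `decimate` are made-up helpers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Audio:
    """Hypothetical stand-in: bundles samples with their sampling rate."""

    signal: List[float]
    sample_rate: int  # samples per second


def duration_seconds(audio: Audio) -> float:
    """Length of the clip in seconds; needs both fields, so one argument suffices."""
    return len(audio.signal) / audio.sample_rate


def decimate(audio: Audio, factor: int) -> Audio:
    """Keep every `factor`-th sample (naive, no anti-alias filtering).

    Returning a new Audio keeps the updated rate attached to the new signal,
    so callers cannot pair the samples with a stale sampling rate.
    """
    return Audio(signal=audio.signal[::factor], sample_rate=audio.sample_rate // factor)
```

With this shape, a function that changes the sampling rate returns an object that is still self-describing, which is the benefit of passing one `Audio` argument instead of a separate signal and rate.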

fabiocat93 commented 5 months ago

Hey @900miles, do you mind adding the packages you use in your process.py file to the package's dependencies?

900miles commented 5 months ago

The new commit should allow working directly with Audio objects. I've also added a requirements.txt, but I've never really made one before, so I'm not sure I did it correctly.

satra commented 5 months ago

Instead of a requirements.txt, just add the dependencies to the pyproject.toml.

satra commented 5 months ago

Also, perhaps change the filename to speech2text.

900miles commented 5 months ago

Done and done!