Closed by 900miles 5 months ago
Hey @900miles, this looks great so far. Could you try having the functions take the Audio class as their input? It is a nice way to keep the signal and sampling rate zipped together as they pass through the functions.
See here for where it is returned as an output, and two lines down from there for where it is taken as an input: https://github.com/sensein/b2aiprep/blob/b5b342fcc5e94e16318a195241388b2000752426/src/b2aiprep/process.py#L51
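For anyone following along, here is a rough sketch of what "zipping" the signal and sampling rate could look like. The `Audio` dataclass below is a hypothetical stand-in written for illustration, not the actual class in `process.py`, and `duration_seconds` and `trim` are made-up helper names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Audio:
    """Hypothetical stand-in for the project's Audio class (assumed fields)."""
    signal: List[float]   # mono samples
    sample_rate: int      # samples per second


def duration_seconds(audio: Audio) -> float:
    # The sampling rate travels with the signal, so no extra argument is needed.
    return len(audio.signal) / audio.sample_rate


def trim(audio: Audio, start_s: float, end_s: float) -> Audio:
    # Convert seconds to sample indices using the rate stored on the object.
    start = int(start_s * audio.sample_rate)
    end = int(end_s * audio.sample_rate)
    # Return a new Audio so the rate stays attached to the trimmed signal.
    return Audio(signal=audio.signal[start:end], sample_rate=audio.sample_rate)
```

Because each function both takes and returns an `Audio`, calls can be chained without passing the sampling rate around separately.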
Hey @900miles, do you mind adding the packages you use in your process.py file to the dependencies of the package?
The new commit should allow working directly with Audio objects. I've also added a requirements.txt, but I've never really made one before, so I'm not sure if I did it correctly.
Instead of a requirements.txt, just add the dependencies to the pyproject.toml.
Also, perhaps change the filename to speech2text.
Done and done!
Adds two functions that use Whisper or WhisperX to transcribe an audio file. When WhisperX is used, speaker diarization and forced alignment of the text output can also be performed.
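As a loose illustration of the shape such a function might take, here is a minimal sketch that accepts an Audio object and delegates to a pluggable backend. Everything named here (`Audio`, `transcribe_audio`, `fake_backend`) is hypothetical and is not the code added in this PR; a real backend would wrap a Whisper or WhisperX model instead of the dummy used below.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Audio:
    """Hypothetical stand-in for the project's Audio class (assumed fields)."""
    signal: List[float]
    sample_rate: int


# A backend is any callable mapping (samples, sample_rate) to a transcript.
# Assumption: a real one would call into Whisper or WhisperX here.
Backend = Callable[[List[float], int], str]


def transcribe_audio(audio: Audio, backend: Backend) -> str:
    # Reject obviously invalid input before handing it to the backend.
    if audio.sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    # Strip surrounding whitespace from whatever the backend returns.
    return backend(audio.signal, audio.sample_rate).strip()


def fake_backend(signal: List[float], sample_rate: int) -> str:
    # Dummy backend for demonstration: reports the clip length, not speech.
    return f" {len(signal) / sample_rate:.1f}s of audio "
```

Keeping the model behind a callable means the same entry point could serve either engine, and it can be exercised without downloading any model weights.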