Audio transcribing + diarization pipeline.

Transcription is powered by `faster-whisper==1.0.3`. Model weights are downloaded in the `setup` function of `predict.py`.

Build, run, and push with Cog:

```
cog build
cog predict -i input.wav
cog push r8.im/<username>/<name>
```
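The model accepts audio in several forms, including a Base64-encoded string via the `file_string` input described below. A minimal sketch of preparing that payload with the standard library (the file path and helper name are placeholders, not part of this repo):

```python
import base64

def encode_audio(path: str) -> str:
    """Read an audio file and return its Base64-encoded contents,
    suitable for the file_string input."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# The same encoding shown with in-memory bytes instead of a real file:
payload = base64.b64encode(b"RIFF....WAVE").decode("utf-8")
```

Decoding the payload on the other side recovers the original bytes exactly.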
Input parameters:

- `file_string: str`: Either provide a Base64-encoded audio file.
- `file_url: str`: Or provide a direct audio file URL.
- `file: Path`: Or provide an audio file.
- `group_segments: bool`: Group segments of the same speaker that are less than 2 seconds apart. Default is `True`.
- `num_speakers: int`: Number of speakers. Leave empty to autodetect. Must be between 1 and 50.
- `translate: bool`: Translate the speech into English.
- `language: str`: Language of the spoken words as a language code like `en`. Leave empty to auto-detect the language.
- `prompt: str`: Vocabulary: provide names, acronyms, and loanwords in a list. Use punctuation for best accuracy. Also used as the `hotwords` parameter during transcription.
- `offset_seconds: int`: Offset in seconds, used for chunked inputs. Default is 0.
- `transcript_output_format: str`: Format of the transcript output: `words_only` (individual words with timestamps), `segments_only` (full text of segments), or `both`. Default is `both`.
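The `group_segments` option merges consecutive segments from the same speaker when they are close together in time. A rough sketch of that idea, not the pipeline's actual code; the 2-second threshold and the segment field names are taken from the descriptions above:

```python
def group_segments(segments, max_gap=2.0):
    """Merge consecutive segments of the same speaker whose gap is
    shorter than max_gap seconds (illustrative sketch)."""
    grouped = []
    for seg in segments:
        prev = grouped[-1] if grouped else None
        if prev and prev["speaker"] == seg["speaker"] \
                and seg["start"] - prev["end"] < max_gap:
            # Extend the previous segment instead of starting a new one.
            prev["end"] = seg["end"]
            prev["text"] += " " + seg["text"]
        else:
            grouped.append(dict(seg))
    return grouped

# Hypothetical diarized segments:
segs = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 1.5, "text": "Hello"},
    {"speaker": "SPEAKER_00", "start": 2.0, "end": 3.0, "text": "there."},
    {"speaker": "SPEAKER_01", "start": 3.5, "end": 4.0, "text": "Hi."},
]
merged = group_segments(segs)  # the first two segments collapse into one
```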
Output:

- `segments: List[Dict]`: List of segments with speaker, start and end time. Includes `avg_logprob` for each segment and `probability` for each word-level segment.
- `num_speakers: int`: Number of speakers (detected, unless specified in the input).
- `language: str`: Language of the spoken words as a language code like `en` (detected, unless specified in the input).
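A sketch of consuming the output, assuming segments shaped as described above; field names beyond `speaker`, `start`, `end`, and `avg_logprob` (such as `text`) and the sample values are assumptions for illustration:

```python
# Hypothetical output resembling the fields described above.
output = {
    "num_speakers": 2,
    "language": "en",
    "segments": [
        {"speaker": "SPEAKER_00", "start": 0.0, "end": 2.4,
         "text": "Hello there.", "avg_logprob": -0.21},
        {"speaker": "SPEAKER_01", "start": 2.8, "end": 4.1,
         "text": "Hi, how are you?", "avg_logprob": -0.35},
    ],
}

# Render a simple speaker-labelled transcript from the segments.
lines = [
    f'[{s["start"]:.1f}-{s["end"]:.1f}] {s["speaker"]}: {s["text"]}'
    for s in output["segments"]
]
```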