jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper
MIT License

Repeating segments being removed. #238

radzionc closed this issue 9 months ago

radzionc commented 1 year ago

I'd like the output to retain all segments from the input, but it seems some recurring portions are being omitted.

Here are the segments from the output.

2:  Upon acceptance, it progress.
3:  Stop.
4:  To reveal age-cowd model. Instead of leaning on the traditional switch case architecture,

But in the input audio, the fourth segment starts with the same words as the second segment:

4:  Upon acceptance, it progress to reveal age-cowd model. Instead of leaning on the traditional switch case architecture,
jianfch commented 1 year ago

What were the arguments you used?

radzionc commented 1 year ago

@jianfch Only the defaults of model.transcribe():

import stable_whisper

def transcribe(input_path):
    # Load the 'base' Whisper model (reloaded on every call)
    model = stable_whisper.load_model('base')

    # Transcribe with all default arguments
    result = model.transcribe(input_path)

    return result
radzionc commented 1 year ago

For context, my aim is to edit a video recording for my development-focused YouTube channel. I want to eliminate pauses and segments where I misspoke. To streamline this process, I've been saying "Stop" after any misstep, hoping this makes it easier to programmatically identify and remove the sections leading up to the "Stop" marker.
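The "Stop" workflow can be sketched in plain Python once the segments are available as (start, end, text) tuples. This is a minimal sketch under assumptions not stated in the thread: a segment whose text is only the word "stop" marks a mistake, and the segment immediately before it is the take to discard. Every other segment is kept, and only the time ranges covered by kept segments are returned, so pauses between segments are cut as well. The helper names is_stop_marker and plan_cuts, and the timestamps in the example, are hypothetical. With stable-ts, the tuples could presumably be built from result.segments using each segment's start, end and text attributes.

```python
import re
from typing import List, Tuple

# A segment as (start_seconds, end_seconds, text); this layout is an assumption.
Segment = Tuple[float, float, str]


def is_stop_marker(text: str) -> bool:
    """True if the segment text is just the word "stop" (case and punctuation ignored)."""
    return re.sub(r"[^a-z]", "", text.lower()) == "stop"


def plan_cuts(segments: List[Segment]) -> List[Tuple[float, float]]:
    """Return the (start, end) ranges to keep.

    Hypothetical rule (not part of stable-ts): each "stop" segment is dropped
    together with the segment right before it. Gaps between the kept
    segments (pauses) are not included in the returned ranges.
    """
    drop = set()
    for i, (_, _, text) in enumerate(segments):
        if is_stop_marker(text):
            drop.add(i)          # the marker itself
            if i > 0:
                drop.add(i - 1)  # the take being abandoned
    return [(start, end) for i, (start, end, _) in enumerate(segments) if i not in drop]


segments = [
    (0.0, 2.1, "Upon acceptance, it progress."),
    (2.6, 3.0, "Stop."),
    (4.2, 9.8, "Upon acceptance, it progress to reveal age-cowd model."),
]
print(plan_cuts(segments))  # [(4.2, 9.8)]
```

The rule is only as good as the transcript: if the repeated words of the retake are dropped from the output, as reported in this issue, the kept segment's timing may not cover the start of the retake.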

jianfch commented 1 year ago

The model has a tendency to omit repetitions. You can try temperature=0. If you only need to locate "stop", you can also try locate(). See https://github.com/jianfch/stable-ts#locating-words.

# count=0 means look for all occurrences
model.locate(input_path, ' stop', 'English', count=0, verbose=True)
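Some background on the temperature=0 suggestion: in openai-whisper's transcribe(), temperature defaults to a fallback schedule (0.0, 0.2, 0.4, 0.6, 0.8, 1.0), and a window is decoded again at the next temperature when the gzip compression ratio of its text exceeds compression_ratio_threshold (2.4 by default) or its average log-probability falls below logprob_threshold. Repetitive text compresses well, so a transcript containing a genuine repetition is more likely to trip this check, and the sampled retry may come back without the repeat. That link is a plausible explanation, not something confirmed in this thread, and stable-ts is assumed rather than verified to use the same defaults. The sketch below reproduces only the compression-ratio check; needs_fallback is a hypothetical helper name.

```python
import zlib

# Default compression-ratio threshold in openai-whisper's transcribe().
COMPRESSION_RATIO_THRESHOLD = 2.4


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size divided by zlib-compressed size (higher means more repetitive)."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def needs_fallback(text: str, threshold: float = COMPRESSION_RATIO_THRESHOLD) -> bool:
    """Hypothetical helper: mimics only the compression-ratio check.

    Whisper's real fallback logic also checks average log-probability.
    """
    return compression_ratio(text) > threshold


repetitive = "Upon acceptance, it progress. " * 10
varied = "Instead of leaning on the traditional switch case architecture,"
print(needs_fallback(repetitive))  # True
print(needs_fallback(varied))      # False
```

Passing a single temperature=0 restricts decoding to one greedy pass, so this higher-temperature retry never happens.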