boxabirds commented 1 year ago

Hi, attached is an mp3 that has the first line of the verse at 8s and the second at 16s, but the timings are not reported as such. In particular, the start is always zero (it should be closer to 8s).

= Expected =

The output should record the start time correctly.

= Observed =

The start time is always 0 (for the segment and the first word); see lines 7 and 33 of the attached JSON file.
Subsequent timings are off by varying amounts (the second line of the song is logged 2s earlier than it happens).

= To reproduce =

Download this file
Run the Python code below
Compare the output against the attached JSON file

import argparse
import json

# Assumed alias: the original snippet omitted its imports
import whisper_timestamped as whisper

# Defined at module level so it is also visible when building the output filename
MODEL_SIZE = "small" # from openai-whisper: tiny, base, small, medium, large https://github.com/openai/whisper

parser = argparse.ArgumentParser()
parser.add_argument("--input", help="Input file path", required=True)

# Parse the arguments
args = parser.parse_args()

input_filename = args.input

def transcribe_audio(audio_file):
    audio = whisper.load_audio(audio_file)
    model = whisper.load_model(MODEL_SIZE, device="cpu")
    result = whisper.transcribe_timestamped(model, audio, language="en")
    return result

result = transcribe_audio(input_filename)

full_transcript_filename = args.input + "-" + MODEL_SIZE + ".json"
# write the result to the output file
with open(full_transcript_filename, "w", encoding="utf-8") as f:
    f.write(json.dumps(result, indent=2, ensure_ascii=False))
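
Once the JSON is written, the reported start times can be read back and compared with where the lines are actually heard. The following is only a minimal sketch, assuming the output has the `segments` / `words` / `start` layout seen in the attached file; `first_start_times` and the miniature `sample` result are hypothetical and merely mimic the symptom described above.

```python
import json


def first_start_times(result):
    """Return (start of first segment, start of first word) from a result dict."""
    first_segment = result["segments"][0]
    first_word = first_segment["words"][0]
    return first_segment["start"], first_word["start"]


# Hypothetical miniature result mimicking the symptom: both starts are 0.0.
sample = json.loads("""
{
  "segments": [
    {"start": 0.0, "end": 12.5, "text": " first line",
     "words": [{"text": "first", "start": 0.0, "end": 8.5},
               {"text": "line", "start": 8.5, "end": 12.5}]}
  ]
}
""")

segment_start, word_start = first_start_times(sample)
print(f"segment starts at {segment_start}s, first word at {word_start}s")
```

For the real file, the dictionary would come from `json.load` on the written output instead of the inline sample.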

Of note:

Model size has no effect.
I'm using the direct-from-git installation, as I don't think whisper-timestamped is on PyPI yet.

seaside-clip-long.mp3-small.json.zip

boxabirds commented 1 year ago

If it helps, I have (correct) output from what I think was v1.7.2. Not sure if you have regression tests in place to check?
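
A comparison between an older, correct output and a newer one could look roughly like the sketch below. It assumes both results contain the same words in the same order and use the `segments` / `words` / `start` layout; `word_starts`, `start_drift`, and the two small dictionaries are hypothetical and only echo the offsets described in this issue (8s and 16s expected, 0s and 14s reported).

```python
def word_starts(result):
    """Flatten a result dict into a list of (word text, start time) pairs."""
    return [
        (word["text"], word["start"])
        for segment in result["segments"]
        for word in segment["words"]
    ]


def start_drift(reference, candidate):
    """Pair words by position; return (text, candidate start - reference start)."""
    return [
        (ref_text, round(cand_start - ref_start, 3))
        for (ref_text, ref_start), (_, cand_start)
        in zip(word_starts(reference), word_starts(candidate))
    ]


# Hypothetical data: the reference holds the expected starts,
# the candidate holds the starts reported in this issue.
reference = {"segments": [{"words": [{"text": "first", "start": 8.0}]},
                          {"words": [{"text": "second", "start": 16.0}]}]}
candidate = {"segments": [{"words": [{"text": "first", "start": 0.0}]},
                          {"words": [{"text": "second", "start": 14.0}]}]}

drift = start_drift(reference, candidate)
print(drift)
```

A negative value means the newer output places the word earlier than the reference does. Pairing by position breaks down if the two transcripts differ in wording, so this only suits cases where the text itself is stable.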

boxabirds commented 1 year ago

Update: adding vad=True fixed the issue of the first timestamp being zero, but the other settings I tried made no difference. I'll keep investigating.
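
For reference, the workaround amounts to passing `vad=True` to the transcription call. This is a sketch based on the script in the first comment, assuming the library is imported as `whisper_timestamped`; the wrapper name `transcribe_with_vad` is hypothetical.

```python
def transcribe_with_vad(audio_file, model_size="small"):
    """Transcribe an audio file with voice activity detection enabled."""
    # Imported lazily so this module can be loaded where the library is absent.
    import whisper_timestamped as whisper

    audio = whisper.load_audio(audio_file)
    model = whisper.load_model(model_size, device="cpu")
    # vad=True is the setting reported above to fix the zero start time.
    return whisper.transcribe_timestamped(model, audio, language="en", vad=True)
```

It would be called as `transcribe_with_vad("seaside-clip-long.mp3")`, and the returned dictionary can be written to JSON exactly as in the original script.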

Jeronymous commented 1 year ago

Thank you for reporting. There are regression tests in place, but it's hard to guarantee that results won't degrade on particular cases.

I fixed a faulty heuristic that was causing the trouble you experienced when there is music before speech. It should be better now.

boxabirds commented 1 year ago

Amazing, thank you so much. It’s such a great tool.

On Tue, 9 May 2023 at 10:36, Jérôme Louradour wrote:

Closed #91 (https://github.com/linto-ai/whisper-timestamped/issues/91) as completed via 863d56d (https://github.com/linto-ai/whisper-timestamped/commit/863d56d0d1dfe779ca2dd73f4db2df2f48e6108e).


linto-ai / whisper-timestamped

Trouble with timings #91
