xenova opened this issue 5 days ago
To help with debugging, here are the decoded outputs of each chunk:
```python
for output in model_outputs:
    print(tokenizer.batch_decode(output["tokens"]))
```

```
["<|startoftranscript|><|notimestamps|> DO IT! Just DO IT! Don't let your dreams be dreams. Yesterday, you said tomorrow, so just DO IT! MAKE YOUR DRIMS! CONTRO! JUST DO IT! Some people dream success while you're gonna wake up and work hard at it. Nothing is impossible.<|endoftext|>"]
["<|startoftranscript|><|notimestamps|> Some people dream success while you're gonna wake up and work hard at it. Nothing is impossible. You should get to the point where anyone else would quit and you're not gonna stop there. No, what are you waiting for? Do it! Just do it! Yes, you can! Just do it!<|endoftext|>"]
['<|startoftranscript|><|notimestamps|> Just do it! Yes you can! Just do it! If your tire is starting over, stop giving up.<|endoftext|>']
```
Indeed, the duplicated phrasing sits at the chunk boundaries, so we can see exactly where the merging algorithm goes wrong.
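To make the boundary problem concrete, here is a minimal sketch of exact-overlap stitching between two consecutive chunks. This is *not* the actual merging logic in the transformers ASR pipeline (which uses a longest-common-sequence search that tolerates token mismatches); it only illustrates the idea, and all names below (`merge_chunks`, the toy word lists) are hypothetical:

```python
def merge_chunks(left, right):
    """Merge two overlapping token sequences by finding the longest
    suffix of `left` that exactly matches a prefix of `right`.

    Simplified illustration only; the real pipeline must tolerate
    partial mismatches, since the two chunks rarely decode the
    overlapping audio to byte-identical tokens.
    """
    max_overlap = min(len(left), len(right))
    for n in range(max_overlap, 0, -1):
        if left[-n:] == right[:n]:
            return left + right[n:]
    # No exact overlap found: naive concatenation duplicates the
    # boundary phrasing, which is the symptom shown above.
    return left + right

# Toy word-level example mirroring the duplicated boundary phrasing:
a = "nothing is impossible you should get".split()
b = "you should get to the point".split()
print(" ".join(merge_chunks(a, b)))
# → nothing is impossible you should get to the point
```

When the two decodings of the shared audio differ even slightly (as in the outputs above), an exact-match strategy finds no overlap and falls through to duplication, which is why a fuzzier alignment is needed.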
### System Info

transformers version: 4.42.3

### Who can help?
@sanchit-gandhi
### Information

### Tasks

- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)

### Reproduction
Minimal reproduction:
produces the following incorrect transcript (notice that at ~46 seconds, it jumps back in time):
For reference, this is the media I am transcribing.
### Expected behavior