Closed mmichelli closed 1 year ago
Sorting the results before regrouping fixed the issue for me. To replicate the issue:
import stable_whisper

# Two words whose timestamps are deliberately out of chronological order.
words = [
    {'word': ' innsikt.', 'start': 5287.78, 'end': 5288.22, 'probability': 0.5521641373634338},
    {'word': ' mye', 'start': 5281.66, 'end': 5281.88, 'probability': 0.8746040463447571},
]
text = "".join([w["word"] for w in words])
s = {"segments": [{"words": words, "text": text}]}
t = stable_whisper.WhisperResult(s)
(
    t
    .split_by_punctuation([('.', ' '), '。', '?', '?', ',', ','])
    .split_by_gap(.5)
    .merge_by_gap(.3, max_words=3)
)
Hello,
This error occurs when timestamps are not chronological. The check exists to avoid concatenating the words in the wrong order. According to the timestamps, "innsikt" should come after "mye". If the data may be out of order, then sorting the segments/words by timestamp before wrapping the data with WhisperResult is a must.
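As a minimal sketch of that advice, the words from the example above can be sorted by start time before the dictionary is built. The variable names `words_sorted` and `segments` are only illustrative, and the call to `stable_whisper.WhisperResult` is left as a comment so the snippet runs without the library installed.

```python
# The two words from the report, in their original (non-chronological) order.
words = [
    {"word": " innsikt.", "start": 5287.78, "end": 5288.22, "probability": 0.5521641373634338},
    {"word": " mye", "start": 5281.66, "end": 5281.88, "probability": 0.8746040463447571},
]

# Sort by start time; the end time is used only to break ties.
words_sorted = sorted(words, key=lambda w: (w["start"], w["end"]))

# Build the text from the sorted words so it matches the word order.
text = "".join(w["word"] for w in words_sorted)
segments = {"segments": [{"words": words_sorted, "text": text}]}

# With stable-ts installed, this would then be wrapped as in the report:
# t = stable_whisper.WhisperResult(segments)
```

Joining the text after sorting keeps the segment text consistent with the word order.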
The inference of the transcribe_any is passed a sorted list of words. But then it gets unsorted in the process.
This is not supposed to happen. One way I can see the words getting "unsorted" is when the segment timestamps are sorted but the word timestamps aren't. But it seems that you only used word timestamps in the example you gave. It might be a bug with the mapping you used. Do you have an example that caused it to unsort?
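To track down such an example, a small check over the word list can report where the order first breaks. This is a hypothetical helper (`find_out_of_order` is not part of stable-ts) that assumes each word is a dict with a `start` key, as in the example above.

```python
def find_out_of_order(words):
    """Return the indices i where words[i] starts before words[i - 1] starts."""
    return [
        i
        for i in range(1, len(words))
        if words[i]["start"] < words[i - 1]["start"]
    ]


# The two words from the report, in the order they were originally given.
reported = [
    {"word": " innsikt.", "start": 5287.78, "end": 5288.22},
    {"word": " mye", "start": 5281.66, "end": 5281.88},
]
bad_indices = find_out_of_order(reported)
```

Running it on the output of the inference function, just before the data is wrapped, would show whether the list is already unsorted at that point.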
s = {"segments": [{ "words": words, "text": text}]}
Once it's re-sorted, following the essential_mapping, nest it in a list to make it a list with one segment:
words = [
{'word': ' mye',
'start': 5281.66,
'end': 5281.88,
'probability': 0.8746040463447571},
{'word': ' innsikt.',
'start': 5287.78,
'end': 5288.22,
'probability': 0.5521641373634338}
]
t = stable_whisper.WhisperResult([words])
The problem was the start and end overlapping. I had to nudge them to fit.
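The thread does not say how the timestamps were nudged, so the following is only one possible sketch. The hypothetical helper `nudge_overlaps` (not part of stable-ts) moves each word's start forward to the previous word's end when the two overlap, assuming the words are already in their intended order. The sample values are two adjacent words from the longer example in this thread.

```python
def nudge_overlaps(words):
    """Return a copy of words where no word starts before the previous one ends."""
    fixed = []
    prev_end = None
    for word in words:
        word = dict(word)  # copy so the input list is left untouched
        if prev_end is not None and word["start"] < prev_end:
            word["start"] = prev_end  # push the start up to the previous end
        if word["end"] < word["start"]:
            word["end"] = word["start"]  # keep the word's own span non-negative
        fixed.append(word)
        prev_end = word["end"]
    return fixed


# ' and' ends at 122.04 while ' say,' starts at 122.02, so they overlap by 0.02 s.
sample = [
    {"word": " and", "start": 121.52, "end": 122.04},
    {"word": " say,", "start": 122.02, "end": 122.96},
]
fixed = nudge_overlaps(sample)
```

Clamping to the previous end changes as little as possible; splitting the overlap between the two words would be an equally reasonable choice.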
I think this is the same issue. https://github.com/jianfch/stable-ts/issues/192
words = [{'word': ' Henry', 'start': 1.7, 'end': 2.04, 'probability': 1.0}, {'word': ' 5,', 'start': 2.04, 'end': 2.76, 'probability': 0.9590921998023987}, {'word': ' Act', 'start': 2.76, 'end': 3.06, 'probability': 1.0}, {'word': ' 4,', 'start': 3.06, 'end': 3.78, 'probability': 1.0}, {'word': ' Scene', 'start': 3.78, 'end': 3.82, 'probability': 0.9658436179161072}, {'word': ' 3.', 'start': 3.82, 'end': 7.76, 'probability': 0.3984501361846924}, {'word': " What's", 'start': 8.56, 'end': 9.08, 'probability': 0.363200306892395}, {'word': ' he', 'start': 9.08, 'end': 9.26, 'probability': 0.44553032517433167}, {'word': ' that', 'start': 9.26, 'end': 9.56, 'probability': 0.8448543548583984}, {'word': ' wishes', 'start': 9.56, 'end': 9.84, 'probability': 0.9338713884353638}, {'word': ' so?', 'start': 9.84, 'end': 11.72, 'probability': 0.6788848638534546}, {'word': ' My', 'start': 11.96, 'end': 12.26, 'probability': 0.9889193177223206}, {'word': ' cousin', 'start': 12.26, 'end': 12.6, 'probability': 0.6532253623008728}, {'word': ' Westmulland,', 'start': 12.6, 'end': 14.32, 'probability': 0.07011916488409042}, {'word': ' Nae,', 'start': 14.32, 'end': 14.8, 'probability': 0.9945236444473267}, {'word': ' my', 'start': 14.8, 'end': 15.04, 'probability': 0.9825305342674255}, {'word': ' fair', 'start': 15.04, 'end': 15.26, 'probability': 0.9055658578872681}, {'word': ' cousin,', 'start': 15.26, 'end': 16.34, 'probability': 0.7704789638519287}, {'word': ' if', 'start': 16.34, 'end': 16.52, 'probability': 0.9821200966835022}, {'word': ' we', 'start': 16.52, 'end': 16.78, 'probability': 0.9091441035270691}, {'word': ' are', 'start': 16.78, 'end': 16.9, 'probability': 0.9614711403846741}, {'word': ' much', 'start': 16.9, 'end': 17.26, 'probability': 0.9594702124595642}, {'word': ' to', 'start': 17.26, 'end': 17.6, 'probability': 0.09238268435001373}, {'word': ' die,', 'start': 17.6, 'end': 18.54, 'probability': 0.9969301819801331}, {'word': ' we', 'start': 18.54, 'end': 18.72, 
'probability': 0.9509456157684326}, {'word': ' are', 'start': 18.72, 'end': 18.92, 'probability': 0.9932367205619812}, {'word': ' now', 'start': 18.92, 'end': 19.32, 'probability': 0.4580133557319641}, {'word': ' to', 'start': 19.32, 'end': 19.54, 'probability': 0.506151020526886}, {'word': ' do', 'start': 19.54, 'end': 19.82, 'probability': 0.3940909802913666}, {'word': ' our', 'start': 19.82, 'end': 20.06, 'probability': 0.9344944357872009}, {'word': ' country', 'start': 20.24, 'end': 20.38, 'probability': 0.8191243410110474}, {'word': ' loss.', 'start': 20.38, 'end': 21.7, 'probability': 0.1895233690738678}, {'word': ' And', 'start': 21.72, 'end': 21.88, 'probability': 0.813640296459198}, {'word': ' if', 'start': 21.88, 'end': 22.06, 'probability': 0.8834676146507263}, {'word': ' to', 'start': 22.06, 'end': 22.24, 'probability': 0.7114039063453674}, {'word': ' live,', 'start': 22.24, 'end': 23.18, 'probability': 0.6408235430717468}, {'word': ' the', 'start': 23.18, 'end': 23.36, 'probability': 0.9934050440788269}, {'word': ' fewer', 'start': 23.36, 'end': 23.7, 'probability': 0.5186182856559753}, {'word': ' men,', 'start': 23.74, 'end': 24.6, 'probability': 0.6280710697174072}, {'word': ' the', 'start': 24.6, 'end': 24.6, 'probability': 0.9966585636138916}, {'word': ' greater', 'start': 24.6, 'end': 25.0, 'probability': 0.9711730480194092}, {'word': ' share', 'start': 25.0, 'end': 25.48, 'probability': 0.33705171942710876}, {'word': ' of', 'start': 25.48, 'end': 25.7, 'probability': 0.561707615852356}, {'word': ' honor.', 'start': 25.7, 'end': 27.44, 'probability': 0.951850414276123}, {'word': " God's", 'start': 27.44, 'end': 28.04, 'probability': 0.8429577350616455}, {'word': ' will,', 'start': 28.04, 'end': 28.4, 'probability': 0.9865183234214783}, {'word': ' I', 'start': 28.4, 'end': 28.4, 'probability': 0.9931000471115112}, {'word': ' pray,', 'start': 28.4, 'end': 28.76, 'probability': 0.4780261218547821}, {'word': ' the', 'start': 28.76, 'end': 29.04, 
'probability': 0.9953845143318176}, {'word': ' wish', 'start': 29.04, 'end': 29.64, 'probability': 0.8897712230682373}, {'word': ' not', 'start': 29.64, 'end': 29.92, 'probability': 0.6575230360031128}, {'word': ' one', 'start': 29.92, 'end': 30.44, 'probability': 0.09977587312459946}, {'word': ' man', 'start': 30.44, 'end': 30.74, 'probability': 0.9957081079483032}, {'word': ' more.', 'start': 30.74, 'end': 32.56, 'probability': 0.9367187023162842}, {'word': ' By', 'start': 32.56, 'end': 32.9, 'probability': 0.8600592613220215}, {'word': ' jove,', 'start': 32.9, 'end': 33.98, 'probability': 0.5560446381568909}, {'word': ' I', 'start': 33.98, 'end': 34.06, 'probability': 0.08915555477142334}, {'word': ' am', 'start': 34.06, 'end': 34.26, 'probability': 0.998286783695221}, {'word': ' not', 'start': 34.26, 'end': 34.5, 'probability': 0.850245475769043}, {'word': ' covetous', 'start': 34.5, 'end': 35.0, 'probability': 0.9848424196243286}, {'word': ' for', 'start': 35.0, 'end': 35.3, 'probability': 0.8809939622879028}, {'word': ' gold.', 'start': 35.3, 'end': 36.8, 'probability': 0.9936336278915405}, {'word': ' Nor', 'start': 36.8, 'end': 37.12, 'probability': 0.7443109154701233}, {'word': ' care', 'start': 37.12, 'end': 37.44, 'probability': 0.9849539995193481}, {'word': ' I', 'start': 37.44, 'end': 37.68, 'probability': 0.8345310091972351}, {'word': ' houd', 'start': 37.68, 'end': 37.94, 'probability': 0.9935441017150879}, {'word': ' a', 'start': 37.94, 'end': 38.26, 'probability': 0.41833364963531494}, {'word': ' feed', 'start': 38.4, 'end': 38.48, 'probability': 0.9127079248428345}, {'word': ' upon', 'start': 38.48, 'end': 38.88, 'probability': 0.9854372143745422}, {'word': ' my', 'start': 38.88, 'end': 39.26, 'probability': 0.9862648844718933}, {'word': ' cost.', 'start': 39.26, 'end': 40.9, 'probability': 0.9924943447113037}, {'word': ' It', 'start': 40.9, 'end': 41.16, 'probability': 0.6000968217849731}, {'word': ' yearns', 'start': 41.16, 'end': 41.58, 
'probability': 0.4420222342014313}, {'word': ' me', 'start': 41.58, 'end': 41.8, 'probability': 0.037282902747392654}, {'word': ' not', 'start': 41.8, 'end': 42.06, 'probability': 0.9976358413696289}, {'word': ' if', 'start': 42.06, 'end': 42.32, 'probability': 0.9869476556777954}, {'word': ' men', 'start': 42.32, 'end': 42.62, 'probability': 1.0}, {'word': ' my', 'start': 42.62, 'end': 42.88, 'probability': 0.9778580665588379}, {'word': ' garments', 'start': 42.88, 'end': 43.36, 'probability': 1.0}, {'word': ' wear', 'start': 43.36, 'end': 43.94, 'probability': 1.0}, {'word': ' such', 'start': 43.94, 'end': 44.7, 'probability': 0.6763264536857605}, {'word': ' outward', 'start': 44.7, 'end': 45.32, 'probability': 0.45042529702186584}, {'word': ' things', 'start': 45.32, 'end': 45.86, 'probability': 0.4206596910953522}, {'word': ' dwell', 'start': 45.86, 'end': 46.28, 'probability': 0.8374934196472168}, {'word': ' not', 'start': 46.28, 'end': 46.54, 'probability': 0.9309219121932983}, {'word': ' in', 'start': 46.54, 'end': 46.74, 'probability': 0.744561493396759}, {'word': ' my', 'start': 46.74, 'end': 47.0, 'probability': 0.9810428619384766}, {'word': ' desires.', 'start': 47.0, 'end': 49.12, 'probability': 0.7855569124221802}, {'word': ' But', 'start': 49.32, 'end': 49.42, 'probability': 0.9629272222518921}, {'word': ' if', 'start': 49.42, 'end': 49.66, 'probability': 0.8545044660568237}, {'word': ' it', 'start': 49.66, 'end': 49.92, 'probability': 0.9836477637290955}, {'word': ' be', 'start': 49.92, 'end': 50.12, 'probability': 0.40953701734542847}, {'word': ' a', 'start': 50.12, 'end': 50.46, 'probability': 0.8363914489746094}, {'word': ' sin', 'start': 50.46, 'end': 50.72, 'probability': 0.9696246981620789}, {'word': ' to', 'start': 50.72, 'end': 51.0, 'probability': 0.9745011329650879}, {'word': ' covet', 'start': 51.0, 'end': 51.3, 'probability': 0.9894445538520813}, {'word': ' honour,', 'start': 51.3, 'end': 52.78, 'probability': 0.4852699935436249}, 
{'word': ' I', 'start': 52.78, 'end': 52.94, 'probability': 0.5372000932693481}, {'word': ' am', 'start': 52.94, 'end': 53.28, 'probability': 0.5152359008789062}, {'word': ' the', 'start': 53.28, 'end': 53.52, 'probability': 0.9821651577949524}, {'word': ' most', 'start': 53.52, 'end': 53.88, 'probability': 0.9662154316902161}, {'word': ' offending', 'start': 53.88, 'end': 54.52, 'probability': 0.6431681513786316}, {'word': ' soul', 'start': 54.52, 'end': 54.92, 'probability': 0.6723597645759583}, {'word': ' alive.', 'start': 54.92, 'end': 56.6, 'probability': 0.748053789138794}, {'word': ' No,', 'start': 56.66, 'end': 57.6, 'probability': 0.7941516041755676}, {'word': ' faith', 'start': 57.6, 'end': 57.86, 'probability': 0.07308930903673172}, {'word': ' my', 'start': 57.86, 'end': 58.16, 'probability': 0.9933253526687622}, {'word': ' cus', 'start': 58.16, 'end': 58.58, 'probability': 0.6161521673202515}, {'word': ' wish', 'start': 58.58, 'end': 58.96, 'probability': 0.816256046295166}, {'word': ' not', 'start': 58.96, 'end': 59.3, 'probability': 0.8731276392936707}, {'word': ' a', 'start': 59.3, 'end': 59.46, 'probability': 0.9655333757400513}, {'word': ' man', 'start': 59.46, 'end': 59.7, 'probability': 0.8333145976066589}, {'word': ' from', 'start': 59.7, 'end': 59.96, 'probability': 0.874387264251709}, {'word': ' England.', 'start': 59.96, 'end': 61.74, 'probability': 0.8615880012512207}, {'word': " God's", 'start': 61.8, 'end': 62.36, 'probability': 0.8175302147865295}, {'word': ' peace,', 'start': 62.36, 'end': 63.46, 'probability': 0.17279177904129028}, {'word': ' I', 'start': 63.46, 'end': 63.5, 'probability': 0.6953412294387817}, {'word': ' would', 'start': 63.5, 'end': 63.7, 'probability': 0.7376256585121155}, {'word': ' not', 'start': 63.7, 'end': 64.06, 'probability': 0.9910030364990234}, {'word': ' lose', 'start': 64.06, 'end': 64.4, 'probability': 0.9933730363845825}, {'word': ' so', 'start': 64.4, 'end': 64.76, 'probability': 0.9913762211799622}, 
{'word': ' great', 'start': 64.76, 'end': 65.02, 'probability': 0.9042508602142334}, {'word': ' and', 'start': 65.02, 'end': 65.28, 'probability': 0.9927257299423218}, {'word': ' honour', 'start': 65.28, 'end': 65.5, 'probability': 0.9604440927505493}, {'word': ' as', 'start': 65.5, 'end': 65.86, 'probability': 0.9781786799430847}, {'word': ' one', 'start': 65.86, 'end': 66.2, 'probability': 0.5433875918388367}, {'word': ' man', 'start': 66.2, 'end': 66.6, 'probability': 0.7605411410331726}, {'word': ' more', 'start': 66.6, 'end': 66.96, 'probability': 0.3373698592185974}, {'word': ' me', 'start': 66.96, 'end': 67.22, 'probability': 0.9620576500892639}, {'word': ' thinks', 'start': 67.22, 'end': 67.6, 'probability': 0.1880161464214325}, {'word': ' would', 'start': 67.66, 'end': 67.94, 'probability': 0.5762460827827454}, {'word': ' share', 'start': 67.94, 'end': 68.24, 'probability': 0.45116931200027466}, {'word': ' from', 'start': 68.24, 'end': 68.54, 'probability': 0.16570298373699188}, {'word': ' me', 'start': 68.54, 'end': 68.98, 'probability': 0.9953333735466003}, {'word': ' for', 'start': 68.98, 'end': 69.3, 'probability': 0.666262149810791}, {'word': ' the', 'start': 69.3, 'end': 69.48, 'probability': 0.9983021020889282}, {'word': ' best', 'start': 69.48, 'end': 69.82, 'probability': 0.996383547782898}, {'word': ' hope', 'start': 69.82, 'end': 70.18, 'probability': 0.9602347612380981}, {'word': ' I', 'start': 70.18, 'end': 70.4, 'probability': 0.9076172709465027}, {'word': ' have.', 'start': 70.4, 'end': 71.74, 'probability': 0.8834810853004456}, {'word': ' Oh,', 'start': 71.74, 'end': 72.02, 'probability': 0.20234237611293793}, {'word': ' do', 'start': 72.02, 'end': 72.14, 'probability': 0.959312915802002}, {'word': ' not', 'start': 72.14, 'end': 72.4, 'probability': 0.8601191639900208}, {'word': ' wish', 'start': 72.4, 'end': 72.84, 'probability': 0.848604142665863}, {'word': ' one', 'start': 72.84, 'end': 73.34, 'probability': 0.9204859137535095}, {'word': 
' more.', 'start': 73.34, 'end': 75.28, 'probability': 0.9845459461212158}, {'word': ' Rather,', 'start': 75.32, 'end': 76.26, 'probability': 0.8275904059410095}, {'word': ' proclaim', 'start': 76.26, 'end': 76.62, 'probability': 0.9816035032272339}, {'word': ' it,', 'start': 76.62, 'end': 77.14, 'probability': 0.6175978183746338}, {'word': ' Westmelland,', 'start': 77.14, 'end': 77.88, 'probability': 0.912691593170166}, {'word': ' through', 'start': 77.88, 'end': 78.1, 'probability': 0.7590121626853943}, {'word': ' my', 'start': 78.1, 'end': 78.36, 'probability': 0.958415687084198}, {'word': ' host,', 'start': 78.36, 'end': 79.32, 'probability': 0.8469499945640564}, {'word': ' that', 'start': 79.32, 'end': 79.5, 'probability': 0.25268152356147766}, {'word': ' he', 'start': 79.5, 'end': 79.76, 'probability': 0.9955581426620483}, {'word': ' which', 'start': 79.76, 'end': 80.12, 'probability': 0.9935936331748962}, {'word': ' hath', 'start': 80.12, 'end': 80.38, 'probability': 1.0}, {'word': ' no', 'start': 80.38, 'end': 80.62, 'probability': 0.9840901494026184}, {'word': ' stomach', 'start': 80.62, 'end': 81.04, 'probability': 1.0}, {'word': ' to', 'start': 81.04, 'end': 81.28, 'probability': 1.0}, {'word': ' this', 'start': 81.28, 'end': 81.5, 'probability': 0.7029576301574707}, {'word': ' fight', 'start': 81.62, 'end': 81.88, 'probability': 0.3251306116580963}, {'word': ' let', 'start': 81.88, 'end': 82.34, 'probability': 0.9817564487457275}, {'word': ' him', 'start': 82.34, 'end': 82.7, 'probability': 0.9565202593803406}, {'word': ' depart.', 'start': 82.7, 'end': 84.14, 'probability': 0.9433295726776123}, {'word': ' His', 'start': 84.14, 'end': 84.54, 'probability': 0.9706352353096008}, {'word': ' passport', 'start': 84.54, 'end': 85.1, 'probability': 0.8905162811279297}, {'word': ' shall', 'start': 85.1, 'end': 85.5, 'probability': 0.9873468279838562}, {'word': ' be', 'start': 85.5, 'end': 85.7, 'probability': 0.808259129524231}, {'word': ' made', 'start': 85.7, 
'end': 86.04, 'probability': 0.9538906812667847}, {'word': ' and', 'start': 86.04, 'end': 86.52, 'probability': 0.7879480123519897}, {'word': ' crowns', 'start': 86.52, 'end': 87.06, 'probability': 0.9761564135551453}, {'word': ' for', 'start': 87.06, 'end': 87.26, 'probability': 0.8684403300285339}, {'word': ' Convoy', 'start': 87.26, 'end': 87.8, 'probability': 0.976298987865448}, {'word': ' put', 'start': 87.88, 'end': 88.04, 'probability': 0.991665780544281}, {'word': ' into', 'start': 88.04, 'end': 88.3, 'probability': 0.9892922639846802}, {'word': ' his', 'start': 88.3, 'end': 88.62, 'probability': 0.9522150754928589}, {'word': ' purse.', 'start': 88.62, 'end': 90.06, 'probability': 0.612728476524353}, {'word': ' We', 'start': 90.06, 'end': 90.24, 'probability': 0.043250672519207}, {'word': ' would', 'start': 90.24, 'end': 90.62, 'probability': 0.9783042669296265}, {'word': ' not', 'start': 90.62, 'end': 91.04, 'probability': 0.9733952283859253}, {'word': ' die', 'start': 91.04, 'end': 91.44, 'probability': 0.993177056312561}, {'word': ' in', 'start': 91.44, 'end': 91.76, 'probability': 0.9943971633911133}, {'word': ' that', 'start': 91.76, 'end': 92.06, 'probability': 0.9929214119911194}, {'word': " man's", 'start': 92.06, 'end': 92.6, 'probability': 0.8035719990730286}, {'word': ' company', 'start': 92.6, 'end': 93.0, 'probability': 0.9930883049964905}, {'word': ' that', 'start': 93.0, 'end': 93.68, 'probability': 0.994566798210144}, {'word': ' fears', 'start': 93.68, 'end': 94.14, 'probability': 0.7012796401977539}, {'word': ' his', 'start': 94.14, 'end': 94.6, 'probability': 0.6103168725967407}, {'word': ' fellowship', 'start': 94.6, 'end': 95.0, 'probability': 0.8624662160873413}, {'word': ' to', 'start': 95.0, 'end': 95.44, 'probability': 0.967224657535553}, {'word': ' die', 'start': 95.44, 'end': 95.64, 'probability': 0.9914968609809875}, {'word': ' with', 'start': 95.68, 'end': 95.86, 'probability': 0.9838773608207703}, {'word': ' us.', 'start': 
95.86, 'end': 97.98, 'probability': 0.9963111281394958}, {'word': ' This', 'start': 97.98, 'end': 98.52, 'probability': 0.6463856101036072}, {'word': ' day', 'start': 98.52, 'end': 99.0, 'probability': 0.9954323768615723}, {'word': ' is', 'start': 99.0, 'end': 99.66, 'probability': 0.6326208114624023}, {'word': ' called', 'start': 99.66, 'end': 100.02, 'probability': 0.9046921730041504}, {'word': ' the', 'start': 100.02, 'end': 100.3, 'probability': 0.8743824362754822}, {'word': ' feast', 'start': 100.3, 'end': 100.56, 'probability': 0.03843006491661072}, {'word': ' of', 'start': 100.56, 'end': 100.84, 'probability': 0.990936815738678}, {'word': ' Crispian.', 'start': 100.84, 'end': 102.4, 'probability': 0.965117335319519}, {'word': ' He', 'start': 102.4, 'end': 102.68, 'probability': 0.7625917792320251}, {'word': ' that', 'start': 102.68, 'end': 102.88, 'probability': 0.7541912794113159}, {'word': ' out', 'start': 102.88, 'end': 103.34, 'probability': 0.7593938708305359}, {'word': ' lives', 'start': 103.34, 'end': 103.64, 'probability': 0.37422049045562744}, {'word': ' this', 'start': 103.64, 'end': 104.06, 'probability': 0.6946720480918884}, {'word': ' day', 'start': 104.06, 'end': 104.34, 'probability': 0.8276910185813904}, {'word': ' and', 'start': 104.34, 'end': 104.62, 'probability': 0.9627363681793213}, {'word': ' comes', 'start': 104.62, 'end': 105.16, 'probability': 0.9825707077980042}, {'word': ' safe', 'start': 105.16, 'end': 105.54, 'probability': 0.993502676486969}, {'word': ' home.', 'start': 105.54, 'end': 106.84, 'probability': 0.9972918629646301}, {'word': ' Will', 'start': 106.84, 'end': 107.14, 'probability': 0.9979428648948669}, {'word': ' stand', 'start': 107.14, 'end': 107.52, 'probability': 0.8447971343994141}, {'word': ' a', 'start': 107.52, 'end': 107.82, 'probability': 0.0340547189116478}, {'word': ' tiptoe', 'start': 107.96, 'end': 108.34, 'probability': 0.9949049949645996}, {'word': ' in', 'start': 108.34, 'end': 108.56, 'probability': 
0.8227512240409851}, {'word': ' the', 'start': 108.56, 'end': 108.7, 'probability': 0.9850172400474548}, {'word': " day's", 'start': 108.7, 'end': 109.12, 'probability': 0.9346296787261963}, {'word': ' name', 'start': 109.12, 'end': 109.5, 'probability': 0.5043765902519226}, {'word': ' and', 'start': 109.5, 'end': 110.06, 'probability': 0.9935750365257263}, {'word': ' rouse', 'start': 110.06, 'end': 110.4, 'probability': 0.9919309020042419}, {'word': ' him', 'start': 110.4, 'end': 110.82, 'probability': 0.9979161620140076}, {'word': ' at', 'start': 110.82, 'end': 111.18, 'probability': 0.9559215307235718}, {'word': ' the', 'start': 111.18, 'end': 111.32, 'probability': 0.9742921590805054}, {'word': ' name', 'start': 111.32, 'end': 111.66, 'probability': 0.9606645107269287}, {'word': ' of', 'start': 111.66, 'end': 112.02, 'probability': 0.7336221933364868}, {'word': ' Crispian.', 'start': 112.02, 'end': 113.88, 'probability': 0.9678494930267334}, {'word': ' He', 'start': 113.88, 'end': 114.2, 'probability': 0.9552794694900513}, {'word': ' that', 'start': 114.2, 'end': 114.48, 'probability': 0.9911489486694336}, {'word': ' shall', 'start': 114.48, 'end': 114.82, 'probability': 0.9661858081817627}, {'word': ' live', 'start': 114.82, 'end': 115.08, 'probability': 0.8502776622772217}, {'word': ' this', 'start': 115.08, 'end': 115.48, 'probability': 0.5063650012016296}, {'word': ' day', 'start': 115.48, 'end': 115.78, 'probability': 0.6658087968826294}, {'word': ' and', 'start': 115.78, 'end': 116.1, 'probability': 0.5431900024414062}, {'word': ' see', 'start': 116.1, 'end': 116.44, 'probability': 0.5601219534873962}, {'word': ' old', 'start': 116.44, 'end': 116.86, 'probability': 0.19629497826099396}, {'word': ' age,', 'start': 116.86, 'end': 117.98, 'probability': 0.9959339499473572}, {'word': ' will', 'start': 117.98, 'end': 118.24, 'probability': 0.9932892918586731}, {'word': ' yearly', 'start': 118.24, 'end': 118.84, 'probability': 1.0}, {'word': ' on', 'start': 
118.84, 'end': 119.3, 'probability': 0.9906017780303955}, {'word': ' the', 'start': 119.3, 'end': 119.4, 'probability': 1.0}, {'word': ' vigil', 'start': 119.4, 'end': 119.88, 'probability': 1.0}, {'word': ' feast', 'start': 119.88, 'end': 120.72, 'probability': 0.6698790192604065}, {'word': ' his', 'start': 120.72, 'end': 121.12, 'probability': 0.7503746151924133}, {'word': ' neighbors', 'start': 121.12, 'end': 121.52, 'probability': 0.6490254998207092}, {'word': ' and', 'start': 121.52, 'end': 122.04, 'probability': 0.9821033477783203}, {'word': ' say,', 'start': 122.02, 'end': 122.96, 'probability': 0.9268665909767151}, {'word': ' tomorrow', 'start': 122.96, 'end': 123.26, 'probability': 0.5009294748306274}, {'word': ' is', 'start': 123.26, 'end': 124.3, 'probability': 0.9850856065750122}, {'word': ' sent', 'start': 124.3, 'end': 124.6, 'probability': 0.9847975373268127}, {'word': ' crispyen.', 'start': 124.6, 'end': 126.32, 'probability': 0.9974495768547058}, {'word': ' Then', 'start': 126.32, 'end': 126.64, 'probability': 0.9732151031494141}, {'word': ' he', 'start': 126.64, 'end': 126.9, 'probability': 0.9242735505104065}, {'word': ' will', 'start': 126.9, 'end': 127.12, 'probability': 0.9414713978767395}, {'word': ' strip', 'start': 127.12, 'end': 127.46, 'probability': 0.6670272946357727}, {'word': ' his', 'start': 127.46, 'end': 127.88, 'probability': 0.5762494802474976}, {'word': ' sleeve', 'start': 127.88, 'end': 128.2, 'probability': 0.9311427474021912}, {'word': ' and', 'start': 128.2, 'end': 128.72, 'probability': 0.9781582355499268}, {'word': ' show', 'start': 128.72, 'end': 129.04, 'probability': 0.9458707571029663}, {'word': ' his', 'start': 129.04, 'end': 129.34, 'probability': 0.8592180013656616}, {'word': ' scars', 'start': 129.34, 'end': 129.8, 'probability': 0.391526460647583}, {'word': ' and', 'start': 129.8, 'end': 130.2, 'probability': 0.7136589288711548}, {'word': ' say,', 'start': 130.2, 'end': 130.98, 'probability': 0.16941921412944794}, 
{'word': ' these', 'start': 130.98, 'end': 131.32, 'probability': 0.9936263561248779}, {'word': ' wounds', 'start': 131.32, 'end': 131.82, 'probability': 0.5070556402206421}, {'word': ' I', 'start': 131.82, 'end': 132.26, 'probability': 0.9580144286155701}, {'word': ' had', 'start': 132.26, 'end': 132.48, 'probability': 0.9415930509567261}, {'word': ' on', 'start': 132.48, 'end': 132.8, 'probability': 0.981964111328125}, {'word': " crispyen's", 'start': 132.8, 'end': 133.58, 'probability': 0.7696945071220398}, {'word': ' day.', 'start': 133.58, 'end': 135.74, 'probability': 0.9960699081420898}, {'word': ' Old', 'start': 135.74, 'end': 136.04, 'probability': 0.9986554384231567}, {'word': ' men', 'start': 136.04, 'end': 136.38, 'probability': 0.9675906300544739}, {'word': ' forget,', 'start': 136.38, 'end': 137.74, 'probability': 0.990267813205719}, {'word': ' yet', 'start': 137.74, 'end': 137.94, 'probability': 0.9961097836494446}, {'word': ' all', 'start': 137.94, 'end': 138.34, 'probability': 0.8383417129516602}, {'word': ' shall', 'start': 138.34, 'end': 138.62, 'probability': 0.5445013642311096}, {'word': ' be', 'start': 138.62, 'end': 138.82, 'probability': 0.7203402519226074}, {'word': ' forgotten,', 'start': 138.82, 'end': 139.76, 'probability': 0.9314321279525757}, {'word': ' but', 'start': 139.76, 'end': 140.18, 'probability': 0.9961525797843933}, {'word': " he'll", 'start': 140.18, 'end': 140.54, 'probability': 0.9934590458869934}, {'word': ' remember', 'start': 140.54, 'end': 140.98, 'probability': 0.9770448207855225}, {'word': ' with', 'start': 140.98, 'end': 141.5, 'probability': 0.9736282825469971}, {'word': ' advantages', 'start': 141.5, 'end': 142.14, 'probability': 0.6066042184829712}, {'word': ' what', 'start': 142.14, 'end': 142.88, 'probability': 0.022118214517831802}, {'word': ' feats', 'start': 142.88, 'end': 143.32, 'probability': 0.9883117079734802}, {'word': ' he', 'start': 143.32, 'end': 143.56, 'probability': 0.8625694513320923}, {'word': 
' did', 'start': 143.56, 'end': 143.8, 'probability': 0.8823891282081604}, {'word': ' that', 'start': 143.8, 'end': 144.16, 'probability': 0.9584640860557556}, {'word': ' day.', 'start': 144.16, 'end': 145.36, 'probability': 0.9402573108673096}, {'word': ' Then', 'start': 145.66, 'end': 145.92, 'probability': 0.7004079818725586}, {'word': ' shall', 'start': 145.92, 'end': 146.08, 'probability': 0.2830100357532501}, {'word': ' our', 'start': 146.08, 'end': 146.4, 'probability': 0.3568965792655945}, {'word': ' names', 'start': 146.4, 'end': 146.84, 'probability': 0.2413010150194168}, {'word': ' for', 'start': 146.84, 'end': 147.34, 'probability': 0.3644302487373352}, {'word': ' Milia', 'start': 147.34, 'end': 147.78, 'probability': 0.9248873591423035}, {'word': ' in', 'start': 147.78, 'end': 148.04, 'probability': 0.9844214916229248}, {'word': ' his', 'start': 148.04, 'end': 148.42, 'probability': 0.9903256893157959}, {'word': ' mouth', 'start': 148.42, 'end': 148.7, 'probability': 0.9660508632659912}, {'word': ' as', 'start': 148.7, 'end': 149.06, 'probability': 0.8974336981773376}, {'word': ' household', 'start': 149.06, 'end': 149.5, 'probability': 0.9850775599479675}, {'word': ' words,', 'start': 149.5, 'end': 150.58, 'probability': 0.956802487373352}, {'word': ' Harry', 'start': 150.58, 'end': 150.78, 'probability': 0.5126129388809204}, {'word': ' the', 'start': 150.78, 'end': 151.04, 'probability': 0.2806437611579895}, {'word': ' King,', 'start': 151.04, 'end': 151.98, 'probability': 0.9967509508132935}, {'word': ' Bedford,', 'start': 151.98, 'end': 152.7, 'probability': 0.9129139184951782}, {'word': ' and', 'start': 152.72, 'end': 152.96, 'probability': 0.9910532832145691}, {'word': ' Exeter,', 'start': 152.96, 'end': 154.26, 'probability': 0.9525956511497498}, {'word': ' Warwick', 'start': 154.26, 'end': 154.62, 'probability': 0.720944344997406}, {'word': ' and', 'start': 154.62, 'end': 154.88, 'probability': 0.11709990352392197}, {'word': ' Talwood,', 
'start': 154.88, 'end': 156.18, 'probability': 0.9977444410324097}, {'word': ' Solesbury', 'start': 156.18, 'end': 156.9, 'probability': 0.773591935634613}, {'word': ' and', 'start': 156.9, 'end': 157.32, 'probability': 0.8129531145095825}, {'word': ' Gloster,', 'start': 157.32, 'end': 158.82, 'probability': 0.9759193062782288}, {'word': ' be', 'start': 158.82, 'end': 158.98, 'probability': 0.9959449172019958}, {'word': ' in', 'start': 158.98, 'end': 159.2, 'probability': 0.8204134702682495}, {'word': ' their', 'start': 159.2, 'end': 159.44, 'probability': 0.9754608273506165}, {'word': ' flowing', 'start': 159.44, 'end': 159.84, 'probability': 0.6328381299972534}, {'word': ' cups', 'start': 159.84, 'end': 160.32, 'probability': 0.9973748922348022}, {'word': ' freshly', 'start': 160.32, 'end': 160.94, 'probability': 0.9986624717712402}, {'word': ' remembered.', 'start': 160.94, 'end': 163.02, 'probability': 0.9859511256217957}, {'word': ' This', 'start': 163.36, 'end': 163.66, 'probability': 0.5194078683853149}, {'word': ' story', 'start': 163.66, 'end': 164.14, 'probability': 0.9786112904548645}, {'word': ' shall', 'start': 164.14, 'end': 164.68, 'probability': 0.9839714765548706}, {'word': ' the', 'start': 164.68, 'end': 164.88, 'probability': 0.9876208901405334}, {'word': ' good', 'start': 164.88, 'end': 165.16, 'probability': 0.5584031343460083}, {'word': ' man', 'start': 165.16, 'end': 165.58, 'probability': 0.9162559509277344}, {'word': ' teach', 'start': 165.58, 'end': 165.94, 'probability': 0.3964194655418396}, {'word': ' his', 'start': 165.94, 'end': 166.32, 'probability': 0.9137334227561951}, {'word': ' son,', 'start': 166.32, 'end': 167.42, 'probability': 0.9651820063591003}, {'word': ' and', 'start': 167.42, 'end': 167.74, 'probability': 0.041444286704063416}, {'word': ' crisp', 'start': 167.74, 'end': 168.06, 'probability': 0.9956591129302979}, {'word': ' and', 'start': 168.06, 'end': 168.32, 'probability': 0.9967233538627625}, {'word': ' crisp', 
'start': 168.6, 'end': 168.6, 'probability': 1.0}, {'word': ' and', 'start': 168.6, 'end': 168.98, 'probability': 0.9848842620849609}, {'word': ' shall', 'start': 168.98, 'end': 169.28, 'probability': 1.0}, {'word': ' narrow', 'start': 169.28, 'end': 169.68, 'probability': 1.0}, {'word': ' by', 'start': 169.68, 'end': 170.16, 'probability': 0.07378227263689041}, {'word': ' from', 'start': 170.16, 'end': 170.48, 'probability': 0.9918357133865356}, {'word': ' this', 'start': 170.48, 'end': 170.86, 'probability': 0.6686174869537354}, {'word': ' day', 'start': 170.86, 'end': 171.2, 'probability': 0.9925986528396606}, {'word': ' to', 'start': 171.2, 'end': 171.56, 'probability': 0.9947487711906433}, {'word': ' the', 'start': 171.56, 'end': 171.78, 'probability': 0.9707620143890381}, {'word': ' ending', 'start': 171.78, 'end': 172.08, 'probability': 0.972222626209259}, {'word': ' of', 'start': 172.08, 'end': 172.38, 'probability': 0.6845805644989014}, {'word': ' the', 'start': 172.38, 'end': 172.5, 'probability': 0.9519438147544861}, {'word': ' world,', 'start': 172.5, 'end': 173.18, 'probability': 0.9938291907310486}, {'word': ' but', 'start': 173.18, 'end': 173.44, 'probability': 0.5227317214012146}, {'word': ' we,', 'start': 173.44, 'end': 174.14, 'probability': 0.6085188984870911}, {'word': ' in', 'start': 174.14, 'end': 174.26, 'probability': 0.5757820010185242}, {'word': ' it', 'start': 174.26, 'end': 174.48, 'probability': 0.9916446805000305}, {'word': ' shall', 'start': 174.48, 'end': 174.72, 'probability': 0.9929982423782349}, {'word': ' be', 'start': 174.78, 'end': 174.96, 'probability': 0.971552848815918}, {'word': ' remembered.', 'start': 174.96, 'end': 176.4, 'probability': 0.4877663850784302}, {'word': ' We,', 'start': 176.4, 'end': 177.02, 'probability': 0.9785329699516296}, {'word': ' few,', 'start': 177.02, 'end': 178.18, 'probability': 0.9809420108795166}, {'word': ' we', 'start': 178.18, 'end': 178.52, 'probability': 0.9169884920120239}, {'word': ' 
happy', 'start': 178.52, 'end': 178.76, 'probability': 0.6535576581954956}, {'word': ' few,', 'start': 178.76, 'end': 180.14, 'probability': 0.851703405380249}, {'word': ' we,', 'start': 180.14, 'end': 180.82, 'probability': 0.1298087239265442}, {'word': ' band', 'start': 180.82, 'end': 181.1, 'probability': 0.9949583411216736}, {'word': ' of', 'start': 181.1, 'end': 181.42, 'probability': 0.8349753022193909}, {'word': ' brothers,', 'start': 181.42, 'end': 182.6, 'probability': 0.9432376027107239}, {'word': ' for', 'start': 182.6, 'end': 182.78, 'probability': 0.9420830607414246}, {'word': ' he', 'start': 182.76, 'end': 183.04, 'probability': 0.9819048047065735}, {'word': ' today', 'start': 183.04, 'end': 183.48, 'probability': 0.9244788289070129}, {'word': ' that', 'start': 183.48, 'end': 183.98, 'probability': 0.9611664414405823}, {'word': ' sheds', 'start': 183.98, 'end': 184.3, 'probability': 0.9837024211883545}, {'word': ' his', 'start': 184.3, 'end': 184.64, 'probability': 0.9976171851158142}, {'word': ' blood', 'start': 184.64, 'end': 184.86, 'probability': 0.9861067533493042}, {'word': ' with', 'start': 184.86, 'end': 185.24, 'probability': 0.9750077724456787}, {'word': ' me', 'start': 185.24, 'end': 185.7, 'probability': 0.9956795573234558}, {'word': ' shall', 'start': 185.7, 'end': 186.06, 'probability': 0.9887726902961731}, {'word': ' be', 'start': 186.06, 'end': 186.46, 'probability': 0.9930782914161682}, {'word': ' my', 'start': 186.46, 'end': 186.76, 'probability': 0.9905368685722351}, {'word': ' brother,', 'start': 186.76, 'end': 187.86, 'probability': 0.8854169845581055}, {'word': ' be', 'start': 187.86, 'end': 188.14, 'probability': 0.940584123134613}, {'word': ' he', 'start': 188.2, 'end': 188.42, 'probability': 0.8966628313064575}, {'word': ' near', 'start': 188.42, 'end': 188.78, 'probability': 0.9287273287773132}, {'word': ' so', 'start': 188.78, 'end': 189.1, 'probability': 0.7715998291969299}, {'word': ' vile', 'start': 189.1, 'end': 189.64, 
'probability': 0.6637495756149292}, {'word': ' this', 'start': 189.64, 'end': 190.02, 'probability': 0.16761845350265503}, {'word': ' day', 'start': 190.02, 'end': 190.48, 'probability': 0.9946725964546204}, {'word': ' shall', 'start': 190.48, 'end': 190.8, 'probability': 0.9654229283332825}, {'word': ' gentle', 'start': 190.8, 'end': 191.22, 'probability': 0.9972444772720337}, {'word': ' his', 'start': 191.22, 'end': 191.56, 'probability': 0.8070518970489502}, {'word': ' condition.', 'start': 191.56, 'end': 193.48, 'probability': 0.7991375923156738}, {'word': ' And', 'start': 193.54, 'end': 193.78, 'probability': 0.9931772947311401}, {'word': ' gentlemen', 'start': 193.78, 'end': 194.22, 'probability': 0.9504886865615845}, {'word': ' in', 'start': 194.22, 'end': 194.78, 'probability': 0.9621501564979553}, {'word': ' England,', 'start': 194.78, 'end': 195.72, 'probability': 0.929545521736145}, {'word': ' now', 'start': 195.72, 'end': 195.86, 'probability': 0.6121333241462708}, {'word': ' a', 'start': 195.86, 'end': 196.12, 'probability': 0.9964145421981812}, {'word': ' bed,', 'start': 196.12, 'end': 196.96, 'probability': 0.2524227797985077}, {'word': ' shall', 'start': 196.96, 'end': 197.12, 'probability': 0.658929705619812}, {'word': ' think', 'start': 197.12, 'end': 197.42, 'probability': 0.7481545805931091}, {'word': ' themselves', 'start': 197.42, 'end': 198.0, 'probability': 0.9239662289619446}, {'word': ' a', 'start': 198.0, 'end': 198.46, 'probability': 0.930683434009552}, {'word': ' cursed', 'start': 198.46, 'end': 198.74, 'probability': 0.8568612933158875}, {'word': ' they', 'start': 198.74, 'end': 199.22, 'probability': 0.7124426960945129}, {'word': ' were', 'start': 199.22, 'end': 199.4, 'probability': 0.9768089652061462}, {'word': ' not', 'start': 199.4, 'end': 199.72, 'probability': 0.9724793434143066}, {'word': ' here,', 'start': 199.72, 'end': 200.78, 'probability': 0.9073761105537415}, {'word': ' and', 'start': 200.78, 'end': 201.1, 'probability': 
0.8443229794502258}, {'word': ' hold', 'start': 201.1, 'end': 201.4, 'probability': 0.7187377214431763}, {'word': ' their', 'start': 201.4, 'end': 201.72, 'probability': 0.8702306747436523}, {'word': " manhood's", 'start': 201.72, 'end': 202.56, 'probability': 0.7346722483634949}, {'word': ' cheap,', 'start': 202.56, 'end': 203.06, 'probability': 0.3713756799697876}, {'word': ' whilst', 'start': 203.06, 'end': 203.42, 'probability': 0.6203047633171082}, {'word': ' any', 'start': 203.42, 'end': 203.88, 'probability': 0.9979606866836548}, {'word': ' speaks', 'start': 203.86, 'end': 204.34, 'probability': 1.0}, {'word': ' that', 'start': 204.34, 'end': 205.18, 'probability': 0.9891883730888367}, {'word': ' fought', 'start': 205.18, 'end': 205.44, 'probability': 1.0}, {'word': ' with', 'start': 205.44, 'end': 205.8, 'probability': 1.0}, {'word': ' us,', 'start': 205.8, 'end': 206.84, 'probability': 0.8624158501625061}, {'word': ' upon', 'start': 206.84, 'end': 207.12, 'probability': 0.8685139417648315}, {'word': ' St.', 'start': 207.12, 'end': 207.72, 'probability': 0.997857391834259}, {'word': " Crispin's", 'start': 207.94, 'end': 208.5, 'probability': 0.30532675981521606}, {'word': ' Day.', 'start': 208.5, 'end': 211.0, 'probability': 0.5154129266738892}]
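A quick way to see why this list trips the chronology check is to scan for words that start before the previous word has ended. The sketch below is only an illustration: `find_overlaps` is a hypothetical helper, not part of stable-ts, and `sample` is a four-word excerpt of the list above with the `probability` field left out.

```python
from typing import Dict, List, Tuple


def find_overlaps(words: List[Dict]) -> List[Tuple[int, str, str]]:
    """Return (index, previous word, current word) for every word
    whose start time is earlier than the previous word's end time."""
    overlaps = []
    for i in range(1, len(words)):
        prev, cur = words[i - 1], words[i]
        if cur["start"] < prev["end"]:
            overlaps.append((i, prev["word"], cur["word"]))
    return overlaps


# Excerpt of the word list above (hypothetical variable name).
sample = [
    {"word": " brothers,", "start": 181.42, "end": 182.6},
    {"word": " for", "start": 182.6, "end": 182.78},
    {"word": " he", "start": 182.76, "end": 183.04},
    {"word": " today", "start": 183.04, "end": 183.48},
]

print(find_overlaps(sample))  # [(2, ' for', ' he')]
```

Here ' he' starts at 182.76 while ' for' ends at 182.78. The same pattern appears later in the list with ' any' (end 203.88) and ' speaks' (start 203.86).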
I'm using a small model (Hugging Face) with a long MP3 file, and I sometimes get an error.
The error messages: Whisper did not predict an ending timestamp, which can happen if audio is cut off in the middle of a word. Also make sure WhisperTimeStampLogitsProcessor was used during generation.
``---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[13], line 1
----> 1 out = t.transcribe("../test/agder_itc.mp3", language="no")
      2 # out.to_srt_vtt('../eef3d20ea4ac5b3f11d624e442f822d6.vtt')6
      3 # out.save_as_json("../eef3d20ea4ac5b3f11d624e442f822d6.json")

Cell In[8], line 60, in Transcriber.transcribe(self, audio, language, generate_kwargs)
     57 set_language = language or self.language
     58 self.update_pipeline(set_language)
---> 60 return self.pipe.stable_whisper_transcribe(audio, generate_kwargs=generate_kwargs)

File /pipelines/ProbabilityASRPipelineWithWhisper.py:35, in ProbabilityASRPipelineWithWhisper.stable_whisper_transcribe(self, input_audio, **kwargs)
     32 segment = self(audio, return_timestamps="word", return_language=True, **kwargs)
     33 return dict(segments=[segment]) if segment else None
---> 35 return stable_whisper.transcribe_any(inference, input_audio, vad=True, regroup=True, input_sr=SAMPLE_RATE, inference_kwargs=kwargs)

File ~/mambaforge/envs/video_transcription_server/lib/python3.10/site-packages/stable_whisper/non_whisper.py:328, in transcribe_any(inference_func, audio, audio_type, input_sr, model_sr, inference_kwargs, temp_file, verbose, regroup, suppress_silence, suppress_word_ts, q_levels, k_size, demucs, demucs_device, demucs_output, vad, vad_threshold, vad_onnx, min_word_dur, only_voice_freq, only_ffmpeg)
    325 result.suppress_silence(*silent_timings, min_word_dur=min_word_dur, word_level=suppress_word_ts)
    327 if result.has_words and regroup:
--> 328 result.regroup(regroup)
    330 finally:
    331 if temp_audio_file is not None:

File ~/mambaforge/envs/video_transcription_server/lib/python3.10/site-packages/stable_whisper/result.py:996, in WhisperResult.regroup(self, regroup_algo, verbose, only_show)
    994 print(f'{methods[method].name}({", ".join(map(str, args))})')
    995 if not only_show:
--> 996 methods[method](*args)
    998 return self

File ~/mambaforge/envs/video_transcription_server/lib/python3.10/site-packages/stable_whisper/result.py:749, in WhisperResult.merge_by_gap(self, min_gap, max_words, max_chars, is_sum_max, lock)
    729 """
    730
    731 Merge (in-place) any pair of adjacent segments if the duration in between the pair <= [min_gap]
   (...)
    746
    747 """
    748 indices = self.get_gap_indices(min_gap)
--> 749 self._merge_segments(indices,
    750 max_words=max_words, max_chars=max_chars, is_sum_max=is_sum_max, lock=lock)
    751 return self

File ~/mambaforge/envs/video_transcription_server/lib/python3.10/site-packages/stable_whisper/result.py:697, in WhisperResult._merge_segments(self, indices, max_words, max_chars, is_sum_max, lock)
    677 if (
    678 (
    679 max_words and
   (...)
    694 )
    695 ):
    696 continue
--> 697 self.add_segments(i, i + 1, inplace=True, lock=lock)
    698 self.remove_no_word_segments()

File ~/mambaforge/envs/video_transcription_server/lib/python3.10/site-packages/stable_whisper/result.py:506, in WhisperResult.add_segments(self, index0, index1, inplace, lock)
    505 def add_segments(self, index0: int, index1: int, inplace: bool = False, lock: bool = False):
--> 506 new_seg = self.segments[index0] + self.segments[index1]
    507 new_seg.update_seg_with_words()
    508 if lock and self.segments[index0].has_words:

File ~/mambaforge/envs/video_transcription_server/lib/python3.10/site-packages/stable_whisper/result.py:157, in Segment.__add__(self, other)
    156 def __add__(self, other: 'Segment'):
--> 157 assert self.start <= other.start or self.end <= other.end
    159 self_copy = deepcopy(self)
    161 self_copy.start = min(self_copy.start, other.start)

AssertionError:``
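The assertion in the last frame compares the start and end times of the two segments being merged, so it depends on the timestamps being chronological. As reported earlier in this thread, sorting the words and nudging overlapping timestamps avoided the error. Below is a minimal sketch of that workaround, assuming word dicts with `word`, `start`, and `end` keys as in the examples above; `nudge_words` is a hypothetical helper, not a stable-ts function, and whether clamping starts (rather than ends) is the right choice depends on the data.

```python
from copy import deepcopy
from typing import Dict, List


def nudge_words(words: List[Dict]) -> List[Dict]:
    """Return a sorted copy of `words` in which no word starts
    before the previous word ends."""
    # Sort by start time first, as suggested earlier in the thread.
    fixed = sorted(deepcopy(words), key=lambda w: w["start"])
    for prev, cur in zip(fixed, fixed[1:]):
        if cur["start"] < prev["end"]:
            # Move the overlapping start up to the previous word's end.
            cur["start"] = prev["end"]
        # Make sure a word never ends before it starts.
        cur["end"] = max(cur["end"], cur["start"])
    return fixed


# Two overlapping words taken from the list above.
overlapping = [
    {"word": " any", "start": 203.42, "end": 203.88},
    {"word": " speaks", "start": 203.86, "end": 204.34},
]

for w in nudge_words(overlapping):
    print(w["word"], w["start"], w["end"])
```

The cleaned list could then be wrapped as shown earlier in the thread (`stable_whisper.WhisperResult([words])`) or returned from the inference function passed to `transcribe_any`, before any regrouping is applied.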