jimydavis closed this issue 7 months ago.
Yes, that will be the case for `"The brown fox, leapt over the dog."`. The audio does not affect how `text` is split into words, but `language`, `prepend_punctuations`, and `append_punctuations` do.
For English, a simplified way to think about the splitting:

1. `text` is split into words by spaces (keeping the space at the beginning of each word).
2. `prepend_punctuations`/`append_punctuations` that are not already part of a word are prepended/appended to an adjacent word.

To replicate the exact process before calling `align()`, you can run these lines from `align()`:
https://github.com/jianfch/stable-ts/blob/ad013d7f80de2b090ccfe967eb7801c8094cdf8a/stable_whisper/alignment.py#L227-L230
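The simplified rule can be sketched in a few lines of Python. `split_into_words` is a hypothetical helper, not a stable-ts function, and it covers only the space-splitting step: punctuation merging and non-English handling are left out, so the linked lines remain the reference for the exact behavior.

```python
import re


def split_into_words(text: str) -> list[str]:
    """Hypothetical helper: split on spaces, keeping each word's leading space.

    Only the simplified English rule is modeled; punctuation merging is not.
    """
    # " ?" grabs the single space in front of a word (if any), "[^ ]+" the word itself.
    return re.findall(r" ?[^ ]+", text)


text = "The brown fox, leapt over the dog."
words = split_into_words(text)
print(words)
# ['The', ' brown', ' fox,', ' leapt', ' over', ' the', ' dog.']
print(len(words) == len(text.split()))  # True
```

With single spaces between words, the sketch gives the same count as `len(text.split())`. `"I love Ed's . cookies."` likewise yields 5 words, with `' .'` as a word of its own.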
If the text was `I love Ed's . cookies.`, how does it choose whether to attach the punctuation to `Ed's` or to `cookies`?
Thank you.
It will attach to neither, because there is a space both before and after it, so it will be treated as its own word.
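To see why a standalone `.` stays separate while a trailing `.` joins the preceding word, here is a sketch modeled on the word grouping and `append_punctuations` merging in OpenAI Whisper's tokenizer and timing code (`split_tokens_on_spaces` and `merge_punctuations`). The token list is hand-written and hypothetical, not real tokenizer output, the prepend pass is omitted, the punctuation default is assumed from Whisper, and stable-ts's own implementation may differ in detail.

```python
import string

# Assumption: OpenAI Whisper's default append_punctuations; stable-ts may differ.
APPEND_PUNCTUATIONS = "\"'.。,，!！?？:：”)]}、"


def group_tokens(tokens: list[str]) -> list[str]:
    """Start a new word at a token that begins with a space or is pure punctuation."""
    words: list[str] = []
    for tok in tokens:
        if not words or tok.startswith(" ") or tok.strip() in string.punctuation:
            words.append(tok)
        else:
            words[-1] += tok  # e.g. " Ed" + "'s" -> " Ed's"
    return words


def merge_appended(words: list[str], appended: str = APPEND_PUNCTUATIONS) -> list[str]:
    """Glue a punctuation word onto the previous word only if no space separates them."""
    merged: list[str] = []
    for word in words:
        # " ." keeps its leading space, so it is not found in `appended` and stays separate.
        if merged and not merged[-1].endswith(" ") and word in appended:
            merged[-1] += word
        else:
            merged.append(word)
    return merged


# Hand-written, hypothetical token split of "I love Ed's . cookies."
tokens = ["I", " love", " Ed", "'s", " .", " cookies", "."]
print(merge_appended(group_tokens(tokens)))
# ['I', ' love', " Ed's", ' .', ' cookies.']
```

The deciding condition is `word in appended`: a punctuation mark written with a space in front of it keeps that space, so it never matches and remains a word of its own.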
If I had audio that corresponded to `text = "The brown fox, leapt over the dog."`, assuming it's clean English speech, would `model.align` be guaranteed, or at least likely, to return the same number of words as `len(text.split())`? In this case that should be 7 words. Assume also that I am not using `transcribe` and that I have the original transcript.
Thank you!