Closed electro199 closed 1 year ago
Also, in the latest versions it is impossible to use
tag=(r"{\1c&H34ebde&}", r"{\r}"),
because recent changes never let the function responsible for applying tags run.
I have a fix for that:
def result_to_ass(result: (dict, list),
                  filepath: str = None,
                  segment_level=True,
                  word_level=True,
                  min_dur: float = 0.02,
                  tag: Tuple[str, str] = None,
                  font: str = None,
                  use_tag=True,  # this change here
                  font_size: int = 24,
                  strip=True,
                  highlight_color: str = None,
                  karaoke=False,
                  reverse_text: Union[bool, tuple] = False,
                  **kwargs):
    """
    Generate Advanced SubStation Alpha (ASS) file from result to display segment-level and/or word-level timestamp.
    Note: ass file is used in the same way as srt, vtt, etc.
    string of content if no [filepath] is provided, else None
    """
    if highlight_color is None and (karaoke or (word_level and segment_level)):
        highlight_color = '00ff00'
    ...
    ...
    return result_to_any(
        result=result,
        filepath=filepath,
        filetype='ass',
        segments2blocks=segments2blocks,
        segment_level=segment_level,
        word_level=word_level,
        min_dur=min_dur,
        tag=tag,
        default_tag=(r'{\1c' + f'{highlight_color}&' + '}', r'{\r}'),
        strip=strip,
        reverse_text=reverse_text,
        to_word_level_string_callback=None if use_tag else (  # and here
            lambda s, t: to_ass_word_level_segments(s, t, karaoke=karaoke)
            if karaoke or (word_level and segment_level)
            else None
        )
    )
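For anyone unfamiliar with the tag pair: in ASS, an override block such as {\1c&H34ebde&} changes the primary fill color and {\r} resets the style afterwards. Here is a minimal sketch of wrapping one word in such a pair. The helper wrap_word is hypothetical and written only for illustration; it is not how stable-ts builds its lines.

```python
from typing import List, Tuple


def wrap_word(words: List[str], index: int, tag: Tuple[str, str]) -> str:
    """Return the line with words[index] wrapped in an (open, close) ASS tag pair.

    Hypothetical helper for illustration only; it is not stable-ts code.
    """
    open_tag, close_tag = tag
    parts = [
        f"{open_tag}{word}{close_tag}" if i == index else word
        for i, word in enumerate(words)
    ]
    return " ".join(parts)


# Highlight the second word with the tag pair from the report above.
line = wrap_word(["hello", "big", "world"], 1, (r"{\1c&H34ebde&}", r"{\r}"))
print(line)
```

The printed line is what a single highlighted word would look like inside a Dialogue event's text field.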
Thanks for pointing out the issue with result_to_ass(). tag should work as intended in the latest commit. Note that tag is ignored if word_level=False.
I couldn't replicate the timing issue with the audio clip you've provided using the default options. Do you know the arguments you passed into transcribe() with **options?
Yeah, I was using word_level=False to turn off effects; otherwise it will use the default green effect and not the given tag. The error happens randomly: the same audio sometimes gives the right result and sometimes does not. I also noticed that running from the CLI with JSON output is less likely to give the errors.
The **options is "en" for language detection.
More info that may help in recreating the error:
No GPU was used, and the base model was used.
If you try to transcribe multiple times, you may be able to recreate the error. Transcribing in a Python program causes errors most of the time for me.
This nondeterministic behavior might be due to the temperature fallback. Use temperature=0 or, for the CLI, --temperature_increment_on_fallback None to make it deterministic.
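To illustrate why the fallback can make runs differ: as far as I understand, decoding at temperature 0 always takes the top-scoring token, while any retry at a higher temperature draws tokens at random. The toy below is only a sketch of that idea with made-up scores; pick_token is a hypothetical function and none of this is Whisper's or stable-ts's actual decoding code.

```python
import math
import random
from typing import List


def pick_token(logits: List[float], temperature: float, rng: random.Random) -> int:
    """Pick a token index: greedy at temperature 0, sampled otherwise."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token, so repeat runs agree.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax with temperature, then a random draw: repeat runs may differ.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [1.0, 1.2, 0.9]  # made-up scores for three candidate tokens
rng = random.Random()     # unseeded on purpose
greedy = {pick_token(logits, 0.0, rng) for _ in range(100)}
sampled = {pick_token(logits, 1.0, rng) for _ in range(100)}
print(greedy)   # always a single index
print(sampled)  # usually more than one index
```

With only temperature 0 in play there is no random draw left in this toy, which is the reason for the suggestion above.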
Sure I will try this
After testing multiple times I am still having the issue (although much less).
This time the transcription had the same line repeating for 70% of the script (with the initial prompt; I was not using an initial prompt before).
Also, transcription on short audio works fine.
Also, some results show the old behavior of inserting an empty delay and then stretching the other segment.
Transcriptions of audio shorter than 60 seconds have the issue about 1 out of 10 times. In the one with the error, all of the last segments were missing, about 10 seconds' worth, and the audio was around 58 seconds long.
> After testing multiple I am still having the issue (although much less)
If it is still not deterministic after setting temperature to 0, this issue is likely caused by factors outside of Stable-ts. It is like multiplying the same numbers but getting different results each time. You can try it with just Whisper to see if you get similar behavior.
After testing with Whisper I am not getting any issue
The transcripts were exactly the same across the hundreds of runs I tested on my end with the audio you provided. I also ran a small test in Colab and the results were also consistent: https://colab.research.google.com/drive/1eqFZqXAIR_NNvgfI-SB-1Fqkd0GMOm2r?usp=sharing.
Is there any logger that I can use to get more info on why it is happening in my application?
Can you share the results as JSON files, one from a run with the issue and one from a run without it? If you're using version 2.9+, try to see if you can reproduce the same issue with version 2.8.1. If the issue doesn't occur with 2.8.1, then it is likely a bug in 2.9.
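If it helps, a quick way to see where two such runs diverge is to compare their segments field by field. This is only a sketch: segment_diffs is a hypothetical helper, and it assumes each result is a dict with a "segments" list whose items carry "start", "end", and "text", as in Whisper's result dict.

```python
from typing import Any, Dict, List


def segment_diffs(good: Dict[str, Any], bad: Dict[str, Any]) -> List[str]:
    """List readable differences between the segments of two results."""
    a, b = good.get("segments", []), bad.get("segments", [])
    diffs = []
    if len(a) != len(b):
        diffs.append(f"segment count: {len(a)} vs {len(b)}")
    # Compare segments pairwise up to the length of the shorter run.
    for i, (x, y) in enumerate(zip(a, b)):
        for key in ("start", "end", "text"):
            if x.get(key) != y.get(key):
                diffs.append(f"segment {i} {key}: {x.get(key)!r} vs {y.get(key)!r}")
    return diffs


# Made-up data: the second run lost a segment and stretched the first one.
good = {"segments": [{"start": 0.0, "end": 2.0, "text": "hi"},
                     {"start": 2.0, "end": 4.0, "text": "there"}]}
bad = {"segments": [{"start": 0.0, "end": 4.0, "text": "hi"}]}
for line in segment_diffs(good, bad):
    print(line)
```

With real files, each dict would come from json.load on the saved result.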
Yes, I can send you raw JSON files.
I have tested mostly on 2.6.4, although I also tested newer versions and got the same issue.
I am now using vanilla Whisper and converting its Python dict to ASS with stable-ts, and it is working fine for me.
The JSON could help narrow down the cause of the issue. Have you tested the latest version 2.9.0? There were changes to how it chooses where it begins transcribing every audio chunk.
Sometimes the transcription misses segments and stretches the segment after the missed ones back to where the missed segments should start.
The timing error is not consistent; it usually appears after the 40-second mark. This kind of error does not happen all the time.
This is the trimmed version of the clip: https://github.com/jianfch/stable-ts/assets/109358640/6257d3a4-bac5-4b48-84cb-d492373d64e9
The ASS file shows a similar thing.
Script I am using
I ran the program again to test. After running the model again to predict, there are no missing segments (I have a script to test for empty space between segments in the JSON file).
I reran the script and now other subtitle segments are missing.
The audio I ran the transcription on
https://github.com/jianfch/stable-ts/assets/109358640/7fe7b427-49ed-454f-ac28-a963187d940f
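For reference, the kind of check mentioned above for empty space between segments could look roughly like this. It is a sketch, not the script used in this report: find_gaps and the 1-second threshold are assumptions, and each segment is assumed to have "start" and "end" in seconds.

```python
from typing import Any, Dict, List, Tuple


def find_gaps(segments: List[Dict[str, Any]], min_gap: float = 1.0) -> List[Tuple[float, float]]:
    """Return (gap_start, gap_end) for every gap longer than min_gap seconds."""
    gaps = []
    ordered = sorted(segments, key=lambda s: s["start"])
    # Walk consecutive pairs and measure the silence between them.
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt["start"] - prev["end"] > min_gap:
            gaps.append((prev["end"], nxt["start"]))
    return gaps


# Made-up timings for illustration.
segments = [
    {"start": 0.0, "end": 3.5},
    {"start": 3.5, "end": 8.0},
    {"start": 20.0, "end": 24.0},  # 12 s with no subtitle before this one
]
print(find_gaps(segments))
```

A check like this only finds gaps between segments. Segments missing at the very end would need a separate comparison of the last "end" against the audio duration.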