Closed david-95 closed 7 months ago
Segment.start and Segment.end are the timestamps in seconds. The problem is that trimming with -c copy is not accurate. You need to specify a codec to re-encode the audio (e.g. -c:a aac tar.aac).
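As a rough illustration of the re-encoding advice, the sketch below builds the ffmpeg argument list in Python. The helper name build_trim_command and the file names are made up for this example; the start and end values are the segment times in seconds, and the only ffmpeg options used are -y, -i, -ss, -to and -c:a.

```python
import subprocess


def build_trim_command(src, dst, start, end, audio_codec="aac"):
    """Return an ffmpeg argument list that cuts [start, end] (seconds) and re-encodes.

    Hypothetical helper for illustration; it is not part of any library.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    return [
        "ffmpeg", "-y",           # overwrite the output file if it exists
        "-i", src,
        "-ss", f"{start:.3f}",    # cut start, in seconds
        "-to", f"{end:.3f}",      # cut end, in seconds
        "-c:a", audio_codec,      # re-encode the audio instead of "-c copy"
        dst,
    ]


cmd = build_trim_command("src.wav", "tar.aac", 1.3, 2.04)
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually run ffmpeg
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.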
What confuses me is that segment.start does not match the timestamps in the .srt file. Please see:
----- in .srt ----- 00:00:01,300 --> 00:00:01,720 Wuthering Heights
----- in segment ----- (5.22, 5.32)
I guess the segment lags behind the timestamps in the .srt because it records the real time at which the model finished processing.
If I am right, how can I get the correct timestamps? How can I find the offset?
It seems that getting timestamps that match the .srt file takes more effort.
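To compare the two sources directly, it can help to convert the .srt timestamps to seconds first. The following is a minimal sketch that assumes the usual HH:MM:SS,mmm form shown above; the function names are made up for this example.

```python
def srt_time_to_seconds(stamp):
    """Convert an SRT timestamp such as '00:00:01,300' to seconds."""
    hours, minutes, rest = stamp.strip().split(":")
    seconds, millis = rest.split(",")
    total_ms = (int(hours) * 3600 + int(minutes) * 60 + int(seconds)) * 1000 + int(millis)
    return total_ms / 1000


def parse_srt_range(line):
    """Split an SRT timing line 'start --> end' into (start, end) in seconds."""
    start, end = line.split("-->")
    return srt_time_to_seconds(start), srt_time_to_seconds(end)


srt_start, srt_end = parse_srt_range("00:00:01,300 --> 00:00:01,720")
print(srt_start, srt_end)  # 1.3 1.72

# Difference between the segment start reported above (5.22) and the .srt start
print(round(5.22 - srt_start, 3))
```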
The timestamps in the result are mostly finalized and should generally remain identical to the timestamps in the output file, except for parts with a duration shorter than min_dur, which is 0.02 seconds by default for all of the result-to-output methods. But if this is an edge-case bug, it would be easier to figure out the cause if you could save the result as JSON and share it.
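One quick way to see whether the min_dur exception could apply is to list the parts whose duration is below the threshold. The sketch below works on plain (start, end, text) tuples rather than the library's own objects, and it only flags the short parts; what the output methods then do with them is not shown here. The second tuple is an invented short entry for illustration.

```python
MIN_DUR = 0.02  # default quoted above, in seconds


def shorter_than_min_dur(parts, min_dur=MIN_DUR):
    """Return the (start, end, text) tuples whose duration is below min_dur."""
    return [p for p in parts if (p[1] - p[0]) < min_dur]


parts = [
    (5.22, 5.32, " Wuthering Heights by Emily Bronte"),  # 0.10 s, from the thread
    (5.32, 5.33, " (made-up short part)"),                # 0.01 s, invented
    (5.33, 5.70, " CHAPTER I"),
]
for start, end, text in shorter_than_min_dur(parts):
    print(f"{start:.2f}-{end:.2f}{text}")
```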
I think I found the reason. I called model.transcribe in two different ways and got different results:
-- result = model.transcribe(audio_path); result.segments returns a list: Segment(start=1.3, end=2.04, text=" Wuthering Heights") Segment(start=2.68, end=3.96, text=" by Emily Bronte") ...
-- result = model.transcribe(audio_path, word_timestamps=False); result.segments returns a list: Segment(start=5.22, end=5.32, text=" Wuthering Heights by Emily Bronte") Segment(start=5.32, end=5.7, text=" CHAPTER I")
I don't know why the parameter word_timestamps=False makes such a difference, but the second result obviously doesn't make sense.
Generally, I'd advise against using word_timestamps=False because its timestamps are predicted via a less reliable method than the one used by word_timestamps=True (default). word_timestamps=False also severely limits the adjustments that can be made to correct the timestamps after the fact.
Thanks for your help! I am changing my code
Thank you for your efforts to look into my issue.
I am trying to parse a wav file to find the clip that matches my text. First I call transcribe to get a WhisperResult, then I read result.segments to get all the segments and traverse them to find the segment whose text matches my text. But I cannot understand segment.start and segment.end. I want to find the start and end timestamps so I can cut the wav by calling "ffmpeg -i src.wav -ss start_timestamp -to end_timestamp -c copy tar.wav", but that failed. The root cause seems to be that segment.start and segment.end are not the timestamps. Can you please tell me how to get a segment's timestamp pair?
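The lookup described in this question can be sketched as follows. The Segment dataclass here is a stand-in that only mirrors the start, end and text fields mentioned in the thread; it is not the library's own class, and find_segment is a hypothetical helper. The sample values are the ones quoted in the thread.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Stand-in for a transcription segment (assumed fields: start, end, text)."""
    start: float
    end: float
    text: str


def normalize(text):
    """Lowercase and collapse whitespace so matching ignores spacing and case."""
    return " ".join(text.lower().split())


def find_segment(segments, query):
    """Return the first segment whose text contains the query, or None."""
    needle = normalize(query)
    for seg in segments:
        if needle in normalize(seg.text):
            return seg
    return None


segments = [
    Segment(start=1.3, end=2.04, text=" Wuthering Heights"),
    Segment(start=2.68, end=3.96, text=" by Emily Bronte"),
]
match = find_segment(segments, "wuthering heights")
if match is not None:
    print(match.start, match.end)  # 1.3 2.04
```

The resulting start and end values are in seconds and can be passed to ffmpeg's -ss and -to options.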