jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper
MIT License

faster whisper: Timestamps are not in ascending order. #205

Closed. YeDaxia closed this issue 1 year ago

YeDaxia commented 1 year ago

code:

import stable_whisper

model = stable_whisper.load_faster_whisper('base')
result = model.transcribe_stable('test.mp3')

error:

/usr/local/lib/python3.10/dist-packages/stable_whisper/result.py in raise_for_unsorted(self)
    527             return
    528         if ((timestamps[1:] - timestamps[:-1]) < 0).any():
--> 529             raise NotImplementedError(f'Timestamps are not in ascending order. '
    530                                       f'For transcribe_any() or data not produced by Stable-ts, '
    531                                       f'sort segments/words by timestamps. '

NotImplementedError: Timestamps are not in ascending order. For transcribe_any() or data not produced by Stable-ts, sort segments/words by timestamps. Otherwise, please submit an issue.
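(As the message notes, for data not produced by Stable-ts, e.g. results passed through transcribe_any(), the remedy is to sort segments and words by their timestamps before handing them over. A minimal sketch with an assumed word-dict layout, not tied to any Stable-ts API:)

# Sketch only: sort an assumed list of segment dicts (and their word dicts)
# by start time before passing them to transcribe_any() or similar.
segments = [
    {'start': 0.0, 'end': 2.5, 'words': [
        {'word': 'world', 'start': 1.1, 'end': 1.4},
        {'word': 'hello', 'start': 0.2, 'end': 0.6},  # out of order
    ]},
]
for seg in segments:
    seg['words'].sort(key=lambda w: (w['start'], w['end']))
segments.sort(key=lambda s: (s['start'], s['end']))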

YeDaxia commented 1 year ago

I tried the newest stable-ts and found that this problem still occurs, but it does not fail every time; the same audio file may succeed or fail.

jianfch commented 1 year ago

Can you update stable-ts to the latest commit and share the timestamps of a run that fails?

import json

result = model.transcribe_stable('test.mp3', check_sorted=False)
try:
    result.raise_for_unsorted()
except NotImplementedError:
    timestamps = [[[w.start, w.end] for w in s.words] for s in result.segments]
    with open('timestamps.txt', 'w') as f:
        json.dump(timestamps, f)
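
(With check_sorted=False the ordering check is presumably skipped during transcription, and raise_for_unsorted() then runs it explicitly so the offending timestamps can be dumped for inspection.)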
YeDaxia commented 1 year ago
1. It seems to fail to dump the timestamps:
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-8-956e5b63e760> in transcribe(media)
     41   try:
---> 42     result.raise_for_unsorted()
     43   except NotImplementedError:

3 frames
NotImplementedError: Timestamps are not in ascending order. For transcribe_any() or data not produced by Stable-ts, sort segments/words by timestamps. Otherwise, please submit an issue.

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-8-956e5b63e760> in <listcomp>(.0)
     42     result.raise_for_unsorted()
     43   except NotImplementedError:
---> 44     timestamps = [[[w.start, w.end] for w in s.words] for s in result.segments]
     45     with open(note_book_path + 'timestamps.txt', 'w') as f:
     46         json.dump(timestamps, f)

TypeError: 'NoneType' object is not iterable
2. And I tried to dump the segments:
result = model.transcribe_stable(note_book_path + media['file'], check_sorted=False)
try:
    result.raise_for_unsorted()
    # saveSegments(result.segments)
except NotImplementedError:
    # timestamps = [[[w.start, w.end] for w in s.words] for s in result.segments]
    with open(note_book_path + 'timestamps.txt', 'w') as f:
        json.dump(result.segments, f)

exception output:

NotImplementedError: Timestamps are not in ascending order. For transcribe_any() or data not produced by Stable-ts, sort segments/words by timestamps. Otherwise, please submit an issue.

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
/usr/lib/python3.10/json/encoder.py in default(self, o)
    177 
    178         """
--> 179         raise TypeError(f'Object of type {o.__class__.__name__} '
    180                         f'is not JSON serializable')
    181 

TypeError: Object of type Segment is not JSON serializable
3. Dumping the result with to_dict() succeeds; here is the file:

https://drive.google.com/file/d/1wePL-EgHeSixz8PLYCFo1V91IYG4I5gB/view?usp=drive_link

try:
    result.raise_for_unsorted()
    # saveSegments(result.segments)
except NotImplementedError:
    # timestamps = [[[w.start, w.end] for w in s.words] for s in result.segments]
    with open(note_book_path + 'timestamps.txt', 'w') as f:
        json.dump(result.to_dict(), f)
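
(A small editorial sketch for inspecting a dump like the one above, assuming the to_dict() output has a top-level 'segments' list whose items may carry a 'words' list; the key names are an assumption:)

import json

# Sketch: reload the dumped result and look for missing or unsorted word timestamps.
# Assumes a top-level 'segments' list whose items may contain a 'words' list.
with open('timestamps.txt') as f:
    data = json.load(f)

for i, seg in enumerate(data.get('segments', [])):
    words = seg.get('words') or []   # 'words' may be None or missing
    if not words:
        print(f'segment {i}: no word timestamps')
        continue
    starts = [w['start'] for w in words]
    if starts != sorted(starts):
        print(f'segment {i}: word starts not in ascending order: {starts}')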
jianfch commented 1 year ago

Looks like it is partly caused by the lack of word timestamps. Try using word_timestamps=True.
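
For example (assuming transcribe_stable() forwards word_timestamps to faster-whisper's transcribe, and guarding against segments whose words is None, which is what raised the earlier TypeError):

# Sketch of the suggested call: request word-level timestamps explicitly.
result = model.transcribe_stable('test.mp3', word_timestamps=True)

# Guard against segments without word timestamps (s.words may be None).
timestamps = [
    [[w.start, w.end] for w in (s.words or [])]
    for s in result.segments
]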