m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License
10.17k stars, 1.07k forks

Some transcriptions missing properties #792

Open sachaw opened 2 months ago

sachaw commented 2 months ago

Very rarely, I get a word entry without the start, end, and score properties (the speaker property is missing too).

An example:

[
  {
    "word": "Unit",
    "start": 0.629,
    "end": 1.27,
    "score": 0.763,
    "speaker": "SPEAKER_01"
  },
  {
    "word": "1."
  },
  {
    "word": "Page",
    "start": 2.131,
    "end": 2.831,
    "score": 0.809,
    "speaker": "SPEAKER_01"
  },
  {
    "word": "58."
  },
  {
    "word": "Listening",
    "start": 3.992,
    "end": 4.392,
    "score": 0.89,
    "speaker": "SPEAKER_01"
  }
]

My code is the same as the README example. This is the audio recording: https://nicetalkingwithyou.com/wp-content/uploads/2018/07/012NTWY_U4_CL.mp3

Any insight would be much appreciated.
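
For anyone trying to reproduce this, the incomplete entries can be located with a small check. This is only a minimal sketch, not part of WhisperX: the helper name `find_untimed_words` is hypothetical, and it assumes the word list has the same shape as the example above.

```python
# Keys that a fully aligned word entry carries in the example above.
REQUIRED_KEYS = ("start", "end", "score")


def find_untimed_words(words):
    """Return (index, word) pairs for entries missing any timing key."""
    return [
        (i, w.get("word"))
        for i, w in enumerate(words)
        if any(key not in w for key in REQUIRED_KEYS)
    ]


# Sample data copied from the example output in this issue.
words = [
    {"word": "Unit", "start": 0.629, "end": 1.27, "score": 0.763, "speaker": "SPEAKER_01"},
    {"word": "1."},
    {"word": "Page", "start": 2.131, "end": 2.831, "score": 0.809, "speaker": "SPEAKER_01"},
    {"word": "58."},
    {"word": "Listening", "start": 3.992, "end": 4.392, "score": 0.89, "speaker": "SPEAKER_01"},
]

print(find_untimed_words(words))  # [(1, '1.'), (3, '58.')]
```

In this sample, both incomplete entries are numerals followed by a period.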

NicholasTheGreat commented 2 months ago

You need to suppress numerals.
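
Some background on why this helps, as far as I understand it: the WhisperX README notes that words containing characters outside the alignment model's dictionary (digits, for example) cannot be aligned and so get no timing. Suppressing numerals makes the transcription spell numbers out, which gives the aligner something it can match. I believe this is exposed as the `--suppress_numerals` CLI option, but treat the exact name as an assumption and check `whisperx --help` for your installed version.

If the numerals need to stay as digits, one workaround is to approximate the missing times from neighbouring words after the fact. The sketch below is an assumption-laden illustration, not WhisperX behaviour: `fill_missing_times` and the `approximate` key are hypothetical names, and each untimed word simply receives the whole gap between its nearest timed neighbours.

```python
def fill_missing_times(words):
    """Return a copy of `words` where untimed entries borrow neighbour times.

    An untimed word gets start = end of the nearest earlier timed word and
    end = start of the nearest later timed word. At the edges of the list
    it collapses to a zero-length span at the only available neighbour.
    Entries with no timed neighbour at all are left unchanged.
    """
    filled = []
    for i, original in enumerate(words):
        word = dict(original)  # copy so the caller's data is not mutated
        if "start" not in word or "end" not in word:
            prev_end = next(
                (p["end"] for p in reversed(words[:i]) if "end" in p), None
            )
            next_start = next(
                (n["start"] for n in words[i + 1:] if "start" in n), None
            )
            start = prev_end if prev_end is not None else next_start
            end = next_start if next_start is not None else prev_end
            if start is not None:
                word["start"], word["end"] = start, end
                word["approximate"] = True  # hypothetical marker for guessed times
        filled.append(word)
    return filled


# Sample data copied from the example output in this issue.
sample = [
    {"word": "Unit", "start": 0.629, "end": 1.27, "score": 0.763, "speaker": "SPEAKER_01"},
    {"word": "1."},
    {"word": "Page", "start": 2.131, "end": 2.831, "score": 0.809, "speaker": "SPEAKER_01"},
    {"word": "58."},
    {"word": "Listening", "start": 3.992, "end": 4.392, "score": 0.89, "speaker": "SPEAKER_01"},
]

for entry in fill_missing_times(sample):
    print(entry["word"], entry.get("start"), entry.get("end"))
```

The guessed spans carry no score or speaker, so downstream code should still treat those keys as optional.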