m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

Out of memory while aligning WAVs longer than 7 minutes (60 GB of RAM) #311

Open konradipipan opened 1 year ago

konradipipan commented 1 year ago

Hi folks, I am trying to perform forced alignment on some data. Everything works fine until I start aligning files longer than 7-8 minutes. I am running on CPU with 60 GB of RAM, but apparently that is not enough. Do you know how to overcome this issue? (Cutting the WAVs into shorter audio files is not an option.)

m-bain commented 1 year ago

Hi, unfortunately this makes the alignment task non-trivial. See the suggestion here for how to approach it:

https://github.com/m-bain/whisperX/issues/36#issuecomment-1577694676
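
For context on why memory can run out so quickly, here is a rough back-of-envelope sketch. It is not a measurement of whisperX. It assumes the alignment model is a wav2vec2-style transformer that processes the whole file in one pass, emits roughly 50 frames per second, and uses 16 attention heads with float32 scores. None of these figures come from this thread, and the function name `attention_scores_gib` is hypothetical. Under those assumptions, a single layer's attention score tensor grows with the square of the audio length:

```python
# Back-of-envelope estimate only; every constant below is an assumption,
# not something confirmed in this thread or read from whisperX.

FRAMES_PER_SECOND = 50   # assumed: ~20 ms stride on 16 kHz audio
NUM_HEADS = 16           # assumed: smaller models use fewer heads
BYTES_PER_FLOAT = 4      # assumed: float32 scores


def attention_scores_gib(duration_s: float, num_heads: int = NUM_HEADS) -> float:
    """Approximate size in GiB of one layer's (heads x frames x frames) scores."""
    frames = int(duration_s * FRAMES_PER_SECOND)
    return num_heads * frames * frames * BYTES_PER_FLOAT / 2**30


if __name__ == "__main__":
    for minutes in (1, 2, 4, 7, 8):
        size = attention_scores_gib(minutes * 60)
        print(f"{minutes} min of audio: ~{size:.1f} GiB per layer")
```

If these assumptions hold, doubling the duration roughly quadruples that tensor, so a single 7-minute pass would already need tens of GiB for one layer before counting activations or the alignment trellis. That would be consistent with files under a few minutes working and longer ones failing.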