Open twicer-is-coder opened 1 year ago
The pipeline first create batches around 30mins because thats the optimal length for whisper. Afterwards, it chunks those sentences with the alignment model into sentences. So your resulting segments should be shorter (i get 5-10secs on average).
@twicer-is-coder A new argument --chunk_size
has been added at PR #445.
Please check if this resolves your issue.
The segments must be around between 1 - 5 seconds. The subtitles are not readable like this. I suspect due to this we are getting big segments. So I guess it should not be hardcoded but should be a option. I am not using any alignment model I am just transcribing. Is there any work around?