linto-ai / whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence
GNU Affero General Public License v3.0

Output filenames aren't consistent with original openai-whisper implementation #189

Closed lutangar closed 4 months ago

lutangar commented 5 months ago

openai-whisper

whisper "./video_short.mp4" --model tiny --output_format all --output_dir "./transcripts"

Outputs the following files:

video_short.json
video_short.srt
video_short.tsv
video_short.txt
video_short.vtt

whisper-timestamped

whisper_timestamped "./video_short.mp4" --model tiny --output_format all --output_dir "./transcripts"

Outputs the following files:

video_short.mp4.csv
video_short.mp4.srt
video_short.mp4.tsv
video_short.mp4.txt
video_short.mp4.vtt
video_short.mp4.words.csv
video_short.mp4.words.json
video_short.mp4.words.srt
video_short.mp4.words.tsv
video_short.mp4.words.vtt

Difference

openai-whisper doesn't include the original audio/video extension in the output filenames.

EDIT: I also just noticed that a plain .json file seems to be missing from the whisper_timestamped output files (only video_short.mp4.words.json is present).

Jeronymous commented 4 months ago

Indeed, whisper-timestamped reproduces an old behaviour of openai-whisper. We think it is good as it is: including the extension of the input file in the output filenames is useful, for instance, for someone who wants to compare transcriptions across audio formats. The source is open, so it can be easily modified.
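
For anyone who needs the openai-whisper naming without patching the source, one option is to rename the files after the run. Below is a minimal sketch of that idea, not something either project ships: `openai_style_name` and `rename_outputs` are hypothetical helpers, and the `.mp4` default simply matches the example in this issue. The `.words` part is kept as is, since openai-whisper has no equivalent files to match.

```python
from pathlib import Path


# Hypothetical helper, not part of whisper-timestamped or openai-whisper.
def openai_style_name(filename: str, input_ext: str = ".mp4") -> str:
    """Drop the input extension embedded in an output filename.

    "video_short.mp4.srt"        becomes "video_short.srt"
    "video_short.mp4.words.json" becomes "video_short.words.json"
    Names that do not contain "<input_ext>." are returned unchanged.
    """
    marker = input_ext + "."
    stem, sep, rest = filename.partition(marker)
    if not sep or not stem:
        return filename
    return f"{stem}.{rest}"


# Hypothetical helper: applies the renaming to every file in a directory.
def rename_outputs(output_dir: str, input_ext: str = ".mp4", dry_run: bool = True):
    """Return the list of (old_name, new_name) pairs; rename only if dry_run is False."""
    planned = []
    for path in sorted(Path(output_dir).iterdir()):
        if not path.is_file():
            continue
        new_name = openai_style_name(path.name, input_ext)
        if new_name == path.name:
            continue  # nothing to strip
        target = path.with_name(new_name)
        if target.exists():
            continue  # never overwrite an existing file
        planned.append((path.name, new_name))
        if not dry_run:
            path.rename(target)
    return planned


if __name__ == "__main__":
    # Dry run by default: only prints what would be renamed.
    # Assumes the "./transcripts" directory from the commands above exists.
    for old, new in rename_outputs("./transcripts"):
        print(f"{old} -> {new}")
```

Passing `dry_run=False` performs the renames. Note that stripping the extension removes the very distinction the maintainers want to keep: transcripts of `video_short.mp4` and `video_short.wav` would map to the same names, which is why the sketch skips any target that already exists.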