Closed: bmox closed this 6 months ago
By the way, the following Hugging Face Space uses distil-whisper to generate SRT files for video and audio files: https://huggingface.co/spaces/k2-fsa/generate-subtitles-for-videos
It runs distil-whisper through ONNX, and the underlying implementation uses the onnxruntime C++ API.
You may use this script to convert the JSON results to SRT:
from datetime import datetime, timedelta
import io

import pysrt


def convert_time(data):
    # Convert a timestamp in seconds (e.g. 12.34) to "HH:MM:SS.mmm".
    # timedelta takes fractional seconds directly, so 1.5 becomes 1 s 500 ms.
    time_delta = timedelta(seconds=float(data))
    base_time = datetime(2000, 1, 1)
    result_time = base_time + time_delta
    # %f gives microseconds; drop the last three digits to keep milliseconds.
    return result_time.strftime("%H:%M:%S.%f")[:-3]


def hf_pipeline_to_srt(json_result, output_file=None):
    file = pysrt.SubRipFile()
    # SRT cue numbers conventionally start at 1.
    for idx, chk in enumerate(json_result["chunks"], start=1):
        text = chk["text"]
        start, end = map(convert_time, chk["timestamp"])
        sub = pysrt.SubRipItem(idx, start=start, end=end, text=text.strip())
        file.append(sub)

    if output_file is not None:
        file.save(output_file)
        print(f"Saved to {output_file}")
        return output_file

    # No output path given: return the SRT content as a string.
    fp = io.StringIO("")
    file.write_into(fp)
    return fp.getvalue()
A full example can be found here: https://github.com/Lyken17/tiny-whisper
Traceback (most recent call last):
File "/root/distil-long-srt.py", line 40, in
Installing pysrt on Google Colab is causing some issues.
Full example: https://github.com/Lyken17/tiny-whisper
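If pysrt is hard to install in your environment, the same conversion can probably be done with the standard library alone. Below is a minimal sketch, not the script from the linked repository. It assumes the JSON result has the same shape the script above reads (a "chunks" list whose items carry "text" and a (start, end) "timestamp" in seconds). The names format_srt_time and chunks_to_srt are hypothetical, and falling back to the start time when an end time is missing is my own assumption.

```python
def format_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    # Work in integer milliseconds to avoid float formatting surprises.
    total_ms = int(round(float(seconds) * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(json_result):
    """Build SRT text from a dict with a "chunks" list (assumed shape)."""
    blocks = []
    # SRT cue numbers conventionally start at 1.
    for idx, chk in enumerate(json_result["chunks"], start=1):
        start, end = chk["timestamp"]
        if end is None:
            # Assumption: a missing end time falls back to the start time.
            end = start
        blocks.append(
            f"{idx}\n{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{chk['text'].strip()}\n"
        )
    # Cues are separated by one blank line.
    return "\n".join(blocks)


# Made-up sample data, only to show the expected input shape.
example = {"chunks": [
    {"timestamp": (0.0, 2.5), "text": " Hello there."},
    {"timestamp": (2.5, 3661.042), "text": " Second line."},
]}
srt_text = chunks_to_srt(example)
print(srt_text)
```

To save the result, write srt_text to a file opened with encoding="utf-8". Unlike the pysrt version, this sketch does no validation, so it is only suitable when the chunk timestamps are already well-formed.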