huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
MIT License

How can I generate an SRT file? #46

Closed bmox closed 6 months ago

Lyken17 commented 7 months ago

You can use this script to convert the JSON results to SRT:

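import io
from datetime import datetime, timedelta

import pysrt


def convert_time(data):
    # Convert seconds (e.g. 1.5) to an "HH:MM:SS.mmm" string.
    # Passing the float to timedelta keeps the fraction intact; splitting
    # str(data) on "." would read 1.5 as 1 s + 5 ms instead of 1 s + 500 ms.
    # Note: a None timestamp still raises an error here.
    time_delta = timedelta(seconds=float(data))
    base_time = datetime(2000, 1, 1)

    result_time = base_time + time_delta
    result_str = result_time.strftime("%H:%M:%S.%f")[:-3]

    return result_str


def hf_pipeline_to_srt(json_result, output_file=None):
    # json_result is the dict returned by the pipeline; each entry of
    # json_result["chunks"] holds "text" and a (start, end) "timestamp".
    file = pysrt.SubRipFile()
    # SRT cues are conventionally numbered from 1.
    for idx, chk in enumerate(json_result["chunks"], start=1):
        text = chk["text"]
        start, end = map(convert_time, chk["timestamp"])

        sub = pysrt.SubRipItem(idx,
            start=start, end=end, text=text.strip())
        file.append(sub)

    if output_file is not None:
        # Write the subtitles to disk and return the path.
        file.save(output_file)
        print(f"Saved to {output_file}")
        return output_file
    else:
        # No path given: return the SRT content as a string instead.
        fp = io.StringIO("")
        file.write_into(fp)
        srt_text = fp.getvalue()
        return srt_text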

A full example can be found here:

https://github.com/Lyken17/tiny-whisper

csukuangfj commented 6 months ago

By the way, the following Hugging Face Space uses distil-whisper to generate SRT files for video and audio files: https://huggingface.co/spaces/k2-fsa/generate-subtitles-for-videos

It runs distil-whisper through ONNX, and the underlying implementation uses the onnxruntime C++ API.

bk111 commented 6 months ago

Traceback (most recent call last):
  File "/root/distil-long-srt.py", line 40, in <module>
    hf_pipeline_to_srt(result, output_file=output_srt)
  File "/root/utils.py", line 23, in hf_pipeline_to_srt
    start, end = map(convert_time, chk["timestamp"])
  File "/root/utils.py", line 9, in convert_time
    seconds, milliseconds = map(int, str(data).split("."))
ValueError: invalid literal for int() with base 10: 'None'
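
The `'None'` in this error suggests that at least one chunk carried `None` as a timestamp, which is commonly seen on the end time of the final chunk. One possible workaround is to fill in missing timestamps before converting them. The following is a minimal sketch, assuming each chunk is a dict with `"text"` and a `(start, end)` `"timestamp"` as in the script from this thread. The helper name `fill_missing_timestamps` and the one-second `default_duration` are assumptions made for illustration and are not part of the original script.

```python
def fill_missing_timestamps(chunks, default_duration=1.0):
    """Return a copy of chunks in which no timestamp is None.

    Hypothetical helper: a missing end is replaced by the next chunk's
    start when available, otherwise by start + default_duration.
    A missing start falls back to the previous chunk's end (or 0.0).
    """
    fixed = []
    for i, chk in enumerate(chunks):
        start, end = chk["timestamp"]
        if start is None:
            start = fixed[-1]["timestamp"][1] if fixed else 0.0
        if end is None:
            next_start = None
            if i + 1 < len(chunks):
                next_start = chunks[i + 1]["timestamp"][0]
            end = next_start if next_start is not None else start + default_duration
        # Copy the chunk so the caller's data is left untouched.
        fixed.append({**chk, "timestamp": (start, end)})
    return fixed


# Hand-written sample in the assumed chunk layout; the last end time is missing.
sample = [
    {"text": " Hello there.", "timestamp": (0.0, 1.5)},
    {"text": " Goodbye.", "timestamp": (1.5, None)},
]
cleaned = fill_missing_timestamps(sample)
print(cleaned[-1]["timestamp"])
```

The cleaned list could then be passed on as `{"chunks": cleaned}` in place of the raw result. Whether a fixed one-second fallback is acceptable depends on the audio; using the real audio duration, if it is known, would be more accurate.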

bmox commented 6 months ago

Installing pysrt on Google Colab is causing some issues.
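
If pysrt is hard to install, the SRT text can also be produced with the standard library alone, because the format is plain text: a counter starting at 1, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, the subtitle text, and a blank line. The following is a minimal sketch, assuming the same chunk layout as the script in this thread and that every timestamp is a number (no `None`). The names `seconds_to_srt_time` and `chunks_to_srt` are hypothetical.

```python
def seconds_to_srt_time(seconds):
    """Format a time in seconds (int or float) as HH:MM:SS,mmm."""
    total_ms = int(round(float(seconds) * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks):
    """Build SRT text from a list of {"text", "timestamp"} chunks."""
    blocks = []
    for idx, chk in enumerate(chunks, start=1):  # SRT counters start at 1
        start, end = chk["timestamp"]
        blocks.append(
            f"{idx}\n"
            f"{seconds_to_srt_time(start)} --> {seconds_to_srt_time(end)}\n"
            f"{chk['text'].strip()}\n"
        )
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(blocks)


# Hand-written sample in the assumed chunk layout.
chunks = [
    {"text": " Hello there.", "timestamp": (0.0, 1.5)},
    {"text": " Goodbye.", "timestamp": (1.5, 3661.25)},
]
srt_text = chunks_to_srt(chunks)
print(srt_text)
```

The resulting string can be written out with `open("output.srt", "w", encoding="utf-8")`, where the file name is only an example. This avoids the extra dependency, at the cost of pysrt's parsing and editing features.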