[Feature Request] Youtube Compatible Transcript

rajeshkumaryadavdotcom commented 1 year ago

Hi,

Thank you very much for whisper-jax, it is very useful.

I would like to request a feature for https://huggingface.co/spaces/sanchit-gandhi/whisper-jax. After the transcript is generated, I currently have to go to ChatGPT and ask it to convert the transcript into a format that YouTube accepts.

Could you please add another radio option alongside "transcribe" and "translate", such as "YouTube subtitle"? It would also be nice to have an option that writes a YouTube video description based on the transcript.

Regards, Raj

rajeshkumaryadavdotcom commented 1 year ago

The issue is that ChatGPT can't convert your output into YouTube subtitles for long videos (around 20 minutes). It says: "I apologize for any misunderstanding. Generating a large amount of text, such as full subtitles for a 19-minute video, is beyond the capabilities of this platform. However, I can help you generate a summary or key points from the video if you provide me with the specific details or time stamps of the sections you need assistance with. Please let me know how I can assist you further."
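Converting the timestamped output into SRT, one of the subtitle formats YouTube accepts for upload, is a purely mechanical text transformation, so it can be done locally with no length limit. Below is a minimal Node.js sketch, not part of whisper-jax. It assumes each line of the whisper-jax output looks like `[MM:SS.mmm -> MM:SS.mmm] text` (with an extra `HH:` prefix past the one-hour mark), and the names `whisperJaxToSrt`, `toMs` and `toSrtTime` are hypothetical.

```javascript
// Hypothetical helper: convert whisper-jax timestamped text into SRT.
// Assumption: each segment line looks like "[MM:SS.mmm -> MM:SS.mmm] text",
// with an extra "HH:" prefix once the audio passes the one-hour mark.
const SEGMENT_RE = /^\[(\S+) -> (\S+)\]\s*(.*)$/;

// "15:22.570" or "01:15:22.570" -> milliseconds (NaN if the stamp is malformed).
function toMs(stamp) {
    const [clock, frac = '0'] = stamp.split('.');
    const secs = clock.split(':').reduce((acc, part) => acc * 60 + Number(part), 0);
    return secs * 1000 + Number(frac.padEnd(3, '0').slice(0, 3));
}

// Milliseconds -> "HH:MM:SS,mmm" (SRT puts a comma before the milliseconds).
function toSrtTime(ms) {
    const pad = (n, width = 2) => String(n).padStart(width, '0');
    const h = Math.floor(ms / 3600000);
    const m = Math.floor((ms % 3600000) / 60000);
    const s = Math.floor((ms % 60000) / 1000);
    return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

function whisperJaxToSrt(content) {
    const cues = [];
    for (const rawLine of content.split(/\r?\n/)) {
        const match = SEGMENT_RE.exec(rawLine.trim());
        if (!match) continue; // skip blank or non-segment lines
        const start = toMs(match[1]);
        if (Number.isNaN(start)) continue;
        let end = toMs(match[2]);
        // e.g. a final segment whose end time is missing ("None")
        if (Number.isNaN(end) || end < start) end = start;
        // SRT cue: sequence number, time range, text; cues are separated by a blank line.
        cues.push(`${cues.length + 1}\n${toSrtTime(start)} --> ${toSrtTime(end)}\n${match[3].trim()}\n`);
    }
    return cues.join('\n');
}
```

For example, after `const fs = require('fs');`, calling `fs.writeFileSync('subtitles.srt', whisperJaxToSrt(fs.readFileSync('jax-output-timestamps.txt', 'utf8')))` would write an SRT file that can then be uploaded in YouTube Studio (the output file name is only illustrative).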

sanchit-gandhi commented 11 months ago

Hey @rajeshkumaryadavdotcom! Thanks for your interest in Whisper JAX, and glad to hear it's a useful resource! The demo is intended as a demonstration of the Whisper model for speech transcription, rather than a fully-fledged meeting transcription tool. If you'd like to build these features, feel free to fork the Space and add them on top! However, they're more along the lines of a product than the ML demo this is meant to be.

iGerman00 commented 10 months ago

Hello, @sanchit-gandhi. I appreciate your generosity in providing the HF Space to the public. It's a great resource for quick general transcription tasks, and also for its API, although the API is hidden in the UI. I'm replying to this issue since it's on the topic of YouTube transcriptions.

I'm working on a userscript (mod) for YouTube that can transcribe any video and display the subtitles natively in the player. I've attached a demo video. So far I've been able to transcribe videos up to 50 minutes long. yt-dlp sometimes fails in the Space, or the Space returns a 504 on longer videos, but it usually works after a few tries. As you said, it is a demo, so I'm fine with that. I still have some things to finish, but it could become a very useful tool that I've always wanted to have. It seems like an ideal use case for the Space, and it helps me a lot to have better automatic captions than YouTube's.

I wanted to ask whether this is acceptable to you. I understand that running TPUs like that must be costly, but I read that the Space is supported by Google's TRC programme, so I just wanted to confirm that it's okay. In the future I might publish the project to a userscript directory so that more people can use it (although I'm not sure exactly how many), or I can just keep it for personal use, depending on how comfortable you are with it.

Thank you in advance.

https://github.com/sanchit-gandhi/whisper-jax/assets/36676880/ffff049d-fac0-4854-8ee6-ba566c5d3b8d

iGerman00 commented 10 months ago

@rajeshkumaryadavdotcom If you're familiar with Node, I wrote a rough, simple parser for the timestamped output of whisper-jax. You can modify it to suit your desired format:

```javascript
const fs = require('fs');

// Parse whisper-jax's timestamped output (one segment per line), e.g.
//   [15:22.570 -> 15:25.000]  Some text
//   [01:15:22.570 -> 01:15:25.000]  Text after the one-hour mark
// into the events structure of YouTube's Timed Text "json3" format.
function customFormatToJson(subtitleContent) {
    const jsonSubtitles = { events: [] };

    // Accept both \n and \r\n line endings.
    subtitleContent.split(/\r?\n/).forEach(rawLine => {
        const line = rawLine.trim();
        // Skip blank lines (e.g. a trailing newline) and lines that are not timestamped segments.
        const closeIdx = line.indexOf(']');
        if (!line.startsWith('[') || closeIdx === -1) return;

        const [startStr, endStr] = line.slice(1, closeIdx).split(' -> ');
        const startTime = customTimeToMs(startStr);
        const endTime = customTimeToMs(endStr);
        // Everything after the first "]" is the text, so a "]" inside the text is kept intact.
        const text = line.slice(closeIdx + 1).trim();

        jsonSubtitles.events.push({
            tStartMs: startTime,
            // A missing or unparseable end time would otherwise give a negative duration.
            dDurationMs: Math.max(0, endTime - startTime),
            segs: [{ utf8: text }]
        });
    });

    return jsonSubtitles;
}

// Convert "MM:SS.mmm" or "HH:MM:SS.mmm" to milliseconds; returns 0 if the string is not a timestamp.
function customTimeToMs(timeStr) {
    if (!timeStr || !timeStr.includes(':')) return 0;
    const [clock, milli = '0'] = timeStr.trim().split('.');
    // Fold "HH:MM:SS" or "MM:SS" into a number of seconds.
    const seconds = clock.split(':').reduce((acc, part) => acc * 60 + parseInt(part, 10), 0);
    // Normalise the fractional part to exactly three digits (milliseconds).
    return seconds * 1000 + parseInt(milli.padEnd(3, '0').slice(0, 3), 10);
}

const rawContent = fs.readFileSync('jax-output-timestamps.txt', 'utf8');
const jsonSubtitles = customFormatToJson(rawContent);

console.log(JSON.stringify(jsonSubtitles, null, 2));
```

Currently, it reads jax-output-timestamps.txt from the current working directory (the directory you run node from, which is not necessarily the script's directory) and prints the subtitles to the console in the json3 format used by YouTube's Timed Text API. It should be easy to modify it to your liking, for example to output SRT or WebVTT text, or to write to a file.
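Switching the output from json3 to WebVTT is indeed a small change. A minimal sketch, assuming the `{ events: [{ tStartMs, dDurationMs, segs: [{ utf8 }] }] }` shape that `customFormatToJson` builds; `json3ToVtt`, `msToVttTime` and `escapeCueText` are hypothetical names, not part of whisper-jax or any YouTube API.

```javascript
// Hypothetical helper: render a json3-style { events: [...] } object, as produced by
// customFormatToJson, as WebVTT text.

// Milliseconds -> "HH:MM:SS.mmm" (WebVTT uses a dot where SRT uses a comma).
function msToVttTime(ms) {
    const pad = (n, width = 2) => String(n).padStart(width, '0');
    const h = Math.floor(ms / 3600000);
    const m = Math.floor((ms % 3600000) / 60000);
    const s = Math.floor((ms % 60000) / 1000);
    return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(ms % 1000, 3)}`;
}

// Cue text may not contain a raw "&", "<" or the "-->" arrow, so escape &, < and >.
function escapeCueText(text) {
    return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function json3ToVtt(json3) {
    const cues = json3.events.map(event => {
        const start = event.tStartMs;
        const end = start + event.dDurationMs;
        const text = escapeCueText(event.segs.map(seg => seg.utf8).join(''));
        return `${msToVttTime(start)} --> ${msToVttTime(end)}\n${text}`;
    });
    // The file starts with the "WEBVTT" signature; cues are separated by blank lines.
    return ['WEBVTT', ...cues].join('\n\n') + '\n';
}
```

With the parser's result in `jsonSubtitles`, replacing the `console.log` call with `fs.writeFileSync('subtitles.vtt', json3ToVtt(jsonSubtitles))` would write the subtitles to a file instead (the file name is only illustrative). An SRT variant differs mainly in the comma before the milliseconds, a numeric index before each cue, and the absence of the `WEBVTT` header.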

sanchit-gandhi / whisper-jax

[Feature Request] Youtube Compatible Transcript #151