starkdmi / MediaToolSwift

Advanced Swift library for media conversion and manipulation
https://starkdmi.github.io/MediaToolSwift/documentation/mediatoolswift
Mozilla Public License 2.0
79 stars, 8 forks

Extract audio tracks? #5

Closed: aehlke closed this issue 7 months ago

aehlke commented 7 months ago

Is it possible to extract audio data from a video file / container? I'd like to use this to run Whisper transcription on audio tracks to get higher quality captions as text. Thanks

edit:

Would also be great to be able to extract subtitle files too.

starkdmi commented 7 months ago

Yes, you can extract just the audio track by passing the video file to AudioTool.convert().

AudioTool uses only the first audio track and skips any others (the metadata is still copied).

aehlke commented 7 months ago

Thank you for confirming!

Is there also a way to extract subtitles, either as text or as images? It looks like the best alternative might be to use ffmpeg from Swift for that.

starkdmi commented 7 months ago

@aehlke, how are the subtitles stored: in a metadata track or as a separate SRT/VTT file?

aehlke commented 7 months ago

Ideally I'd support both of them. It looks like ffmpeg can help here, with some work to convert non-text subtitle formats to text by selecting/rendering frames and running OCR on them.

starkdmi commented 7 months ago

I have no plans to handle all the possible subtitle formats.

@aehlke, you can process video frames to gather the subtitle text using Core ML or OCR:

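import Vision
import MediaToolSwift

// Reuse one text recognition request across all frames
let request = VNRecognizeTextRequest()
let videoSettings = CompressionVideoSettings(
    frameRate: 16, // lower the frame rate based on your use case
    edit: [
        .process(.pixelBuffer { buffer, _, _, time in
            // Run ML inference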
            let handler = VNImageRequestHandler(cvPixelBuffer: buffer, options: [:])
            try? handler.perform([request])

            // TODO: Process results ... 
            // https://developer.apple.com/documentation/vision/recognizing_text_in_images#3601255

            // Pass pixel buffer for writing
            return buffer
        })
    ]
)
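For the "process results" step, here is a minimal sketch of one way to turn per-frame OCR output into SRT cues. It assumes you have already collected one recognized string per processed frame, for example by joining `topCandidates(1).first?.string` over the request's `VNRecognizedTextObservation` results, together with the frame time in seconds. The names `Cue`, `makeCues`, `srtTimestamp` and `renderSRT` are hypothetical helpers for illustration and are not part of MediaToolSwift or Vision.

```swift
import Foundation

/// One subtitle cue: `text` is shown from `start` to `end` (seconds).
struct Cue: Equatable {
    var start: Double
    var end: Double
    var text: String
}

/// Merges consecutive frames that carry the same text into a single cue.
/// `frames` must be sorted by time; `frameDuration` is 1 / frameRate.
func makeCues(frames: [(time: Double, text: String)], frameDuration: Double) -> [Cue] {
    var cues: [Cue] = []
    for frame in frames {
        let text = frame.text.trimmingCharacters(in: .whitespacesAndNewlines)
        // Frames without recognized text end the current cue.
        guard !text.isEmpty else { continue }
        let frameEnd = frame.time + frameDuration
        if let last = cues.last, last.text == text,
           frame.time - last.end <= frameDuration / 2 {
            // Same text on an adjacent frame: extend the previous cue.
            cues[cues.count - 1].end = frameEnd
        } else {
            cues.append(Cue(start: frame.time, end: frameEnd, text: text))
        }
    }
    return cues
}

/// Left-pads a non-negative integer with zeros to the given width.
private func pad(_ value: Int, _ width: Int) -> String {
    let digits = String(value)
    return String(repeating: "0", count: max(0, width - digits.count)) + digits
}

/// Formats seconds as an SRT timestamp, HH:MM:SS,mmm.
func srtTimestamp(_ seconds: Double) -> String {
    let millis = Int((seconds * 1000).rounded())
    let h = millis / 3_600_000
    let m = millis / 60_000 % 60
    let s = millis / 1000 % 60
    return "\(pad(h, 2)):\(pad(m, 2)):\(pad(s, 2)),\(pad(millis % 1000, 3))"
}

/// Renders cues as the body of an SRT file (numbered blocks separated by blank lines).
func renderSRT(_ cues: [Cue]) -> String {
    cues.enumerated().map { index, cue in
        "\(index + 1)\n\(srtTimestamp(cue.start)) --> \(srtTimestamp(cue.end))\n\(cue.text)\n"
    }.joined(separator: "\n")
}
```

With `frameRate: 16` as above, `frameDuration` would be `1.0 / 16`. Exact string equality is a deliberately simple merge rule; OCR output often flickers between frames, so a real pipeline would probably need fuzzy matching before merging.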