Closed: aehlke closed this issue 7 months ago
Yes, you can extract only the audio track by passing the video file to AudioTool.convert().
AudioTool uses only the first audio track and skips any others (the metadata is still copied).
Thank you for confirming!
Is there also a way to extract subtitles, either as text or as images? It looks like the best alternative might be to use ffmpeg from Swift for that.
@aehlke, how are the subtitles stored: in a metadata track or as a separate SRT/VTT file?
Ideally I'd support both of them. It looks like ffmpeg can help here, with some work to convert non-text subtitle formats to text by selecting/rendering frames and OCRing them.
I have no plans to handle all the possible subtitle formats.
@aehlke, you can process video frames to gather subtitle text using CoreML or OCR:
import Vision

let request = VNRecognizeTextRequest()
CompressionVideoSettings(
    frameRate: 16, // lower the frame rate based on your use case
    edit: [
        .process(.pixelBuffer { buffer, _, _, time in
            // Run ML inference
            let handler = VNImageRequestHandler(cvPixelBuffer: buffer, options: [:])
            try? handler.perform([request])
            // TODO: Process results ...
            // https://developer.apple.com/documentation/vision/recognizing_text_in_images#3601255
            // Pass the pixel buffer on for writing
            return buffer
        })
    ]
)
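For the "Process results" TODO above, here is a minimal sketch of one way to turn per-frame OCR text into subtitle cues. It is not part of the library: `recognizedText(from:)`, `Cue`, `CueCollector`, and `srtTimestamp` are hypothetical helpers named for illustration only. The Vision part assumes a recent SDK where `VNRecognizeTextRequest.results` is typed as `[VNRecognizedTextObservation]?`; the merging logic is plain Swift and simply assumes that identical text on consecutive frames belongs to the same cue.

```swift
import Foundation
#if canImport(Vision)
import Vision

/// Joins the top candidate of each text observation from a finished request.
/// Assumes `results` is typed as [VNRecognizedTextObservation]? (recent SDKs).
func recognizedText(from request: VNRecognizeTextRequest) -> String {
    (request.results ?? [])
        .compactMap { $0.topCandidates(1).first?.string }
        .joined(separator: " ")
}
#endif

/// One subtitle cue: `text` shown from `start` to `end`, both in seconds.
struct Cue: Equatable {
    var start: Double
    var end: Double
    var text: String
}

/// Formats seconds as an SRT timestamp (HH:MM:SS,mmm).
func srtTimestamp(_ seconds: Double) -> String {
    func pad(_ value: Int, _ width: Int) -> String {
        let s = String(value)
        return String(repeating: "0", count: max(0, width - s.count)) + s
    }
    let ms = Int((seconds * 1000).rounded())
    return "\(pad(ms / 3_600_000, 2)):\(pad(ms / 60_000 % 60, 2)):"
        + "\(pad(ms / 1000 % 60, 2)),\(pad(ms % 1000, 3))"
}

/// Hypothetical helper: merges per-frame OCR text into cues.
struct CueCollector {
    private(set) var cues: [Cue] = []
    private var isOpen = false  // true while the last cue is still on screen

    /// Call once per processed frame with the frame time and recognized text.
    mutating func add(time: Double, text: String) {
        let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
        // The open cue was visible at least until this frame
        if isOpen { cues[cues.count - 1].end = time }
        // A frame without text closes the current cue
        if trimmed.isEmpty { isOpen = false; return }
        // Same text as the open cue: nothing more to do
        if isOpen, cues[cues.count - 1].text == trimmed { return }
        cues.append(Cue(start: time, end: time, text: trimmed))
        isOpen = true
    }

    /// Renders the collected cues as SRT text.
    func srt() -> String {
        cues.enumerated().map { index, cue in
            "\(index + 1)\n\(srtTimestamp(cue.start)) --> \(srtTimestamp(cue.end))\n\(cue.text)\n"
        }.joined(separator: "\n")
    }
}
```

Inside the pixel buffer closure, after `handler.perform([request])`, you would call something like `collector.add(time: seconds, text: recognizedText(from: request))`, where `seconds` comes from the closure's `time` argument (I'm assuming it is a `CMTime`, in which case `time.seconds` would work). OCR output tends to flicker between frames, so a real implementation would probably want fuzzy matching rather than exact string equality.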
Is it possible to extract audio data from a video file / container? I'd like to use this to run Whisper transcription on audio tracks to get higher quality captions as text. Thanks
edit:
Would also be great to be able to extract subtitle files too.