openva / richmondsunlight.com

The Richmond Sunlight website.
https://www.richmondsunlight.com/
MIT License

Extract caption data #75

Open waldoj opened 8 years ago

waldoj commented 8 years ago

Apparently the General Assembly started closed captioning video last year. So, obviously, we need some way to both preserve and extract this data. Preserving that data for 2015 isn't gonna happen—that ship has sailed—but we ought to be able to extract it.

waldoj commented 8 years ago

I've got a start at using the CLI. The only problem that I see at the moment is that it can only handle one title per command. So I think I'll need to write a shell script to determine the number of titles, and then iterate over them with HandBrakeCLI.
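A rough sketch of that loop, in Python rather than shell for easier string handling. It assumes that `HandBrakeCLI -t 0 --scan` reports each title on stderr as a `+ title N:` line; the output naming (`-tN.mp4`) and the function names are made up for illustration, and the encoding flags are the ones from the command tried later in this thread.

```python
import os
import re
import subprocess

# Assumption: scan output lists each title as a line starting with "+ title N:".
TITLE_LINE = re.compile(r"^\s*\+ title (\d+):", re.MULTILINE)


def parse_title_numbers(scan_output):
    """Return the title numbers found in HandBrakeCLI scan output."""
    return [int(n) for n in TITLE_LINE.findall(scan_output)]


def scan_titles(source):
    """Scan every title on the source (-t 0) and return the title numbers."""
    result = subprocess.run(
        ["HandBrakeCLI", "-i", source, "-t", "0", "--scan"],
        capture_output=True,
        text=True,
    )
    # HandBrakeCLI writes its scan report to stderr.
    return parse_title_numbers(result.stderr)


def encode_command(source, title, out_dir):
    """Build one HandBrakeCLI invocation for a single title."""
    stem = os.path.splitext(os.path.basename(source))[0]
    output = f"{out_dir}/{stem}-t{title}.mp4"  # hypothetical naming scheme
    return [
        "HandBrakeCLI", "-i", source, "-o", output,
        "-e", "x264", "-q", "20", "-B", "160", "-t", str(title),
        "--loose-anamorphic", "--modulus", "2", "--decomb",
    ]


def encode_all_titles(source, out_dir):
    """Encode every title on the source, one command per title."""
    for title in scan_titles(source):
        subprocess.run(encode_command(source, title, out_dir), check=True)
```

Calling `encode_all_titles("s20160114.dvdmedia", "s20160114")` would then run one encode per title found by the scan.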

waldoj commented 8 years ago

This should work, but it does not:

HandBrakeCLI -i s20160114.dvdmedia -o s20160114/s20160114-cli.mp4 -e x264 -q 20 -B 160 -t 1 --loose-anamorphic --modulus 2 --decomb --subtitle 1

It's the --subtitle 1 that's killing it. It dies with this message:

[mp4 @ 0x10212f600] Application provided invalid, non monotonically increasing dts to muxer in stream 2: 39284561 >= 39284561
ERROR: avformatMux: track 2, av_interleaved_write_frame failed with error 'Invalid argument'
[20:20:20] reader: done. 9 scr changes
[20:20:20] work: average encoding speed for job is 233.997925 fps
Encoding: task 1 of 1, 32.52 % (263.71 fps, avg 234.00 fps, ETA 00h02m16s)
[20:20:20] sync: got 15309 frames, 47082 expected
[20:20:20] render: lost time: 0 (0 frames)
[20:20:20] render: gained time: 0 (0 frames) (0 not accounted for)
[20:20:20] mpeg2video-decoder done: 15316 frames, 0 decoder errors, 0 drops
[20:20:20] ac3-decoder done: 0 frames, 0 decoder errors, 0 drops
[20:20:20] mux: track 0, 13083 frames, 55035781 bytes, 1008.59 kbps, fifo 4096
[20:20:20] mux: track 1, 20462 frames, 7167726 bytes, 131.36 kbps, fifo 4096
[20:20:20] mux: track 2, 3 frames, 251 bytes, 0.00 kbps, fifo 64
[20:20:20] libhb: work result = 4

Encode failed (error 4).

HandBrake has exited.

waldoj commented 8 years ago

Regarding extracting captions from ripped DVD files, this is likewise not working:

mencoder -o /dev/null dvd://1 -dvd-device s20160114.dvdmedia/ -oac copy -ovc copy -vobsubout s20160114

It generates .idx and .sub files, but the former just has a header and the latter is empty. The output is not encouraging:

There are 2 titles on this DVD.
There are 9 chapters in this DVD title.
There are 1 angles in this DVD title.
audio stream: 0 format: ac3 (stereo) language: unknown aid: 128.
number of audio channels on disk: 1.
number of subtitles on disk: 0
success: format: 2  data: 0x0 - 0x2e698000
MPEG-PS file format detected.
VIDEO:  MPEG2  720x480  (aspect 2)  29.970 fps  8000.0 kbps (1000.0 kbyte/s)
[V] filefmt:2  fourcc:0x10000002  size:720x480  fps:29.97  ftime:=0.0334

Note the line "number of subtitles on disk: 0". There are, of course, subtitles.

waldoj commented 8 years ago

Nothing I try works: FFmpeg, mencoder, Avidemux, VLC, and HandBrake have all failed. This is really frustrating.

waldoj commented 8 years ago

CCExtractor is the solution. It was wicked easy: ccextractor *.VOB. It spat out a VIDEO_TS.srt file. It'll be trivial to write a shell script to generate a SubRip file for every DVD.
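Something like the following could do the batch run, sketched in Python instead of shell. It assumes each ripped DVD is a `<name>.dvdmedia` directory with its VOB files under `VIDEO_TS/`, and that `-o` names the single SRT that ccextractor writes for all of the inputs; the function names and the output directory are hypothetical.

```python
import glob
import os
import subprocess


def ccextractor_command(dvd_dir, srt_dir):
    """Build the ccextractor call for one ripped DVD, or None if it has no VOBs."""
    # Assumed layout: <name>.dvdmedia/VIDEO_TS/*.VOB
    vobs = sorted(glob.glob(os.path.join(dvd_dir, "VIDEO_TS", "*.VOB")))
    if not vobs:
        return None
    name = os.path.splitext(os.path.basename(os.path.normpath(dvd_dir)))[0]
    # Same inputs as "ccextractor *.VOB", with the output named after the DVD.
    return ["ccextractor", *vobs, "-o", os.path.join(srt_dir, name + ".srt")]


def extract_all(root, srt_dir):
    """Run ccextractor once for every *.dvdmedia directory under root."""
    os.makedirs(srt_dir, exist_ok=True)
    for dvd_dir in sorted(glob.glob(os.path.join(root, "*.dvdmedia"))):
        cmd = ccextractor_command(dvd_dir, srt_dir)
        if cmd is not None:
            subprocess.run(cmd, check=True)
```

Naming each SRT after its DVD directory avoids ending up with a pile of files that are all called VIDEO_TS.srt.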

waldoj commented 8 years ago

I batch-processed all 2015 and 2016 DVDs and uploaded the SRT files to the server.

waldoj commented 8 years ago

I test-uploaded an SRT to YouTube and...it's off by 10 seconds. The lag is a few seconds greater than it would actually be with a live transcript. So I need to figure out how to time-shift those. There are desktop tools that do that, but that's not going to work.
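Time-shifting an SRT comes down to rewriting the `HH:MM:SS,mmm` stamps on each `-->` line. A minimal sketch, assuming the captions trail the audio so the offset is negative; the function names are made up, and results are clamped at zero so nothing goes negative.

```python
import re

# SubRip timestamps look like 00:01:02,345 (hours:minutes:seconds,milliseconds).
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_timestamp(match, offset_ms):
    """Return one timestamp moved by offset_ms, clamped at 00:00:00,000."""
    h, m, s, ms = (int(g) for g in match.groups())
    total = max(0, ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms)
    seconds, ms = divmod(total, 1000)
    minutes, s = divmod(seconds, 60)
    h, m = divmod(minutes, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def shift_srt(text, offset_ms):
    """Shift every cue in an SRT document by offset_ms milliseconds."""
    shifted = []
    for line in text.splitlines():
        # Only timing lines contain "-->"; caption text is left untouched.
        if "-->" in line:
            line = TIMESTAMP.sub(lambda mt: shift_timestamp(mt, offset_ms), line)
        shifted.append(line)
    return "\n".join(shifted)
```

For a 10-second lag that would be `shift_srt(srt_text, -10000)`, which scripts cleanly in a way the desktop tools don't.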

waldoj commented 8 years ago

OK the transcript creator is up and running, with transcripts running live on the site.

waldoj commented 8 years ago

Wrote a transcript time-shifter and combined it with the duplication eliminator. The results of that can be seen here; it works great.
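This thread doesn't show how the duplication eliminator works. As a guess at the general idea, the sketch below assumes the duplicates are of the roll-up kind, where a cue repeats lines that were already on screen in the previous cue. The `(start, end, lines)` cue structure and the function name are hypothetical, not the code actually used on the site.

```python
def drop_repeated_lines(cues):
    """Remove caption lines that already appeared in the preceding cue.

    cues is a list of (start, end, lines) tuples, with lines a list of strings.
    Cues that end up with no new lines are dropped entirely.
    """
    cleaned = []
    previous = []
    for start, end, lines in cues:
        fresh = [line for line in lines if line not in previous]
        # Compare against the original lines of the prior cue, not the filtered ones.
        previous = lines
        if fresh:
            cleaned.append((start, end, fresh))
    return cleaned
```

With roll-up captions this leaves each line attached to the first cue it appeared in.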

Next up: load SRTs into the database, figure out how to include them with YouTube uploads, and merge them into existing MP4s.

waldoj commented 8 years ago

Including SRTs with YouTube uploads is not possible with the program I'm using now.

waldoj commented 6 years ago

Moved to rs-video-processor.