mitodl / ocw-data-parser

A parsing script for MIT OpenCourseWare course data

Error retrieving closed captions for video #155

Open noisecapella opened 3 years ago

noisecapella commented 3 years ago

https://ocwnext.odl.mit.edu/courses/3-091sc-introduction-to-solid-state-chemistry-fall-2010/pages/reactions-and-kinetics/23-reaction-rates/

If you scroll down to "Lecture Video" and play the video, the closed captions don't display. The <track /> element does exist, but its URL returns a 404. The equivalent page on the legacy site has working closed captions: https://ocw.mit.edu/courses/materials-science-and-engineering/3-091sc-introduction-to-solid-state-chemistry-fall-2010/reactions-and-kinetics/23-reaction-rates/

I think part of the problem is that the URL still points to the legacy site but has a .vtt extension. I think those should be S3 links, but it's been a little while since I worked on that code.
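A minimal sketch of the rewrite suggested above, in Python (the language of ocw-data-parser). It assumes the S3 copy keeps the same path as the legacy URL; `S3_BASE`, `is_legacy_vtt_url`, and `to_s3_url` are hypothetical placeholders, not values or helpers from the parser.

```python
from urllib.parse import urlparse

LEGACY_HOST = "ocw.mit.edu"
# Hypothetical bucket URL; the real S3 location is not given in this thread.
S3_BASE = "https://example-ocw-bucket.s3.amazonaws.com"


def is_legacy_vtt_url(url: str) -> bool:
    """Return True if url points at a .vtt file on the legacy OCW host."""
    parsed = urlparse(url)
    return parsed.netloc == LEGACY_HOST and parsed.path.endswith(".vtt")


def to_s3_url(url: str) -> str:
    """Move a legacy .vtt URL onto the S3 base, keeping its path.

    Assumption: the S3 object key mirrors the legacy URL path.
    Any other URL is returned unchanged.
    """
    if not is_legacy_vtt_url(url):
        return url
    return S3_BASE + urlparse(url).path
```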

gumaerc commented 3 years ago

I've been discussing this with @annagav, and there are a couple of issues here. First, in ocw-hugo-themes the youtube shortcode expects its third parameter, subtitlesLocation, to be a direct link to the subtitles file. To be compatible with the way ocw-studio will generate content, I think we should reference the VTT file by UUID. The problem is that VTT files currently have an invalid UUID: the UUID of the corresponding SRT file prefixed with "vtt".
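To illustrate why the prefixed identifier is invalid: Python's `uuid.UUID` accepts exactly 32 hex digits (optionally hyphenated or wrapped in braces), so prepending "vtt" makes parsing fail. The sample SRT UUID and the `is_valid_uuid` helper below are hypothetical, for illustration only.

```python
import uuid


def is_valid_uuid(value: str) -> bool:
    """Return True if value parses as a UUID (32 hex digits, hyphens optional)."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


# Hypothetical SRT UUID in the 32-hex-digit form, for illustration only.
srt_uuid = "0123456789abcdef0123456789abcdef"
vtt_uuid = "vtt" + srt_uuid  # current scheme: SRT UUID prefixed with "vtt"

print(is_valid_uuid(srt_uuid))  # True
print(is_valid_uuid(vtt_uuid))  # False: 35 characters, including non-hex "v" and "t"
```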

So, to really solve this, three things should be done:

noisecapella commented 3 years ago

I think that makes sense in general, though it would be good if the UUID generated for a given VTT file were always the same, so that we can run ocw-to-hugo repeatedly and get the same output.
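One way to get a valid UUID that stays stable across repeated ocw-to-hugo runs is a name-based (version 5) UUID derived from the SRT file's UUID. This is a sketch of that technique, not what the parser currently does; `VTT_NAMESPACE` and `vtt_uuid_for` are hypothetical names.

```python
import uuid

# Hypothetical fixed namespace for caption files. It must never change,
# otherwise every derived UUID changes with it.
VTT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://ocw.mit.edu/vtt-captions")


def vtt_uuid_for(srt_uuid: str) -> str:
    """Derive a valid, deterministic UUID for a VTT file from its SRT UUID.

    uuid5 hashes (namespace, name) with SHA-1, so the same SRT UUID always
    yields the same VTT UUID, and different SRT UUIDs yield different ones
    (barring a hash collision). Use .hex instead of str() if UUIDs are
    stored without hyphens.
    """
    return str(uuid.uuid5(VTT_NAMESPACE, srt_uuid))
```

Because the result depends only on the SRT UUID and a constant namespace, regenerating content produces identical output, which is the property asked for above.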

gumaerc commented 3 years ago

Something I missed here, which @abeglova and @mbertrand filled me in on this morning, is that ocw-studio already has a strategy for handling VTT subtitles: they are set directly on the video_metadata property of a video resource.

https://github.com/mitodl/ocw-hugo-projects/blob/ac4efc50640f7bbccd661b22292705fe237652b9/ocw-course/ocw-studio.yaml#L129

Instead of creating a resource specifically for the VTT file, we should simply write a relative path to the file directly into this field. Since we won't be using the VTT files as resources in ocw-studio, I'm not sure that https://github.com/mitodl/ocw-data-parser/issues/157 needs to be worked on, although it would be better if all resources coming from the legacy site had a valid UUID.
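A rough sketch of writing that relative path into a video resource's video_metadata. The key `video_captions_file`, the function name, and the choice of the course root as the base of the relative path are all assumptions; the actual field is defined in the ocw-studio.yaml linked above.

```python
from pathlib import PurePosixPath


def set_captions_path(video_resource: dict, vtt_path: str, course_root: str) -> dict:
    """Store the VTT file's path, relative to course_root, in video_metadata.

    Mutates and returns video_resource. Raises ValueError if vtt_path is not
    under course_root. The key "video_captions_file" is a hypothetical
    placeholder for the real ocw-studio field name.
    """
    relative = PurePosixPath(vtt_path).relative_to(course_root)
    metadata = video_resource.setdefault("video_metadata", {})
    metadata["video_captions_file"] = str(relative)
    return video_resource


# Usage with hypothetical paths:
resource = set_captions_path(
    {"title": "Lecture 23"},
    "/output/3-091sc/static_resources/lecture-23.vtt",
    "/output/3-091sc",
)
print(resource["video_metadata"])  # {'video_captions_file': 'static_resources/lecture-23.vtt'}
```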