Add Video Support - Githubissues

kelea99 commented 1 year ago

User Story

As a PUL Librarian in Area Studies, I am in contact with documentary filmmakers and campus partners who are interested in the library hosting, preserving, and making available documentary videos that will contribute to the scholarly dissemination of information and research. To do this, I need to be able to ingest MP4 videos into Figgy and display them in DPUL in a viewer. As a first step, I need to be able to ingest these into non-ephemera collections.

Impact

Please include hard deadlines, if the exhibit is part of an event, the issue is stopping work, etc. We have 2 current examples in the works for the above user story and video support has been requested and hoped for for years.

Priority recommendation

[ ] asap
[ ] within the next 3 weeks
[x] PO will prioritize - MULE ticket priority

Sudden Priority Justification

_Required if "asap" or "within the next 3 weeks" is checked. Add "Sudden Priority" and "Maintenance/Research" labels_

Expected behaviour

Actual behaviour

Steps to reproduce behaviour

Screenshots

Use Cases

As Princeton University I should only allow the public to view videos that have subtitles which can render in the viewer both to legally protect ourselves and more importantly to meet our goals of accessibility. We want (.VTT) files for transcription subtitles so that we can standardize and migrate if need be.

As a Figgy staff member I should be able to ingest a video associated with a resource (ScannedResource or EphemeraFolder), mark it complete, add subtitles to it later, and it should render in DPUL, the Catalog, and Finding Aids so that researchers can view that video. This means I should be able to add subtitles later which will then make it viewable to the public. (It has to preserve, even if it doesn't have subtitles yet, and I have to be able to add subtitles later.)

As a Figgy staff member I should be able to bulk ingest several videos, each with their own resource along with their captions so that I can have a vendor mass digitize materials and then ingest them later.

As a Figgy staff member I should be able to ingest several videos to one resource, each with their own captions, and the viewer should render them with a table of contents so that I can display multiple videos in a single "box" of content in the archives.

As Princeton University I want to preserve the captions along with the video so that if Figgy's data is ever lost I can restore them.

As a researcher I want to have multiple captions each in their own language and users should be able to switch between them so that I can provide translations in addition to transcription.

As a member of the public I should be able to view videos and their attendant transcriptions as subtitles in DPUL, the Catalog, and Finding Aids in order to do my research.

We might want to be able to facet by all the video resources.

Sub Tickets (Create during specification)

[x] Ingesting an MP4 should create HLS derivatives.
- Test file: https://github.com/pulibrary/figgy/assets/2806645/774a8206-ec39-47df-8a06-3f4a4aac0e08
[x] HLS derivatives for a video should play in Universal Viewer.
[x] #6179
- These should attach as another FileMetadata node with a Use. Probably http://pcdm.org/use#Transcript
[x] #6180
[x] #6181
- https://www.vidbeo.com/blog/hls-subtitles-captions-webvtt/
- Test WebVTT:
```
WEBVTT

00:00.000 --> 00:03.000
Test Caption
```
[x] #6182
- How do we enable bypass of this for Pathe baby, which are silent videos? Maybe a "silent video" property in the FileSets
- [x] #6183
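
When testing caption ingest, a quick check that a file really is WebVTT can help. The following is a minimal Python sketch, not Figgy code: the function name `parse_webvtt_cues` is hypothetical, and it only handles the simple header-plus-cues shape of the test WebVTT above (cue settings, NOTE blocks, and styles are ignored).

```python
import re

# Matches "MM:SS.mmm" or "HH:MM:SS.mmm"; the hours part is optional in WebVTT.
TIMESTAMP = r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})"
TIMING_LINE = re.compile(rf"^{TIMESTAMP}\s+-->\s+{TIMESTAMP}")


def _to_seconds(hours, minutes, seconds, millis):
    """Convert the captured timestamp parts to seconds as a float."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_webvtt_cues(text):
    """Return a list of (start_seconds, end_seconds, caption_text) tuples.

    Hypothetical helper for illustration only; raises ValueError when the
    mandatory WEBVTT header line is missing.
    """
    lines = text.lstrip("\ufeff").splitlines()
    if not lines or not lines[0].startswith("WEBVTT"):
        raise ValueError("not a WebVTT file: missing WEBVTT header")

    cues = []
    i = 1
    while i < len(lines):
        match = TIMING_LINE.match(lines[i].strip())
        if match is None:
            i += 1  # skip blank lines, cue identifiers, and anything unrecognised
            continue
        parts = match.groups()
        start, end = _to_seconds(*parts[:4]), _to_seconds(*parts[4:])
        i += 1
        payload = []
        # The cue text runs until the next blank line.
        while i < len(lines) and lines[i].strip():
            payload.append(lines[i])
            i += 1
        cues.append((start, end, "\n".join(payload)))
    return cues
```

Calling `parse_webvtt_cues` on the test file above yields a single cue running from 0 to 3 seconds with the text "Test Caption".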

tpendragon commented 1 year ago

Background info that's relevant:

We can't make videos public that don't have captions. Videos that are not public (but accessible by the public) need to have a way to request captions and have that actually happen.

Questions that need answered:

[ ] Is there a small thing we can do to close this ticket and iterate that bypasses that caption requirement? For instance - is it enough to ingest the video, make it playable, but not render it for non-staff.
[ ] How does someone request captions?
[ ] When requesting captions, what happens?
[ ] How fast do captions need to turn around in?
[ ] What kind of captions do we support?
[ ] If a video gets uploaded without an associated caption file, what should Figgy do? Prevent marking complete? Prevent marking anything but private?
[ ] For video that HAS no audio (silent films), what's best practice for captions? Do we still need a caption that says there's no audio or something, or is it just not an issue?
[ ] We need a video WITH captions to test with.

Questions DLS needs to answer:

[ ] Where are we going to put all these derivatives? The Isilon? That's where audio derivatives are, but the performance is slow and videos are a lot bigger. Do we need to store and stream these from the cloud?
[ ] How do we model captions in Figgy? Presumably it's a separate file - how do we say that the video "1.mp4" is linked to the caption "1-caption.vtt", if there can be multiple videos in a single ScannedResource

kelea99 commented 1 year ago

Questions that need answered:

Is there a small thing we can do to close this ticket and iterate that bypasses that caption requirement? For instance - is it enough to ingest the video, make it playable, but not render it for non-staff. How does someone request captions?

No one would be requesting captions from us. They would be providing us captions; this is someone else's responsibility. Captions for video should be required for completion of an object in Figgy not marked as "Private" visibility. Minimum metadata is required before ingest or completion of objects in Figgy as a best practice. Captions for video should be just as important, if not more so, due to legal ramifications for their absence.

When requesting captions, what happens?

We can offer the names of up to three vendors for them or suggest the workflow provided by Barbara here...

How fast do captions need to turn around in?
Without captions, objects should not be marked as anything but "Private" visibility. Turnaround time will be dependent on the primary stakeholder.

What kind of captions do we support?

format mentioned by @tpendragon : vtt (and/or?? srt?? scc?? ttml?? )
Language(s)/Script(s): Multi lingual collections are being proposed.

How do we model captions in Figgy? Presumably it's a separate file - how do we say that the video "1.mp4" is linked to the caption "1-caption.vtt", if there can be multiple videos in a single ScannedResource

This would be a cataloging question as much as an ingest question. For multiple tapes/CDs, would it be ingested as an MVW? Or would a different workflow be necessary for video (and audio)?

If a video gets uploaded without an associated caption file, what should Figgy do? Prevent marking complete? Prevent marking anything but private?

Only Private option as default. Get user/administration feedback on completion question. If there are no captions, are we willing to still provide access via VRR/OARSC for the private material? If so, completion of private video with no captions would be allowed.

For video that HAS no audio (silent films), what's best practice for captions? Do we still need a caption that says there's no audio or something, or is it just not an issue?

needs research but @escowles mentioned "No Linguistic Content" as a best practice/option for this use case

Where are we going to put all these derivatives? The Isilon? That's where audio derivatives are, but the performance is slow and videos are a lot bigger. Do we need to store and stream these from the cloud?

discuss cloud

We need a video WITH captions to test with.

Forthcoming

kelea99 commented 1 year ago

UPDATE: If we had our choice, we would want to implement this in Ephemera more than Scanned Resource. It would better serve the current use cases.

escowles commented 1 year ago

How do we model captions in Figgy? Presumably it's a separate file - how do we say that the video "1.mp4" is linked to the caption "1-caption.vtt", if there can be multiple videos in a single ScannedResource

One way to model this would be to attach the caption files to the same FileSet as the video file, using the mime type and/or use attributes to identify them as caption files. This would let you have multiple video files in a single ScannedResource and keep the captions attached to the correct video files.
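
To make that suggestion concrete, here is a rough Python sketch of the shape it implies. It is an illustration only, not Figgy's actual model: the classes are simplified stand-ins, the `language` field and the `captions` method are hypothetical, and the Transcript URI is the one floated as "probably" in the sub-tickets.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Use URIs. Treating Transcript as the caption marker is an assumption
# taken from the sub-ticket note, not a confirmed decision.
ORIGINAL_FILE = "http://pcdm.org/use#OriginalFile"
TRANSCRIPT = "http://pcdm.org/use#Transcript"


@dataclass
class FileMetadata:
    label: str
    mime_type: str
    use: str
    language: Optional[str] = None  # hypothetical, for multi-language captions


@dataclass
class FileSet:
    title: str
    file_metadata: List[FileMetadata] = field(default_factory=list)

    def captions(self) -> List[FileMetadata]:
        """Caption files are identified by use and/or MIME type."""
        return [
            f for f in self.file_metadata
            if f.use == TRANSCRIPT or f.mime_type == "text/vtt"
        ]


@dataclass
class ScannedResource:
    title: str
    file_sets: List[FileSet] = field(default_factory=list)


# Two videos in one resource, each keeping its own captions.
resource = ScannedResource(
    title="Example box",
    file_sets=[
        FileSet("1.mp4", [
            FileMetadata("1.mp4", "video/mp4", ORIGINAL_FILE),
            FileMetadata("1-caption.vtt", "text/vtt", TRANSCRIPT, language="en"),
        ]),
        FileSet("2.mp4", [
            FileMetadata("2.mp4", "video/mp4", ORIGINAL_FILE),
        ]),
    ],
)
```

Because the caption lives on the same FileSet as its video, `resource.file_sets[0].captions()` returns only "1-caption.vtt", and the second video is easy to spot as uncaptioned.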

tpendragon commented 1 year ago

Video will be our next cycle priority, so we're pulling it off as a MULE ticket.

kelea99 commented 11 months ago

Completion of items is needed for full preservation. So, instead of hindering completion of an item until it has captioning, can we hinder visibility of the item until it has captioning? That is, all video is private until captioning is added?
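
The rule being proposed here could look something like the following Python sketch. It is only an illustration of the idea, assuming a simple two-value visibility; the names `PRIVATE`, `PUBLIC`, and `effective_visibility` are placeholders rather than Figgy's real visibility values or API.

```python
# Placeholder visibility values; Figgy's real options may differ.
PRIVATE = "private"
PUBLIC = "public"


def effective_visibility(requested, caption_counts):
    """Return the visibility a resource would actually get.

    `caption_counts` holds one entry per video in the resource: the number
    of caption files attached to that video. Completion is never blocked;
    only visibility is held back until every video has captions.
    """
    every_video_captioned = all(count > 0 for count in caption_counts)
    if requested != PRIVATE and not every_video_captioned:
        return PRIVATE
    return requested
```

With this shape, a resource can be marked complete and preserved right away, and a later caption upload is what lets the requested visibility take effect.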

kelea99 commented 11 months ago

ensure this item is checked in on, after support is up and running: https://figgy.princeton.edu/concern/scanned_resources/8d38e275-5d77-4288-a972-c90e2e08dd8c/file_manager

kelea99 commented 11 months ago

Future User Request for video support: search video by time stamp, if we have transcription.

pulibrary / figgy

Add Video Support #5914
