AV Technical Metadata - Githubissues

tomcrane commented 4 years ago

For IIIF Presentation 3, we need some information that the pseudo-IIIF approach to AV got by without.

For Audio and Video, we need a precise duration, to the second, and preferably to a couple of decimal places.
For Video, we need the canvas dimensions to generate the annotation space. E.g., { "width": 1920, "height": 1080 }. At the very least we need an aspect ratio to assert a suitable coordinate space.

These dimensions should be those of the master, not the derivatives. Don't assert a 720p coordinate space for a film that you might later offer at a higher resolution. Even if you never plan to offer more pixels, a higher-resolution annotation space is better suited to the user interfaces of annotation software, where people draw boxes.
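To make the requirement concrete, here is a minimal sketch of the Canvas properties in question, written as a plain Python dict builder. The function name `make_canvas` and the example URL are hypothetical; only `id`, `type`, `duration`, `width` and `height` are taken from IIIF Presentation 3.

```python
def make_canvas(canvas_id, duration, width=None, height=None):
    """Build a minimal IIIF Presentation 3 Canvas as a dict (illustrative only)."""
    if duration <= 0:
        raise ValueError("duration must be a positive number of seconds")
    # width and height only make sense as a pair; an audio canvas has neither
    if (width is None) != (height is None):
        raise ValueError("width and height must be given together")
    canvas = {"id": canvas_id, "type": "Canvas", "duration": float(duration)}
    if width is not None:
        canvas["width"] = int(width)
        canvas["height"] = int(height)
    return canvas


# Hypothetical identifier; dimensions are those of the master, not a derivative
video_canvas = make_canvas("https://example.org/canvas/1", 586.0, 1920, 1080)
audio_canvas = make_canvas("https://example.org/canvas/2", 1834.640381)
```

An audio Canvas carries only a duration, so the sketch leaves width and height out rather than writing zeros.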

The technical metadata in METS doesn't really provide this, at least for the older things.

@aray-wellcome I might need to take a look at the proposed AV METS after all.

Ideally, the data required for duration, width and height would be right there in the PREMIS technical metadata in the METS, and not in a form that needs parsing (and might even be hand-entered). Here's an old one:

<premis:significantProperties>
  <premis:significantPropertiesType>Duration</premis:significantPropertiesType>
  <premis:significantPropertiesValue>9mn 46s</premis:significantPropertiesValue>
</premis:significantProperties>

(from https://iiif-test.wellcomecollection.org/dash/Peek/XmlView/b16756654/b16756654_0001.xml)

We don't have numerical duration, width and height for these videos, so how do we find that out?

The DLCS could report back on this metadata after it has seen the master for transcoding - but that is usually going to be a long time after the IIIF-Builder needs it. Even though the DLCS sync is the first task in the list, that just queues it up at the DLCS before moving on to assembling IIIF JSON; it could be half an hour before ElasticTranscoder gets to it even without a big queue ahead of it.

One probably not very helpful suggestion would be to go back and update the METS with this technical metadata. This would be the ideal, I think.

What might be easier is a component the DDS could use to acquire this data from the source asset, which it can see in S3. Such a component could be a Python API wrapper around ffprobe (https://ffmpeg.org/ffprobe.html) deployed as a service that can see assets in S3.
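A rough sketch of what the core of such a wrapper might look like, assuming `ffprobe` is on the PATH and can read the asset at the given path or URL (how the service reaches S3 is not covered here). The function names are hypothetical; the ffprobe flags and the JSON keys (`format.duration`, `streams[].codec_type`, `width`, `height`) are standard ffprobe output.

```python
import json
import subprocess


def run_ffprobe(path):
    """Run ffprobe on a file or URL and return its JSON report as a dict."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def summarise_probe(probe):
    """Pull duration (seconds) and the first video stream's size from a report."""
    fmt = probe.get("format", {})
    # ffprobe reports duration as a string such as "58.917000"
    duration = float(fmt["duration"]) if "duration" in fmt else None
    width = height = None
    for stream in probe.get("streams", []):
        # Caveat: embedded cover art in audio files also shows up as a video stream
        if stream.get("codec_type") == "video":
            width, height = stream.get("width"), stream.get("height")
            break
    return {"duration": duration, "width": width, "height": height}
```

Keeping the parsing separate from the subprocess call means the summary logic can be tested against a saved ffprobe report without the binary being installed.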

Dealing with Slop

The duration of the Canvas and the durations of the various transcoded formats painted onto it are going to vary ever so slightly, because of the transcoding process. Given an hour-long super hi-res MXF, a 720p mp4 or webm transcode will not be identical in length to the original, or to each other. That's OK though; sensible clients should accommodate this slop.
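One way a client could accommodate this, sketched under assumptions: the half-second tolerance and both function names are illustrative, not anything a IIIF client is known to use.

```python
def within_slop(canvas_duration, media_duration, tolerance=0.5):
    """True if a transcode's length is close enough to the Canvas duration.

    The default tolerance (in seconds) is an assumed value for illustration.
    """
    return abs(canvas_duration - media_duration) <= tolerance


def clamp_to_media(canvas_time, media_duration):
    """Map a Canvas time onto a media file that may be slightly shorter."""
    return max(0.0, min(canvas_time, media_duration))
```

With this, a seek to the very end of a 3600 s Canvas lands on the last frame of a 3599.96 s transcode instead of falling off the end of the file.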

aray-wellcome commented 4 years ago

@tomcrane Here are the new METS for an MXF/MP4 upload (the MP4 to be used by DDS, the MXF goes into deep store), and an MPG upload.

b30496160.txt This is the MXF/MP4

b30496020.txt This is the MPG

tomcrane commented 4 years ago

Thanks @aray-wellcome !

How do these values get into the technical metadata?

<premis:significantPropertiesValue>58s</premis:significantPropertiesValue>

<premis:significantPropertiesValue>58 s 917 ms</premis:significantPropertiesValue>

...and is info about aspect ratio/width,height available, but just not included at the moment?

aray-wellcome commented 4 years ago

And here are some of the METS for the audio uploads. We'll have all WAV files or all MP3 files in the ingests and the ingests may or may not contain: structural metadata in the Logical section, jpg poster images, or PDF transcripts. I'm still running tests but I have these:

b31630327.txt 2 WAVS, structural metadata, no transcript, no poster image

b22488522.txt 7 WAVS and a PDF transcript but no structural metadata, no poster image

tomcrane commented 4 years ago

Some interesting things in the audio:


          <mods:physicalDescription>
            <mods:digitalOrigin>reformatted digital</mods:digitalOrigin>
            <mods:extent>1 encoded audio file (42 min.)</mods:extent>
          </mods:physicalDescription>

ok...


            <premis:significantProperties>
              <premis:significantPropertiesType>Duration</premis:significantPropertiesType>
              <premis:significantPropertiesValue>30mn 34s</premis:significantPropertiesValue>
            </premis:significantProperties>

but also


            <premis:significantProperties>
              <premis:significantPropertiesType>Duration</premis:significantPropertiesType>
              <premis:significantPropertiesValue>1834.640381</premis:significantPropertiesValue>
            </premis:significantProperties>

...which is good!
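Where a file has several Duration entries like this, a consumer could prefer the one that is already numeric. A minimal sketch using only the standard library: it matches on element local names, so it makes no assumption about which PREMIS namespace URI the METS declares. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def significant_properties(xml_text):
    """Return (type, value) pairs for every significantProperties element."""
    pairs = []
    for element in ET.fromstring(xml_text).iter():
        if _local(element.tag) != "significantProperties":
            continue
        kind = value = None
        for child in element:
            if _local(child.tag) == "significantPropertiesType":
                kind = (child.text or "").strip()
            elif _local(child.tag) == "significantPropertiesValue":
                value = (child.text or "").strip()
        if kind is not None and value is not None:
            pairs.append((kind, value))
    return pairs


def numeric_duration(pairs):
    """Return the first Duration value that is already a plain number, else None."""
    for kind, value in pairs:
        if kind == "Duration":
            try:
                return float(value)
            except ValueError:
                continue  # e.g. "30mn 34s" needs a separate parser
    return None
```

If no numeric entry exists, `numeric_duration` returns `None`, and the caller has to fall back to parsing the string forms.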

aray-wellcome commented 4 years ago

> Thanks @aray-wellcome !
>
> How do these values get into the technical metadata?
>
> <premis:significantPropertiesValue>58s</premis:significantPropertiesValue>
>
> <premis:significantPropertiesValue>58 s 917 ms</premis:significantPropertiesValue>
>
> ...and is info about aspect ratio/width,height available, but just not included at the moment?

Robert Sehr from Intranda says that these values come from the exif metadata from the video file itself. They don't change anything there, they include the raw values.

tomcrane commented 4 years ago

Yeah, so different producing software yields different strings that would need parsing.

Does that exif data also include w,h for videos? And could that go into the METS?

tomcrane commented 4 years ago

(from standup, @jtweed)

It might be worth ensuring that these metadata are included in the METS, and then re-running things that don't have them. This would be preferable to the DDS inspecting media with ffprobe or whatever. METS is the authority.

I'll leave placeholders for these properties in the generated manifests, for now.

One little issue here though.

Existing AV has a synthetic poster image created by the migration code; need to ensure that re-run AV reacquires its original poster and expresses it in the expected way. That migrated METS was not produced by Goobi but by the Python migration code, re-assembling the METS and including the poster as part of the digital object, where it wasn't previously.

So AV dimensions and poster images should both be sorted before DDS reacquires them.

aray-wellcome commented 4 years ago

@tomcrane Here's the METS for a film with width and height included b32249731_wh.txt

tomcrane commented 4 years ago

We'll use this as a basis and assume that at some point, older videos will acquire this information.

Observation - some of these will still require parsing.

The MPG:

          <premis:significantPropertiesType>Duration</premis:significantPropertiesType>
          <premis:significantPropertiesValue>12mn 56s</premis:significantPropertiesValue>

          <premis:significantPropertiesType>ImageWidth</premis:significantPropertiesType>
          <premis:significantPropertiesValue>2048</premis:significantPropertiesValue>

          <premis:significantPropertiesType>ImageHeight</premis:significantPropertiesType>
          <premis:significantPropertiesValue>2048</premis:significantPropertiesValue>

The MXF:

          <premis:significantPropertiesType>Duration</premis:significantPropertiesType>
          <premis:significantPropertiesValue>12 min 56 s</premis:significantPropertiesValue>

          <premis:significantPropertiesType>ImageWidth</premis:significantPropertiesType>
          <premis:significantPropertiesValue>2 048 pixels</premis:significantPropertiesValue>

          <premis:significantPropertiesType>ImageHeight</premis:significantPropertiesType>
          <premis:significantPropertiesValue>2 048 pixels</premis:significantPropertiesValue>

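Note the different formatting: the spacing varies, minutes appear as mn vs min, and the MXF dimensions carry a thousands space and a unit (2 048 pixels).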
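For illustration, a tolerant parser for the variants seen so far in this thread might look like the sketch below. It is not the DDS implementation. The function names are hypothetical, and the `h` unit is an assumption, since no sample here runs over an hour.

```python
import re

# Milliseconds per unit. "mn", "min", "s" and "ms" appear in the METS samples
# in this thread; "h" is an assumption for durations over an hour.
_UNIT_MS = {"h": 3_600_000, "mn": 60_000, "min": 60_000, "s": 1_000, "ms": 1}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)\s*([a-z]+)")
_PLAIN_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_duration(text):
    """Convert strings like '9mn 46s', '58 s 917 ms' or '1834.640381' to seconds."""
    text = text.strip().lower()
    if _PLAIN_NUMBER.fullmatch(text):
        return float(text)  # already a number of seconds
    total_ms = 0.0
    count = 0
    for number, unit in _TOKEN.findall(text):
        if unit not in _UNIT_MS:
            raise ValueError(f"unknown duration unit {unit!r} in {text!r}")
        total_ms += float(number) * _UNIT_MS[unit]
        count += 1
    # Anything left over after removing number+unit tokens is unrecognised
    if count == 0 or _TOKEN.sub("", text).strip():
        raise ValueError(f"unrecognised duration: {text!r}")
    return total_ms / 1000.0


def parse_pixels(text):
    """Convert strings like '2048' or '2 048 pixels' to an int."""
    match = re.fullmatch(r"([\d\s]+?)\s*(?:pixels?)?", text.strip().lower())
    if not match:
        raise ValueError(f"unrecognised dimension: {text!r}")
    return int(re.sub(r"\s+", "", match.group(1)))
```

Raising on anything unrecognised, rather than guessing, means a new variant from different producing software shows up as an error instead of a silently wrong duration.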

tomcrane commented 4 years ago

And just to confirm @aray-wellcome - the premis:significantPropertiesType will be ImageWidth and ImageHeight even though they are videos (makes it easier as it's the same code as used for JP2s, although they are always integers, never strings).

HarkiranDhindsa commented 4 years ago

I'd noticed that the 2048 x 2048 dimensions displayed in that METS file don't look right. Our 2k scans from film reels are usually 2048 x 1536. Just checked the properties of the MP4 file for b32249731 and it is indeed 2048 x 1536. Ashley is raising that with Intranda. (Scans straight from a VHS source are usually 720 x 576.)

aray-wellcome commented 4 years ago

@tomcrane About the parsing, Intranda says "We take the data from the id3 tag/exif metadata as it is. I think the only difference is that id3 tags is more or less well defined and contains normed metadata while the header for video files can contain anything. we do not convert data or create them, we re-use what is there"

So I assume because they're just taking the exif metadata and putting it in the METS, that's why the durations and w,h are all over the shop in terms of formatting.

I will need to ask if they can parse these so that width & height are just integers (with no weird spaces) and the duration to unix time.

If they can't, for some reason, will DDS be able to?

tomcrane commented 4 years ago

The DDS can do this, and it will be pretty easy to update its parsing logic in the face of new variant EXIF data. Ideally, the DDS doesn't do this, it just uses what it's given from METS as directly as possible, but if needs must...

What we're doing in the meantime is putting in some magic values that we can check for. I could just leave these at 0 but that might hide a slightly different problem:

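// Placeholder "magic" values so that missing metadata is easy to spot in generated manifests
if (videoSize != null)
{
    // No usable dimensions from METS: flag with an obviously wrong size
    if (videoSize.Width <= 0 || videoSize.Height <= 0)
    {
        videoSize = new Size(999, 999);
    }

    canvas.Width = videoSize.Width;
    canvas.Height = videoSize.Height;
}
// No usable duration from METS: flag with an obviously wrong duration
if (duration <= 0)
{
    duration = 999.99;
}
canvas.Duration = duration;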
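Checking for those magic values afterwards could be as simple as the sketch below. It assumes a IIIF Presentation 3 manifest already loaded as a dict, with canvases in the top-level `items` list; the function name and the example ids are hypothetical.

```python
PLACEHOLDER_SIZE = 999         # matches new Size(999, 999) in the snippet above
PLACEHOLDER_DURATION = 999.99  # matches the placeholder duration above


def canvases_with_placeholders(manifest):
    """Return (canvas id, [reasons]) for canvases still carrying magic values."""
    flagged = []
    for canvas in manifest.get("items", []):
        if canvas.get("type") != "Canvas":
            continue
        reasons = []
        if (canvas.get("width") == PLACEHOLDER_SIZE
                and canvas.get("height") == PLACEHOLDER_SIZE):
            reasons.append("size")
        if canvas.get("duration") == PLACEHOLDER_DURATION:
            reasons.append("duration")
        if reasons:
            flagged.append((canvas.get("id"), reasons))
    return flagged
```

A genuine 999.99 second recording would be a false positive, so a hit means the METS is worth a look, not that it is definitely missing data.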

tomcrane commented 4 years ago

Linking this to #4788 as we should implement both in the same pass

wellcomecollection / platform

AV Technical Metadata #4777
