ualbertalib / avalon

University of Alberta's Media Repository based on Avalon
Apache License 2.0

Content export to allow creation of a bundle for migration to aviary #753

Closed jefferya closed 2 years ago

jefferya commented 3 years ago

Script to create an export package using a Ruby rake task

Input:

Proposed contents of export

Proposed structure:

MediaObject.id
├── media_files
│   ├── MasterFile.id
│   │   ├── original_media_file
│   │   │   └── original_media_file_name
│   │   ├── captions
│   │   │   └── caption_file.srt
│   │   ├── derivatives
│   │   │   ├── high_quality_derivative_id
│   │   │   │   ├── derivative.json
│   │   │   │   └── original_filename.mp4
│   │   │   ├── low_quality_derivative_id
│   │   │   │   ├── derivative.json
│   │   │   │   └── original_filename.mp4
│   │   │   └── medium_quality_derivative_id
│   │   │       ├── derivative.json
│   │   │       └── original_filename.mp4
│   │   ├── media_file.json
│   │   ├── poster
│   │   │   └── poster.jpg
│   │   ├── structural_metadata.xml
│   │   └── thumbnail
│   │       └── thumbnail.jpg
├── media_object.json
└── mods.xml

Example

x059c7329
├── media_files
│   ├── bc386j20b
│   │   ├── original_media_file
│   │   │   └── bc386j20b-big_buck_bunny.mp4
│   │   ├── captions
│   │   │   └── 4-1-Synthese-et-evaluation-de-l'information.srt
│   │   ├── derivatives
│   │   │   ├── high_fd8530a2-f2b0-4cca-baf7-049005125d2d
│   │   │   │   ├── derivative.json
│   │   │   │   └── original_filename.mp4
│   │   │   ├── low_f34667c4-ef52-4c82-895c-eadfd3287112
│   │   │   │   ├── derivative.json
│   │   │   │   └── original_filename.mp4
│   │   │   └── medium_d4c979dc-b712-49c8-88fa-b32d8db11ace
│   │   │       ├── derivative.json
│   │   │       └── original_filename.mp4
│   │   ├── media_file.json
│   │   ├── poster
│   │   │   └── poster.jpg
│   │   ├── structural_metadata.xml
│   │   └── thumbnail
│   │       └── thumbnail.jpg
├── media_object.json
└── mods.xml
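
A minimal sketch of how the directory skeleton above could be created, assuming the ids are already known. The helper name `build_export_skeleton` and its arguments are hypothetical and are not taken from the actual rake task; only the directory names come from the proposed structure.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper (not the actual rake task): creates the empty directory
# skeleton for one media object. `master_files` maps a master file id to the
# names of its derivative directories.
def build_export_skeleton(root, media_object_id, master_files)
  object_dir = File.join(root, media_object_id)
  FileUtils.mkdir_p(object_dir) # also covers a media object without master files

  master_files.each do |master_file_id, derivative_dirs|
    file_dir = File.join(object_dir, "media_files", master_file_id)
    %w[original_media_file captions poster thumbnail].each do |sub|
      FileUtils.mkdir_p(File.join(file_dir, sub))
    end
    derivative_dirs.each do |name|
      FileUtils.mkdir_p(File.join(file_dir, "derivatives", name))
    end
  end

  object_dir
end

# Usage with ids from the example above, inside a throwaway directory.
Dir.mktmpdir do |tmp|
  dir = build_export_skeleton(
    tmp, "x059c7329",
    { "bc386j20b" => ["high_fd8530a2-f2b0-4cca-baf7-049005125d2d"] }
  )
  puts Dir.glob(File.join(dir, "**", "*")).sort
end
```

The files themselves (media_object.json, mods.xml, derivative.json, and so on) would still have to be written into these directories by the export.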

Added collection object export and an initial prototype directory structure to use as a mechanism to gather feedback. The structure looks like:

${collection_id}
├── collection_object.json
├── MediaObjects
│   ├── one or more directories representing media objects contained in the collection

Logging:

If the master file associated with a media object cannot be found, the event is logged, for example:

ERROR: [2227mq35x] media file not found: [/srv/avalon/dropbox/ConvocationHall/19810331StageBand.wav] for Object [bz60cw74z]

If a master file associated with a media object is not found at the location stored in an Avalon/Fedora property but can be found in an alternative directory within the dropbox filesystem, the event is logged, for example:

INFO: [1g05fc252] stored media file path outdated [/srv/avalon/dropbox/ConvocationHall/20140314Gervais-1-1.aiff] -- using alternative [/srv/avalon/dropbox//ConvocationHall/Processed/Cart6/20140314Gervais-1-1.aiff]
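
The two logging cases above can be sketched as follows. This is an illustration only: `resolve_media_file` and `export_logger` are hypothetical names, and the fallback rule (take the first file with the same base name anywhere under the dropbox root) is an assumption, not necessarily how the rake task searches.

```ruby
require "find"
require "logger"

# Hypothetical logger setup that reproduces the "SEVERITY: message" line format.
def export_logger(io)
  logger = Logger.new(io)
  logger.formatter = proc { |severity, _time, _progname, msg| "#{severity}: #{msg}\n" }
  logger
end

# Hypothetical helper: returns the path to export for a master file, or nil.
# Assumption: the fallback matches on the file's base name only.
def resolve_media_file(stored_path, dropbox_root, master_file_id, media_object_id, logger)
  return stored_path if File.file?(stored_path)

  name = File.basename(stored_path)
  alternative = nil
  if Dir.exist?(dropbox_root)
    # Walk the dropbox tree and stop at the first file with the same base name.
    alternative = Find.find(dropbox_root).find do |path|
      File.file?(path) && File.basename(path) == name
    end
  end

  if alternative
    logger.info("[#{master_file_id}] stored media file path outdated " \
                "[#{stored_path}] -- using alternative [#{alternative}]")
  else
    logger.error("[#{master_file_id}] media file not found: " \
                 "[#{stored_path}] for Object [#{media_object_id}]")
  end
  alternative
end
```

A call such as `resolve_media_file(path, "/srv/avalon/dropbox", master_file_id, media_object_id, export_logger($stdout))` would then return the usable path, or nil after logging the error. Matching on the base name alone cannot find originals that were renamed.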

References:

Working notes:

Prototype:

Execution:

Questions:

  1. Should the order of the media files attached to the media object be retained (in the directory naming, not just in the media_object.json file)?

    • note: the media_object.json file records the order through the root-level files key: the order of the items in that array is the order of the files. Changing the order of media files on the Avalon edit structure page changes the order of the files array in the resulting JSON (as of 2021-06-10). Note: there are 2 files keys at different levels of the JSON structure, one for media files and one at a deeper level for derivatives attached to a media file.
  2. An attempt is made to locate the original media file if the dropbox directory structure has changed. This attempt fails for files uploaded via the UI and for dropbox originals that were renamed.

  3. Where to store extractions?

    • response: initially under /srv/avalon/dropbox/, in a subdirectory (for example, __export) where others have sftp/scp/rsync access
  4. Enhancement: a post-run check to verify that the derivatives were located.
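
One possible shape for the post-run check in question 4, as a sketch only. The helper name `incomplete_derivatives` is hypothetical, and it assumes that the export root directly contains one directory per media object laid out as in the proposed structure.

```ruby
# Hypothetical post-run check: returns the derivative directories of an export
# that lack derivative.json or contain no file besides derivative.json.
# Assumption: export_root holds one directory per media object.
def incomplete_derivatives(export_root)
  pattern = File.join(export_root, "*", "media_files", "*", "derivatives", "*")
  derivative_dirs = Dir.glob(pattern).select { |path| File.directory?(path) }

  derivative_dirs.reject do |dir|
    has_json = File.file?(File.join(dir, "derivative.json"))
    has_media = Dir.children(dir).any? { |entry| entry != "derivative.json" }
    has_json && has_media
  end.sort
end
```

An empty result would mean every derivative directory holds both its metadata and a media file. The check does not verify that the media file is a valid encoding.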

Notes:

jefferya commented 3 years ago

Added collection object export and an initial prototype directory structure to use as a mechanism to gather feedback. The structure looks like:

${collection_id}
├── collection_object.json
├── MediaObjects
│   ├── one or more directories representing media objects contained in the collection
jefferya commented 2 years ago

@anayram @seanluyk

How the ordered sequence of files is recorded in the export package.

The gist: the JSON output of the media object contains an ordered list of the files attached to the media object (i.e., the first file in the JSON output is the first file in the UI, the second file in the JSON is the second in the UI, and so on). If the order changes in the UI, the change is reflected in the JSON output of the API endpoint.

The details:

Using the object s7526c93c as an example.

[Screenshot from 2021-11-19 10-57-01]

The JSON output of the API, s7526c93c.json, mirrors what the export package contains in the media_object.json file. This JSON has a top-level key, files, that holds an ordered list of sections, each containing the metadata about one file (i.e., masterfile) attached to the media object. Warning: the top-level files key is different from the nested files key that describes the transcodings of an individual masterfile.

The annotated JSON for illustration (.... means I'm skipping some parts):

{
  "id": "s7526c93c",
  "title": "\"Four for 4\" Four graduate student compositions for saxophone quartet",
  "collection": "Convocation Hall Digital Archive",
  ...
  "read_groups": [
    "registered"
  ],
  "files": [

The list of files attached to the media object begins here.

    {
      "id": "mp48sd459",
      ...
      "label": "Three Divergent Components: I. Four Stand Giants",
      ...
      "files": [
        {
          "label": "quality-medium",
          ...
        }
        ...
      ]
    },

The first masterfile attached to the media object.

    {
      "id": "5712m714x",
      ...
      "label": "II. Unison",
      ...
    },
    {
      "id": "3r074v73z",
      ...
      "label": "III. Engine Trouble",
      ...
    },
    {
      "id": "5712m750s",
      ...
      "label": "In Nomine Innominabillis",
      ...
    },
    {
      "id": "hq37vp43g",
      ...
      "label": "In Translation",
      ...
    },
    {
      "id": "gq67jr811",
      ...
      "label": "Fracture 170",
      ...
    }
  ],

These are the next five masterfiles. If the order of the masterfiles is changed in the UI, the change is reflected in the order of the entries within the files key above.

  "fields": {
  ...
  }
}

Last comes the fields key, a listing of descriptive metadata fields, before the JSON ends.
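
To make the difference between the two files keys concrete, here is a small sketch that reads the order from a media_object.json structure. The helper names are hypothetical, and the sample is an abbreviated version of the annotated JSON above, not the full API output.

```ruby
require "json"

# Returns the masterfile ids in the order of the top-level "files" array,
# which is the order shown in the UI.
def ordered_master_file_ids(media_object)
  media_object.fetch("files", []).map { |file| file["id"] }
end

# Returns, per masterfile id, the labels found under the nested "files" key
# (the transcodings of that masterfile).
def derivative_labels(media_object)
  media_object.fetch("files", []).each_with_object({}) do |file, labels|
    labels[file["id"]] = file.fetch("files", []).map { |derivative| derivative["label"] }
  end
end

# Abbreviated sample based on the annotated JSON above.
sample = JSON.parse(<<~JSON)
  {
    "id": "s7526c93c",
    "files": [
      { "id": "mp48sd459", "files": [{ "label": "quality-medium" }] },
      { "id": "5712m714x", "label": "II. Unison" }
    ]
  }
JSON

p ordered_master_file_ids(sample)         # => ["mp48sd459", "5712m714x"]
p derivative_labels(sample)["mp48sd459"]  # => ["quality-medium"]
```

A consumer of the export package could use the first helper to restore the file order without relying on directory names.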