Create a fetchresourcelist plugin that queries Drupal for media to check

mjordan commented 6 years ago

We should have a fetchresourcelist plugin that queries Drupal for resources to check. The code below is a working proof of concept. It requires that the Drupal JSON API contrib module is enabled.

<?php
// The content type will need to be a Riprap admin option, as will the limit
// (apparently the JSON API's maximum is 50 items per page). In this example,
// we request 3 nodes starting at offset 2; page[offset] counts items, not pages.
$page_url = "http://localhost:8000/jsonapi/node/islandora_object?page[offset]=2&page[limit]=3";

$page_output = file_get_contents($page_url);
$page_output = json_decode($page_output, true);

// Taxonomy terms to check will need to be a Riprap admin option.
$taxonomy_terms_to_check = array('/taxonomy/term/2'); // "Preservation master"

// At this point, we have a list of 3 nodes.
foreach ($page_output['data'] as $node) {
  $nid = $node['attributes']['nid']; 
  // Get the media associated with this node using the Islandora-supplied Manage Media View.
  $media_url = "http://admin:islandora@localhost:8000/node/" . $nid . "/media?_format=json";
  $media_data = file_get_contents($media_url);
  $media_data = json_decode($media_data);
  // Loop through all the media and pick the ones that
  // are tagged with terms in $taxonomy_terms_to_check.
  foreach ($media_data as $media) {
    if (!empty($media->field_tags)) {
      foreach ($media->field_tags as $term) {
        if (in_array($term->url, $taxonomy_terms_to_check)) {
          // @todo: Convert to the equivalent Fedora URL and add to the plugin's output.
          // @todo: Add option to not convert to Fedora URL if the site doesn't use Fedora.
          // In that case, we need to figure out how to get Drupal's checksum for the file over HTTP.
          var_dump($media->field_media_image[0]->url);
        }
      }
    }
  }
}
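
The comments in the code above say that the content type and the limit should become admin options, and that the JSON API apparently caps pages at 50 items. A minimal sketch of how the page URL could be built from those options follows. The function name `build_page_url()` and the constant `JSONAPI_MAX_PAGE_SIZE` are hypothetical and not part of Riprap, and the cap of 50 is taken from the comment above rather than verified.

```php
<?php
// Hypothetical helper, not part of Riprap: builds the JSON API URL for
// one page of nodes of a given content type.

// Assumed maximum page size, taken from the comment in the proof of concept.
const JSONAPI_MAX_PAGE_SIZE = 50;

function build_page_url(string $base_url, string $content_type, int $offset, int $limit): string
{
    // Keep the limit within 1..JSONAPI_MAX_PAGE_SIZE and the offset non-negative.
    $limit = max(1, min($limit, JSONAPI_MAX_PAGE_SIZE));
    $offset = max(0, $offset);

    return rtrim($base_url, '/') . '/jsonapi/node/' . rawurlencode($content_type)
        . '?page[offset]=' . $offset . '&page[limit]=' . $limit;
}

// Reproduces the URL used in the proof of concept.
echo build_page_url('http://localhost:8000', 'islandora_object', 2, 3), PHP_EOL;
```

Clamping the limit in one place means a misconfigured admin option cannot produce a request larger than the assumed maximum.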

We will also need to persist the page offset so that the next scheduled job knows which page to request. This should probably go into a database table.
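
As a rough sketch of what that persistence could look like, the code below stores a single offset per plugin in a table via PDO. Everything here is an assumption for illustration: the table name `fetch_state`, its columns, the key `drupal_nodes`, and all function names are hypothetical, and wrapping back to offset 0 after a short page is only one possible policy.

```php
<?php
// Hypothetical sketch: table, column, and function names are illustrative only.

// Offset for the next job. A page shorter than the limit is taken to mean
// the end of the node list was reached, so the next job starts over at 0.
function next_offset(int $current_offset, int $limit, int $returned_count): int
{
    return $returned_count < $limit ? 0 : $current_offset + $limit;
}

function init_state_table(PDO $db): void
{
    $db->exec('CREATE TABLE IF NOT EXISTS fetch_state '
        . '(name VARCHAR(64) PRIMARY KEY, page_offset INTEGER NOT NULL)');
}

// Returns 0 when nothing has been stored yet (first run).
function load_offset(PDO $db, string $name): int
{
    $stmt = $db->prepare('SELECT page_offset FROM fetch_state WHERE name = ?');
    $stmt->execute([$name]);
    $value = $stmt->fetchColumn();
    return $value === false ? 0 : (int) $value;
}

function save_offset(PDO $db, string $name, int $offset): void
{
    // Delete-then-insert avoids engine-specific upsert syntax.
    $db->beginTransaction();
    $db->prepare('DELETE FROM fetch_state WHERE name = ?')->execute([$name]);
    $db->prepare('INSERT INTO fetch_state (name, page_offset) VALUES (?, ?)')
        ->execute([$name, $offset]);
    $db->commit();
}

// Usage with an in-memory SQLite database, if that PDO driver is installed.
if (class_exists('PDO') && in_array('sqlite', PDO::getAvailableDrivers(), true)) {
    $db = new PDO('sqlite::memory:');
    init_state_table($db);
    $offset = load_offset($db, 'drupal_nodes');
    // Pretend the job fetched a full page of 3 nodes.
    save_offset($db, 'drupal_nodes', next_offset($offset, 3, 3));
    echo load_offset($db, 'drupal_nodes'), PHP_EOL;
}
```

Keeping `next_offset()` separate from the storage functions lets the wrap-around policy change without touching the database code.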

mjordan commented 6 years ago

Once https://github.com/Islandora-Devops/migrate_7x_claw/pull/9 gets merged, the above code should look like:

<?php
// The content type will need to be a Riprap admin option, as will the limit
// (apparently the JSON API's maximum is 50 items per page). In this example,
// we request 3 nodes starting at offset 2; page[offset] counts items, not pages.
$page_url = "http://localhost:8000/jsonapi/node/islandora_object?page[offset]=2&page[limit]=3";

$page_output = file_get_contents($page_url);
$page_output = json_decode($page_output, true);

// Taxonomy terms to check will need to be a Riprap admin option.
// "Original File" and "Preservation Master File"
$taxonomy_terms_to_check = array('/taxonomy/term/15', '/taxonomy/term/16');

// At this point, we have a list of 3 nodes.
foreach ($page_output['data'] as $node) {
  $nid = $node['attributes']['nid']; 
  // Get the media associated with this node using the Islandora-supplied Manage Media View.
  $media_url = "http://admin:islandora@localhost:8000/node/" . $nid . "/media?_format=json";
  $media_data = file_get_contents($media_url);
  $media_data = json_decode($media_data);
  // Loop through all the media and pick the ones that
  // are tagged with terms in $taxonomy_terms_to_check.
  foreach ($media_data as $media) {
    if (!empty($media->field_media_use)) {
      foreach ($media->field_media_use as $term) {
        if (in_array($term->url, $taxonomy_terms_to_check)) {
          // @todo: Convert to the equivalent Fedora URL by querying Gemini
          // using the value of $media->field_media_image[0]->target_uuid to get this type of response:
          // {
          //  "drupal":"http:\/\/localhost:8000\/_flysystem\/fedora\/masters\/testing_12_OBJ.jpg",
          //  "fedora":"http:\/\/localhost:8080\/fcrepo\/rest\/masters\/testing_12_OBJ.jpg"
          // }
          // The Fedora URL is the one Riprap needs to validate the fixity of.
          // @todo: Add option to not convert to Fedora URL if the site doesn't use Fedora.
          // In that case, we need to figure out how to get Drupal's checksum for the file over HTTP.
        }
      }
    }
  }
}
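
The first @todo above describes the response Riprap would get back from Gemini. A small sketch of pulling the Fedora URL out of a response of that shape is shown below. The function name `fedora_url_from_gemini()` is hypothetical, and the HTTP request to Gemini is deliberately left out because its endpoint and authentication are not described in this issue. Only the response format is taken from the comment above.

```php
<?php
// Hypothetical helper: extracts the Fedora URL from a Gemini response body
// shaped like the example in the @todo comment. Returns null when the body
// is not valid JSON or has no usable "fedora" entry.
function fedora_url_from_gemini(string $response_body): ?string
{
    $data = json_decode($response_body, true);
    if (!is_array($data) || empty($data['fedora']) || !is_string($data['fedora'])) {
        return null;
    }
    return $data['fedora'];
}

// Sample body copied from the comment in the proof of concept.
$body = '{"drupal":"http:\/\/localhost:8000\/_flysystem\/fedora\/masters\/testing_12_OBJ.jpg",'
    . '"fedora":"http:\/\/localhost:8080\/fcrepo\/rest\/masters\/testing_12_OBJ.jpg"}';

var_dump(fedora_url_from_gemini($body));
```

Returning null instead of throwing lets the plugin skip media that Gemini does not know about, which may also be a starting point for sites that do not use Fedora.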