Closed: mjordan closed this issue 5 years ago
Once https://github.com/Islandora-Devops/migrate_7x_claw/pull/9 gets merged, the above code should look like:
<?php
// The content type will need to be a Riprap admin option, as will the limit
// (but apparently the JSON API's max is 50 items per page). In this example,
// we request 3 nodes starting at offset 2 (page[offset] counts items, not pages).
$page_url = "http://localhost:8000/jsonapi/node/islandora_object?page[offset]=2&page[limit]=3";
$page_output = file_get_contents($page_url);
$page_output = json_decode($page_output, true);
// Taxonomy terms to check will need to be a Riprap admin option.
// "Original File" and "Preservation Master File"
$taxonomy_terms_to_check = array('/taxonomy/term/15', '/taxonomy/term/16');
// At this point, we have a list of 3 nodes.
foreach ($page_output['data'] as $node) {
    $nid = $node['attributes']['nid'];
    // Get the media associated with this node using the Islandora-supplied Manage Media View.
    $media_url = "http://admin:islandora@localhost:8000/node/" . $nid . "/media?_format=json";
    $media_data = file_get_contents($media_url);
    $media_data = json_decode($media_data);
    // Loop through all the media and pick the ones that
    // are tagged with terms in $taxonomy_terms_to_check.
    foreach ($media_data as $media) {
        if (count($media->field_media_use)) {
            foreach ($media->field_media_use as $term) {
                if (in_array($term->url, $taxonomy_terms_to_check)) {
                    // @todo: Convert to the equivalent Fedora URL by querying Gemini
                    // using the value of $media->field_media_image[0]->target_uuid to get this type of response:
                    // {
                    // "drupal":"http:\/\/localhost:8000\/_flysystem\/fedora\/masters\/testing_12_OBJ.jpg",
                    // "fedora":"http:\/\/localhost:8080\/fcrepo\/rest\/masters\/testing_12_OBJ.jpg"
                    // }
                    // The Fedora URL is the one Riprap needs to validate the fixity of.
                    // @todo: Add option to not convert to Fedora URL if the site doesn't use Fedora.
                    // In that case, we need to figure out how to get Drupal's checksum for the file over HTTP.
                }
            }
        }
    }
}
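The first @todo above (converting a file's UUID to its Fedora URL via Gemini) could be sketched roughly as follows. This is only an illustration: the function names are hypothetical, and both the Gemini base URL and the use of a Bearer JWT in the Authorization header are assumptions rather than something the code above confirms. The only part taken from this thread is the shape of the JSON response.

```php
<?php
// Hypothetical helpers. The Gemini base URL and the Bearer-token header are
// assumptions; only the response shape comes from the @todo comment above.

/**
 * Builds the URL used to look up a file's UUID in Gemini.
 */
function gemini_lookup_url(string $gemini_base_url, string $uuid): string
{
    return rtrim($gemini_base_url, '/') . '/' . rawurlencode($uuid);
}

/**
 * Extracts the Fedora URL from a Gemini JSON response body.
 * Returns null if the body is not valid JSON or has no "fedora" key.
 */
function fedora_url_from_gemini_response(string $body): ?string
{
    $data = json_decode($body, true);
    if (!is_array($data) || empty($data['fedora'])) {
        return null;
    }
    return $data['fedora'];
}

/**
 * Performs the lookup over HTTP. $jwt is assumed to be a valid token.
 * Returns null if the request fails or the response has no Fedora URL.
 */
function get_fedora_url(string $gemini_base_url, string $uuid, string $jwt): ?string
{
    $context = stream_context_create([
        'http' => [
            'method' => 'GET',
            'header' => 'Authorization: Bearer ' . $jwt . "\r\n",
            // Return the body even on 4xx/5xx so it can be inspected.
            'ignore_errors' => true,
        ],
    ]);
    $url = gemini_lookup_url($gemini_base_url, $uuid);
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : fedora_url_from_gemini_response($body);
}
```

Keeping the response parsing separate from the HTTP call means the parsing can be checked without a running Gemini instance.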
According to https://www.drupal.org/docs/8/modules/jsonapi/sorting, we can:
We'll also need to include Basic auth credentials in Riprap for the JSON API and Views REST.
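As a rough sketch of what passing Basic auth credentials could look like without embedding them in the URL, the helper below builds an Authorization header and uses it with file_get_contents(). The function names are hypothetical, and in Riprap the username and password would presumably come from configuration rather than being passed around like this.

```php
<?php
// Hypothetical helpers for illustration only.

/**
 * Builds a Basic auth header line from a username and password.
 */
function basic_auth_header(string $user, string $password): string
{
    return 'Authorization: Basic ' . base64_encode($user . ':' . $password);
}

/**
 * Fetches a URL with Basic auth and decodes the JSON body as an array.
 * Returns null on a failed request or invalid JSON.
 */
function fetch_json_with_basic_auth(string $url, string $user, string $password): ?array
{
    $context = stream_context_create([
        'http' => ['header' => basic_auth_header($user, $password) . "\r\n"],
    ]);
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        return null;
    }
    $decoded = json_decode($body, true);
    return is_array($decoded) ? $decoded : null;
}
```

The same header would work for both the JSON API request and the Views REST request, assuming both endpoints accept Basic auth.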
Work in the issue-14 branch can now parse out the Drupal URLs of images attached to nodes:
php bin/console app:riprap:check_fixity
string(57) "http://localhost:8000/_flysystem/fedora/testing_8_OBJ.jpg"
string(57) "http://localhost:8000/_flysystem/fedora/testing_7_OBJ.jpg"
string(57) "http://localhost:8000/_flysystem/fedora/testing_6_OBJ.jpg"
This comes from each media entity's field_media_image field. We need to make sure that non-image files are also detected (i.e., what field do we use for non-image files?).
Non-image files are in field_media_file.
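A minimal sketch of checking both fields on a decoded media entity, image field first and then the generic file field. The function name is hypothetical, and the sketch assumes the media entity was decoded with json_decode() into objects, as in the proof-of-concept code. Media types that store their file in some other field are not covered.

```php
<?php
/**
 * Returns the first file reference found on a decoded media entity,
 * checking field_media_image first and then field_media_file.
 * Returns null if neither field has a value.
 */
function get_media_file_reference(object $media): ?object
{
    foreach (['field_media_image', 'field_media_file'] as $field) {
        if (!empty($media->{$field}) && is_array($media->{$field})) {
            return $media->{$field}[0];
        }
    }
    return null;
}
```

The caller can then read whichever property it needs (for example target_uuid) from the returned object, regardless of which field supplied it.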
The only thing not working is authenticating against Gemini using a JWT.
The app:riprap:plugin:fetchresourcelist:from:drupal plugin is complete, but I'm getting some strange behavior. When Riprap hits the last page of a JSON:API request, it throws a cURL error:
In CurlFactory.php line 186:
cURL error 3: <url> malformed (see http://curl.haxx.se/libcurl/c/libcurl-errors.html)
However, the URL triggering this error works as expected (200 response code) when requested using curl on the command line, e.g., curl -v -uadmin:islandora "http://localhost:8000/jsonapi/node/islandora_object?page%5Blimit%5D=5&page%5Boffset%5D=10&sort=-changed".
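For reference, a paged and sorted JSON:API URL of the form used in the curl command can be assembled with PHP's http_build_query(), which percent-encodes the square brackets. This is a sketch only: the function name and parameters are hypothetical, and sorting on -changed is taken from the URL in this comment.

```php
<?php
/**
 * Builds a JSON:API collection URL for one page of nodes, newest changes first.
 * $offset is an item offset (not a page number) and $limit is the page size.
 */
function build_jsonapi_page_url(string $base_url, string $content_type, int $offset, int $limit): string
{
    $query = http_build_query(
        [
            'page' => ['limit' => $limit, 'offset' => $offset],
            'sort' => '-changed',
        ],
        '',
        '&'
    );
    return rtrim($base_url, '/') . '/jsonapi/node/' . $content_type . '?' . $query;
}

// Example: the third page of 5 nodes.
echo build_jsonapi_page_url('http://localhost:8000', 'islandora_object', 10, 5), "\n";
```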
OK, I have tracked this down to an empty $media_list on a node.
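The actual fix is in the commit referenced in this thread and is not reproduced here. One plausible guard, assuming the malformed-URL error came from an empty or missing URL being handed to the HTTP client when a node had no media, is to filter candidate URLs before requesting them. The function names below are hypothetical.

```php
<?php
/**
 * Returns true only for strings that parse as absolute URLs.
 * Empty strings, null and other non-string values are rejected.
 */
function is_requestable_url($url): bool
{
    return is_string($url) && filter_var($url, FILTER_VALIDATE_URL) !== false;
}

/**
 * Filters a list of candidate URLs down to those safe to request.
 * A null or empty input yields an empty array, so the caller can
 * simply skip nodes that have no media.
 */
function requestable_urls(?array $candidate_urls): array
{
    return array_values(array_filter($candidate_urls ?? [], 'is_requestable_url'));
}
```

A caller would loop over requestable_urls($urls) and make no request at all when the result is empty.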
Closed with 342ba237f0448c82cbedf4b3d5ad78ac03697990.
Related to #6 and https://github.com/Islandora-CLAW/CLAW/issues/945.
We should have a fetchresourcelist plugin that queries Drupal for resources to check. The code below is a working proof of concept. It requires that the Drupal JSON API contrib module is enabled.
We will also need to persist the page number to request during the next scheduled job. This should probably go into a db table.
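A minimal sketch of persisting the paging position between scheduled jobs, assuming a small key/value table accessed through PDO. The table name, column names, setting key and function names are all hypothetical, and Riprap itself may well use a different storage layer. The sketch treats a page that comes back with fewer items than the limit as the last page and wraps back to offset 0.

```php
<?php
// Hypothetical setting key used to store the paging position.
const OFFSET_SETTING = 'drupal_page_offset';

/**
 * Computes the offset to request on the next run. Wraps to 0 when the
 * current page came back short, which is treated as the last page.
 */
function next_offset(int $current_offset, int $limit, int $items_returned): int
{
    if ($items_returned < $limit) {
        return 0;
    }
    return $current_offset + $limit;
}

/** Creates the (hypothetical) state table if it does not exist yet. */
function init_state_table(PDO $db): void
{
    $db->exec(
        'CREATE TABLE IF NOT EXISTS riprap_state ('
        . 'setting_name VARCHAR(64) PRIMARY KEY, setting_value INTEGER NOT NULL)'
    );
}

/** Loads the stored offset, defaulting to 0 when nothing has been saved. */
function load_offset(PDO $db): int
{
    $stmt = $db->prepare('SELECT setting_value FROM riprap_state WHERE setting_name = ?');
    $stmt->execute([OFFSET_SETTING]);
    $value = $stmt->fetchColumn();
    return $value === false ? 0 : (int) $value;
}

/** Stores the offset to use on the next scheduled run. */
function save_offset(PDO $db, int $offset): void
{
    // Delete-then-insert avoids engine-specific upsert syntax.
    $db->beginTransaction();
    $db->prepare('DELETE FROM riprap_state WHERE setting_name = ?')
        ->execute([OFFSET_SETTING]);
    $db->prepare('INSERT INTO riprap_state (setting_name, setting_value) VALUES (?, ?)')
        ->execute([OFFSET_SETTING, $offset]);
    $db->commit();
}
```

A scheduled job would call load_offset(), request that page, then call save_offset() with the result of next_offset().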