mjordan / riprap

A PREMIS-compliant fixity checking microservice.
MIT License

Check all listed content types #53

Closed mjordan closed 4 years ago

mjordan commented 5 years ago

@seth-shaw-unlv reports in #48 that PluginFetchResourceListFromDrupal "only checks the first content type in the content type listing. So, even if I have multiple content types listed in my settings file only the first one gets checked."

mjordan commented 5 years ago

@seth-shaw-unlv the only way I can get my head around this is to wrap the population of the list of nodes to check during a given job ($whole_node_list['data']) within a foreach() loop that iterates over all configured content types, e.g.:

        $whole_node_list = array('data' => array());
        foreach ($this->drupal_content_types as $drupal_content_type) {
            // Issue 53: ^^^ This foreach loop is new.
            // Issue 53: vvv The code within the loop remains the same.
            for ($p = 1; $p <= $this->num_jsonapi_pages_per_run; $p++) {
                $client = new \GuzzleHttp\Client();
                $url = $this->drupal_base_url . '/jsonapi/node/' . $drupal_content_type;
                $response = $client->request('GET', $url, [
                    'http_errors' => false,
                    'headers' => [
                        'Accept' => 'application/vnd.api+json',
                        $this->jsonapi_authorization_headers[0]
                    ],
                    'query' => ['page[offset]' => $page_offset, 'page[limit]' => $this->page_size, 'sort' => '-changed']
                ]);

                $status_code = $response->getStatusCode();
                $node_list_from_jsonapi_json = (string) $response->getBody();
                $node_list_from_jsonapi = json_decode($node_list_from_jsonapi_json, true);

                if ($status_code === 200) {
                    $whole_node_list['data'] = array_merge($whole_node_list['data'], $node_list_from_jsonapi['data']);
                    $this->setPageOffset($page_offset, $node_list_from_jsonapi['links']);
                }
            }
        }

With this approach, the number of nodes that could end up in $whole_node_list['data'] grows in proportion to the number of configured Islandora content types: with two content types configured it would double, and with three it would triple. We should therefore advise users to reduce the number of pages checked during each job to account for the new loop. Our rule-of-thumb advice might be: if you have 3 Islandora content types, reduce the number of pages per run to 1/3 of its current value.
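
To make that arithmetic concrete, here is a minimal sketch of one way to read the rule of thumb: divide the configured pages per run by the number of content types, so the total number of pages fetched per job stays roughly constant. The function name `pagesPerRunPerContentType()` is hypothetical and not part of Riprap; it only illustrates the calculation.

```php
<?php
// Hypothetical helper, not part of Riprap: scale the configured
// pages-per-run down by the number of content types so the total
// number of pages fetched in one job stays roughly the same.
function pagesPerRunPerContentType(int $pagesPerRun, int $numContentTypes): int
{
    if ($numContentTypes < 1) {
        throw new InvalidArgumentException('At least one content type is required.');
    }
    // Never drop below one page, otherwise a content type would never be checked.
    return max(1, intdiv($pagesPerRun, $numContentTypes));
}

// 6 pages per run spread over 3 content types gives 2 pages per content type.
echo pagesPerRunPerContentType(6, 3), PHP_EOL;
```

Flooring at one page means that a site with more content types than pages per run would still fetch more nodes per job than before, so the advice is only approximate in that case.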

Does that sound like a viable approach?

seth-shaw-unlv commented 5 years ago

That makes sense, although isn't it possible that the two content types would have different page offsets? We would need to store a separate offset for each content type.
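
A rough sketch of what per-content-type offsets could look like, assuming an offset advances by one page while JSON:API reports a `next` link and wraps back to 0 otherwise. The class `PerTypeOffsets` and its methods are hypothetical, not existing Riprap code, and persisting the offsets between jobs is left out.

```php
<?php
// Hypothetical sketch, not existing Riprap code: keep one JSON:API page
// offset per content type instead of a single shared offset.
final class PerTypeOffsets
{
    /** @var array<string, int> Offsets keyed by content type machine name. */
    private array $offsets = [];

    // Content types that have not been seen yet start at offset 0.
    public function get(string $contentType): int
    {
        return $this->offsets[$contentType] ?? 0;
    }

    // Assumption: advance by one page while there is a next page,
    // and wrap back to 0 once the last page has been reached.
    public function advance(string $contentType, int $pageSize, bool $hasNextPage): void
    {
        $this->offsets[$contentType] = $hasNextPage
            ? $this->get($contentType) + $pageSize
            : 0;
    }
}

$offsets = new PerTypeOffsets();
$offsets->advance('islandora_object', 50, true);  // more pages remain
$offsets->advance('article', 50, false);          // last page reached
echo $offsets->get('islandora_object'), PHP_EOL;  // 50
echo $offsets->get('article'), PHP_EOL;           // 0
```

Inside the proposed foreach() loop, each request would then read its offset with the current content type as the key, so a content type with few nodes can wrap around without disturbing the others.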

mjordan commented 5 years ago

Good point. I'll give it a try, but maybe not until I'm on the plane to OpenRepositories ✈️

mjordan commented 4 years ago

Just discovered the https://www.drupal.org/project/jsonapi_search_api module (blog post at https://www.centarro.io/blog/querying-drupal-search-api-indexes-using-jsonapi). Might be worth checking out.

mjordan commented 4 years ago

@seth-shaw-unlv now that Islandora Riprap provides a ready-to-use View for listing media to check, I'd like to deprecate using JSON:API to fetch the resource list. The View is much more flexible. Any objections?

seth-shaw-unlv commented 4 years ago

@mjordan, fine by me.

mjordan commented 4 years ago

OK, thanks, I'll close this and open an issue to deprecate the JSON:API plugin.