ualbertalib / jupiter

Jupiter is a University of Alberta Libraries-based initiative to create a sustainable and extensible digital asset management system. This is phase 2 (Digitization).
https://era.library.ualberta.ca/
MIT License

OAI Testing #1712

Open mbarnett opened 4 years ago

mbarnett commented 4 years ago

Once Data has been migrated on the new Staging environment, we'll need to test the new OAI implementation to find any errors that may appear with Production-quality data.

Steps

ConnorSheremeta commented 4 years ago

https://era-app-stg-1.library.ualberta.ca/oai?verb=ListRecords&metadataPrefix=oai_etdms

gives a 500. logs:

I, [2020-08-04T11:30:51.030587 #9771]  INFO -- : [87bb55b7-a89a-4d15-b444-914611409724]   Rendered vendor/ruby/2.5.0/bundler/gems/oaisys-d0e3f515472b/app/views/oaisys/pmh/list_records.xml.builder within layouts/oaisys/application (Duration: 1690.9ms | Allocations: 243817)
I, [2020-08-04T11:30:51.030638 #9771]  INFO -- : [87bb55b7-a89a-4d15-b444-914611409724]   Rendered vendor/ruby/2.5.0/bundler/gems/oaisys-d0e3f515472b/app/views/layouts/oaisys/application.builder (Duration: 1691.1ms | Allocations: 243946)
I, [2020-08-04T11:30:51.030864 #9771]  INFO -- : [87bb55b7-a89a-4d15-b444-914611409724] Completed 500 Internal Server Error in 2601ms (ActiveRecord: 2297.4ms | Allocations: 251064)
F, [2020-08-04T11:30:51.032026 #9771] FATAL -- : [87bb55b7-a89a-4d15-b444-914611409724]   
[87bb55b7-a89a-4d15-b444-914611409724] ActionView::Template::Error (undefined method `first' for nil:NilClass):
[87bb55b7-a89a-4d15-b444-914611409724]     30:         item.member_of_paths.each { |path| header.setSpec path.tr('/', ':') }
[87bb55b7-a89a-4d15-b444-914611409724]     31:       end
[87bb55b7-a89a-4d15-b444-914611409724]     32:       record.metadata do |metadata_xml|
[87bb55b7-a89a-4d15-b444-914611409724]     33:         item.serialize_metadata(format: metadata_format, into_document: metadata_xml)
[87bb55b7-a89a-4d15-b444-914611409724]     34:       end
[87bb55b7-a89a-4d15-b444-914611409724]     35:     end
[87bb55b7-a89a-4d15-b444-914611409724]     36:   end
[87bb55b7-a89a-4d15-b444-914611409724]   
[87bb55b7-a89a-4d15-b444-914611409724] app/decorators/metadata/oai_etdms/thesis_decorator.rb:47:in `discipline'

We cannot assume object.departments.first is valid with Production data: per the error above, departments is nil for some records.
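One way to guard against this is to make the lookup nil-safe. The following is only a minimal sketch, assuming `departments` may be nil or empty on Production records; `ThesisStub` and this standalone `discipline` method are hypothetical stand-ins, not the actual decorator code:

```ruby
# Hypothetical stand-in for the thesis object wrapped by the decorator.
ThesisStub = Struct.new(:departments)

# Returns the first department, or nil when departments is nil or empty.
# Array(nil) is [], so calling .first never raises NoMethodError here.
def discipline(thesis)
  Array(thesis.departments).first
end

discipline(ThesisStub.new(['Biological Sciences'])) # first department
discipline(ThesisStub.new(nil))                     # nil instead of raising
```

The serializer would still need to skip the discipline element when this returns nil.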


We're missing the publisher field on each thesis... from https://www.bac-lac.gc.ca/eng/services/theses/Pages/universities.aspx:

Publisher (Mandatory): The full name of the university that granted the degree. If possible, hard-code or standardize the field to prevent errors and variations in the university name. Examples: Simon Fraser University; Université de Montréal.

some records, like the one below, are missing both the degree name and grantor:

  <record>
    <header>
      <identifier>oai:era.library.ualberta.ca:5a85be9d-ebef-4f6d-81bf-20503f563bbd</identifier>
      <datestamp>2020-07-20 22:46:28 UTC</datestamp>
      <setSpec>db9a4e71-f809-4385-a274-048f28eb6814:f42f3da6-00c3-4581-b785-63725c33c7ce</setSpec>
    </header>
    <metadata>
      <etd_ms:thesis xmlns:etd_ms="http://www.ndltd.org/standards/metadata/etdms/1.0/" xmlns:xsi2="http://www.w3.org/2001/XMLSchema-instance" xsi2:schemaLocation="http://www.ndltd.org/standards/metadata/etdms/1.0/ http://www.ndltd.org/standards/metadata/etdms/1-0/etdms.xsd">
        <etd_ms:title>Induction of alcohol dehydrogenase, lactate dehydrogenase, and alanine aminotransferase gene expression of Arabidopsis thatliana exposed to hypoxia</etd_ms:title>
        <etd_ms:creator>Spryland, Kathleen Anne.</etd_ms:creator>
        <etd_ms:date>2020-07-20 22:46:28 UTC</etd_ms:date>
        <etd_ms:type>Thesis</etd_ms:type>
        <etd_ms:identifier>https://era-test.library.ualberta.ca/items/5a85be9d-ebef-4f6d-81bf-20503f563bbd</etd_ms:identifier>
        <etd_ms:identifier>doi:10.7939/R3TP9M</etd_ms:identifier>
        <etd_ms:identifier>https://era-test.library.ualberta.ca/items/5a85be9d-ebef-4f6d-81bf-20503f563bbd/view/82616ba5-cf77-444e-a7b2-b7bf3e8269ae/MQ21209.pdf</etd_ms:identifier>
        <etd_ms:language>English</etd_ms:language>
        <etd_ms:rights>This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.</etd_ms:rights>
      </etd_ms:thesis>
    </metadata>
  </record>

from https://www.bac-lac.gc.ca/eng/services/theses/Pages/universities.aspx:

Degree name (Mandatory): Name of the degree associated with the thesis. Abbreviations are preferred, and abbreviated parts consisting of more than a single letter should be separated by a space from the preceding or succeeding words or initials. Examples: Ph. D.; M.E.S.
Degree grantor (Mandatory): Name of the institution that awarded the degree. Use the name of the university from the time the degree was granted. Example: University of Winnipeg.
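A check like the one below could flag records that lack these mandatory fields. This is a rough sketch only: the namespace is copied from the record above, but the element paths (`etd_ms:publisher`, `etd_ms:degree/etd_ms:name`, `etd_ms:degree/etd_ms:grantor`) are assumptions that should be verified against etdms.xsd, and `missing_mandatory_fields` is a hypothetical helper name.

```ruby
require 'rexml/document'

# Namespace taken from the sample record above.
ETD_MS_NS = { 'etd_ms' => 'http://www.ndltd.org/standards/metadata/etdms/1.0/' }.freeze

# Assumed element paths for the three mandatory fields; verify against etdms.xsd.
MANDATORY_PATHS = {
  'publisher'      => '//etd_ms:publisher',
  'degree name'    => '//etd_ms:degree/etd_ms:name',
  'degree grantor' => '//etd_ms:degree/etd_ms:grantor'
}.freeze

# Returns the labels of mandatory fields that are absent or blank in the XML.
def missing_mandatory_fields(xml)
  doc = REXML::Document.new(xml)
  MANDATORY_PATHS.reject do |_label, path|
    node = REXML::XPath.first(doc, path, ETD_MS_NS)
    node && !node.text.to_s.strip.empty?
  end.keys
end
```

Run against the sample record above, this would be expected to report all three fields as missing.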

Results from the perl validator:

# RUNNING VALIDATION FOR https://era-app-stg-1.library.ualberta.ca/oai

### Checking Identify response

REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=Identify GET
PASS:    Administrator email address is 'eraadmi@ualberta.ca'
PASS:    Correctly reports OAI-PMH protocol version 2.0
FAIL:    baseURL supplied 'https://era-app-stg-1.library.ualberta.ca/oai' does not match the baseURL in the Identify response 'https://era.library.ualberta.ca/oai'. The baseURL you enter must EXACTLY match the baseURL returned in the Identify response. It must match in case (http://Wibble.org/ does not match http://wibble.org/) and include any trailing slashes etc.
PASS:    Datestamp granularity is 'seconds'
PASS:    Extracted earliestDatestamp 2018-06-22T13:06:50Z

### Checking ListSets response

REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=ListSets GET
PASS:    responseDate has correct format: 2020-08-07T19:28:36Z
PASS:    Extracted 150 set names: { b41cdbfd-6af2-4a13-8ba6-59725565d445:adbab43d-b35d-4493-a3a5-bd228863cc36 560f321f-a8c7-4884-adb3-326433a61688:17bd1d5d-7d41-40ed-8ea2-97c2ec63896b f7766168-d234-491a-b27c-2c4a5eecbc99:d7e84f98-9931-435b-89fd-80f713d5ca47 ... }, will use setSpec &set=b41cdbfd-6af2-4a13-8ba6-59725565d445:adbab43d-b35d-4493-a3a5-bd228863cc36 in tests

### Checking ListIdentifiers response

REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=ListIdentifiers&metadataPrefix=oai_dc&set=b41cdbfd-6af2-4a13-8ba6-59725565d445:adbab43d-b35d-4493-a3a5-bd228863cc36 GET
PASS:    responseDate has correct format: 2020-08-07T19:28:37Z
NOTE:    Tried empty set, will look for set with items...
NOTE:    Trying set &set=560f321f-a8c7-4884-adb3-326433a61688:17bd1d5d-7d41-40ed-8ea2-97c2ec63896b
REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=ListIdentifiers&metadataPrefix=oai_dc&set=560f321f-a8c7-4884-adb3-326433a61688:17bd1d5d-7d41-40ed-8ea2-97c2ec63896b GET
PASS:    responseDate has correct format: 2020-08-07T19:28:37Z
PASS:    Good ListIdentifiers response, extracted id 'oai:era.library.ualberta.ca:5a0aad85-bffa-4686-bae3-d589a64361dc' for use in future tests.

### Checking ListMetadataFormats response

REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=ListMetadataFormats&identifier=oai%3Aera%2Elibrary%2Eualberta%2Eca%3A5a0aad85-bffa-4686-bae3-d589a64361dc GET
PASS:    responseDate has correct format: 2020-08-07T19:28:37Z
PASS:    Good ListMetadataFormats response, includes oai_dc
PASS:    Data provider supports oai_dc metadataPrefix

### Checking GetRecord response

REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=GetRecord&identifier=oai%3Aera%2Elibrary%2Eualberta%2Eca%3A5a0aad85-bffa-4686-bae3-d589a64361dc&metadataPrefix=oai_dc GET
FAIL:    Server failed to respond to the GetRecord request (HTTP header values: status=404 Not Found, age=0, lifetime=0, is fresh:=)
FAIL:    Can't complete datestamp check for GetRecord
FAIL:    ABORT: Can't complete datestamp check for GetRecord

oops, validation didn't run to completion: ABORT: Can't complete datestamp check for GetRecord

## Validation status of data provider https://era-app-stg-1.library.ualberta.ca/oai is FAILED

Failures:

FAIL:    baseURL supplied 'https://era-app-stg-1.library.ualberta.ca/oai' does not match the baseURL in the Identify response 'https://era.library.ualberta.ca/oai'. The baseURL you enter must EXACTLY match the baseURL returned in the Identify response. It must match in case (http://Wibble.org/ does not match http://wibble.org/) and include any trailing slashes etc.

REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=GetRecord&identifier=oai%3Aera%2Elibrary%2Eualberta%2Eca%3A5a0aad85-bffa-4686-bae3-d589a64361dc&metadataPrefix=oai_dc GET
FAIL:    Server failed to respond to the GetRecord request (HTTP header values: status=404 Not Found, age=0, lifetime=0, is fresh:=)
FAIL:    Can't complete datestamp check for GetRecord
FAIL:    ABORT: Can't complete datestamp check for GetRecord

The first failure will be resolved once this is on Production. The rest occur because GetRecord cannot find a matching record: the identifiers carry the prefix oai:era.library.ualberta.ca:.


The script which went through all pages of list records for items succeeded.

The script which went through all pages of list records for theses succeeded once the departments/discipline issue was fixed. The only issues found were the missing degree name, grantor, and publisher fields.
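The scripts themselves are not included here. As an illustration only, a page walker along these lines follows each resumptionToken until the server stops returning one. The `fetch` callable (for example a Net::HTTP wrapper around the staging /oai endpoint) and the method name `each_list_records_page` are assumptions, injected so the loop can be exercised offline:

```ruby
require 'rexml/document'
require 'uri'

# OAI-PMH 2.0 response namespace.
OAI_NS = 'http://www.openarchives.org/OAI/2.0/'

# Walks every page of ListRecords by following resumptionTokens.
# `fetch` takes a query string and returns the XML response body.
# Yields each parsed page and returns the number of pages visited.
def each_list_records_page(metadata_prefix, fetch)
  query = URI.encode_www_form(verb: 'ListRecords', metadataPrefix: metadata_prefix)
  pages = 0
  loop do
    doc = REXML::Document.new(fetch.call(query))
    pages += 1
    yield doc if block_given?
    token = REXML::XPath.first(doc, '//oai:resumptionToken', 'oai' => OAI_NS)
    # An absent or empty token marks the final page.
    break if token.nil? || token.text.to_s.strip.empty?
    query = URI.encode_www_form(verb: 'ListRecords', resumptionToken: token.text.strip)
  end
  pages
end
```

A 500 on any page would surface as an exception from `fetch` or from the XML parser, which is how a failure like the oai_etdms one above would show up.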

ConnorSheremeta commented 4 years ago

Summary of findings above:

GetRecord

GetRecord does not work fully right now. Issue: it can't find a matching record because of the oai:era.library.ualberta.ca: prefix on the identifiers. It works when the identifier is just the UUID.
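A possible fix is to strip the prefix before looking up the record. A minimal sketch, assuming the lookup key is the bare UUID; `record_uuid` and the constant name are hypothetical, not part of the existing code:

```ruby
# Prefix seen on the identifiers returned by ListIdentifiers.
OAI_ID_PREFIX = 'oai:era.library.ualberta.ca:'

# Returns the bare UUID for a prefixed identifier; unprefixed input is returned unchanged.
def record_uuid(identifier)
  identifier.delete_prefix(OAI_ID_PREFIX)
end

record_uuid('oai:era.library.ualberta.ca:5a0aad85-bffa-4686-bae3-d589a64361dc')
# => "5a0aad85-bffa-4686-bae3-d589a64361dc"
```

`String#delete_prefix` requires Ruby 2.5 or later, which matches the Ruby version visible in the log paths above.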

Identify

Identify is working fully.

ListIdentifiers

List Identifiers is working fully.

ListMetadataFormats

List metadata formats is working fully.

ListRecords

List records works for the metadata prefix oai_dc but not oai_etdms. Issues with oai_etdms:

ConnorSheremeta commented 4 years ago

The Perl validator didn't run to completion previously, so some failing tests were never reached. Two major issues were fixed where the date in GetRecord and ListRecords was not in the proper format. This is the result of the test after those were fixed:

RUNNING VALIDATION FOR https://era-app-stg-1.library.ualberta.ca/oai

Checking Identify response

REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=Identify GET
PASS:    Administrator email address is 'eraadmi@ualberta.ca'
PASS:    Correctly reports OAI-PMH protocol version 2.0
FAIL:    baseURL supplied 'https://era-app-stg-1.library.ualberta.ca/oai' does not match the baseURL in the Identify response 'https://era.library.ualberta.ca/oai'. The baseURL you enter must EXACTLY match the baseURL returned in the Identify response. It must match in case (http://Wibble.org/ does not match http://wibble.org/) and include any trailing slashes etc.
PASS:    Datestamp granularity is 'seconds'
PASS:    Extracted earliestDatestamp 2018-06-22T13:06:50Z

Checking ListSets response

REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=ListSets GET
PASS:    responseDate has correct format: 2020-09-21T16:28:39Z
PASS:    Extracted 150 set names: { b41cdbfd-6af2-4a13-8ba6-59725565d445:adbab43d-b35d-4493-a3a5-bd228863cc36 560f321f-a8c7-4884-adb3-326433a61688:17bd1d5d-7d41-40ed-8ea2-97c2ec63896b f7766168-d234-491a-b27c-2c4a5eecbc99:d7e84f98-9931-435b-89fd-80f713d5ca47 ... }, will use setSpec &set=b41cdbfd-6af2-4a13-8ba6-59725565d445:adbab43d-b35d-4493-a3a5-bd228863cc36 in tests

Checking ListIdentifiers response

REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=ListIdentifiers&metadataPrefix=oai_dc&set=b41cdbfd-6af2-4a13-8ba6-59725565d445:adbab43d-b35d-4493-a3a5-bd228863cc36 GET
PASS:    responseDate has correct format: 2020-09-21T16:28:40Z
NOTE:    Tried empty set, will look for set with items...
NOTE:    Trying set &set=560f321f-a8c7-4884-adb3-326433a61688:17bd1d5d-7d41-40ed-8ea2-97c2ec63896b
REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=ListIdentifiers&metadataPrefix=oai_dc&set=560f321f-a8c7-4884-adb3-326433a61688:17bd1d5d-7d41-40ed-8ea2-97c2ec63896b GET
PASS:    responseDate has correct format: 2020-09-21T16:28:41Z
PASS:    Good ListIdentifiers response, extracted id 'oai:era.library.ualberta.ca:2cb91317-d9e3-4b6a-b3a1-7f96aa812a6e' for use in future tests.

Checking ListMetadataFormats response

REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=ListMetadataFormats&identifier=oai%3Aera%2Elibrary%2Eualberta%2Eca%3A2cb91317-d9e3-4b6a-b3a1-7f96aa812a6e GET
PASS:    responseDate has correct format: 2020-09-21T16:28:42Z
PASS:    Good ListMetadataFormats response, includes oai_dc
PASS:    Data provider supports oai_dc metadataPrefix

Checking GetRecord response

REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=GetRecord&identifier=oai%3Aera%2Elibrary%2Eualberta%2Eca%3A2cb91317-d9e3-4b6a-b3a1-7f96aa812a6e&metadataPrefix=oai_dc GET
PASS:    responseDate has correct format: 2020-09-21T16:28:43Z
PASS:    Datestamp in GetRecord response (2020-09-03T19:06:35Z) has the correct form for seconds granularity.
PASS:    Datestamp in GetRecord response (2020-09-03T19:06:35Z) matched the seconds granularity specified in the Identify response.
PASS:    Expected setSpec was returned in the response

Checking ListRecords response

REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=ListRecords&from=2020-09-03T19:06:35Z&until=2020-09-03T19:06:35Z&metadataPrefix=oai_dc GET
PASS:    responseDate has correct format: 2020-09-21T16:28:44Z
PASS:    Response is well formed
PASS:    ListRecords response correctly included record with identifier oai:era.library.ualberta.ca:2cb91317-d9e3-4b6a-b3a1-7f96aa812a6e

Checking exception handling (errors)

REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?junk GET
PASS:    Error response correctly includes error code 'badVerb'
REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=junk GET
PASS:    Error response correctly includes error code 'badVerb'
REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=GetRecord&metadataPrefix=oai_dc GET
PASS:    Error response correctly includes error code 'badArgument'
REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=GetRecord&identifier=oai:era.library.ualberta.ca:2cb91317-d9e3-4b6a-b3a1-7f96aa812a6e GET
PASS:    Error response correctly includes error code 'badArgument'
REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=GetRecord&identifier=invalid"id&metadataPrefix=oai_dc GET
PASS:    Error response correctly includes error code 'idDoesNotExist'
REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=ListIdentifiers&until=junk GET
PASS:    Error response correctly includes error code 'badArgument'
REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=ListIdentifiers&from=junk GET
PASS:    Error response correctly includes error code 'badArgument'
REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=ListIdentifiers&resumptionToken=junk&until=2000-02-05 GET
PASS:    Error response correctly includes error code 'badResumptionToken'
REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=ListRecords&metadataPrefix=oai_dc&from=junk GET
WARN:    Bad HTTP status code from server: 500
FAIL:    Can't parse malformed response.
REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=ListRecords&resumptionToken=junk GET
PASS:    Error response correctly includes error code 'badResumptionToken'
REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=ListRecords&metadataPrefix=oai_dc&resumptionToken=junk&until=1990-01-10 GET
PASS:    Error response correctly includes error code 'badResumptionToken'
REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=ListRecords&metadataPrefix=oai_dc&until=junk GET
FAIL:    Exception/error response did not contain error code 'badArgument'
REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=ListRecords GET
PASS:    Error response correctly includes error code 'badArgument'
WARN:    Only 11 out of 13 error requests properly handled

Checking for version 2.0 specific exceptions

REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2002-02-05&until=2002-02-06T05:35:00Z GET
FAIL:    Error code badArgument not found in response but should be given to the request: verb=ListRecords&metadataPrefix=oai_dc&from=2002-02-05&until=2002-02-06T05:35:00Z The request has different granularities for the from and until parameters.
REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=ListRecords&metadataPrefix=oai_dc&until=2017-06-22T13:06:50Z GET
PASS:    Error response correctly includes error code 'noRecordsMatch'

Checking that HTTP POST requests are handled correctly

REQUEST: https://era-app-stg-1.library.ualberta.ca/oai POST verb:Identify
FAIL:    POST test 1 was unsuccessful. Server returned HTTP Status: '422 Unprocessable Entity'
REQUEST: https://era-app-stg-1.library.ualberta.ca/oai POST identifier:oai:era.library.ualberta.ca:2cb91317-d9e3-4b6a-b3a1-7f96aa812a6e metadataPrefix:oai_dc verb:GetRecord
FAIL:    POST test 2 was unsuccessful. Server returned HTTP Status: '422 Unprocessable Entity'

Checking for correct use of resumptionToken (if used)

REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=ListRecords&metadataPrefix=oai_dc GET
NOTE:    Got resumptionToken vOT6kOeC8cqfv1zljKAHbj
REQUEST: https://era-app-stg-1.library.ualberta.ca/oai?verb=ListRecords&resumptionToken=vOT6kOeC8cqfv1zljKAHbj GET
PASS:    Resumption tokens appear to work

Validation status of data provider https://era-app-stg-1.library.ualberta.ca/oai is FAILED

The tests that are still failing are edge cases and can probably be put off for now, though they should be addressed at some point.
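For when the from/until failures are picked up, the validation the Perl validator expects could look roughly like the sketch below. It is not the oaisys implementation: `granularity` and `date_range_error` are hypothetical helpers, and it assumes the only accepted forms are YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ.

```ruby
require 'date'
require 'time'

DAY_FORMAT     = /\A\d{4}-\d{2}-\d{2}\z/.freeze
SECONDS_FORMAT = /\A\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z\z/.freeze

# Returns :day, :seconds, or nil when the value is not a valid datestamp.
def granularity(value)
  if value.match?(DAY_FORMAT)
    Date.strptime(value, '%Y-%m-%d') && :day
  elsif value.match?(SECONDS_FORMAT)
    Time.iso8601(value) && :seconds
  end
rescue ArgumentError
  # Raised for well-shaped but impossible dates.
  nil
end

# Returns 'badArgument' when from/until are malformed or differ in
# granularity, and nil when the pair is acceptable.
def date_range_error(from: nil, until_date: nil)
  grains = [from, until_date].compact.map { |value| granularity(value) }
  return 'badArgument' if grains.include?(nil) || grains.uniq.size > 1

  nil
end
```

With this check in front of the query, `from=junk`, `until=junk`, and the mixed-granularity request from the validator run would each be expected to map to badArgument rather than a 500 or a normal response.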

I believe it is ready for metadata to verify the OAI output.

mbarnett commented 4 years ago

Looks good. I agree that those look like error-handling edge cases that we can put off for now, but maybe just open a ticket and record them there so that we don't lose track.