Closed jeremyf closed 1 year ago
Eric checked on our end, and he told me that the adl:periodical set contains the "collections" that the adl:issue works fall into. For this reason, I'm placing ticket scientist-softserv/adventist_knapsack#579 high on the priority list because based on my reading of his answer, I need to run a complete adl:periodical import before running adl:issue.
Here is the question I sent to Eric and the answer he gave me, for you to check my reasoning.
KATHARINE: ADL Bulkrax Import question: We ran an import on adl:periodical with a limit of 100 works. No works imported, and the importer just created a strangely-named and empty collection. I’ve asked SoftServ to look into the Hyku side of what happened here. But from our end, what do we expect from the adl:periodical set? What is and/or should be in that set? Is there a reason on our end why Bulkrax found no works?
ERIC: As far as the auto collection creation thing, the idea was that Bulkrax would create a periodical master collection the first time it saw one and then load the correct records into it. However doing the “is this a unique periodical” query each time a record hits the system was way too time consuming. So Rob turned off that check at one point. After the load was completed, he went back and wrote a script that automatically cleaned things up, grouping records by the name of their parent collections. Since under the hood each of what appears to be a duplicate collection has its own unique hyku id, Rob was able to automatically decide which collection would be the “keeper” and then move all the issues out of the “duplicate” collections into the one his code selected as the “keeper”. He then deleted all of the empty collections that were left. The reason you are not seeing one collection for each periodical issue is that the “duplicates” are created at the import batch level and not the item import level, if I recall correctly. As far as not importing works, I am seeing returns for both periodical related sets.
https://oai.adventistdigitallibrary.org/OAI-script?verb=ListIdentifiers&set=adl%3Aissue
https://oai.adventistdigitallibrary.org/OAI-script?verb=ListIdentifiers&set=adl%3Aperiodical
The idea was, not sure what SS wants now, to import the periodical set first and then import the issues. This was so that the periodical collection that holds the issues would get our metadata and not be one of those “auto created” collections.
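For readers following the cleanup Eric describes (group same-named collections, pick a "keeper", move the issues into it, delete the empties), here is a minimal sketch in plain Ruby. It is not Rob's script: `Collection` is a hypothetical in-memory stand-in rather than a Hyku class, and the "lowest id wins" rule is an assumption, since the thread does not say how the keeper was chosen.

```ruby
# Hypothetical in-memory stand-in; not a Hyku or Bulkrax class.
Collection = Struct.new(:id, :title, :member_ids)

# Group collections by title, keep one per title, and fold the members of
# the others into it. Returns [keepers, emptied_duplicates].
def merge_duplicate_collections(collections)
  keepers = []
  emptied = []
  collections.group_by(&:title).each_value do |group|
    # Assumption: the collection with the lowest id is the keeper.
    keeper, *duplicates = group.sort_by(&:id)
    duplicates.each do |dup|
      keeper.member_ids |= dup.member_ids # move issues into the keeper
      dup.member_ids = []
      emptied << dup                      # now empty, a candidate for deletion
    end
    keepers << keeper
  end
  [keepers, emptied]
end
```

In a real tenant the "move" and "delete" steps would go through Hyku's own collection membership and deletion calls; the sketch only shows the grouping logic.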
This ticket passes SoftServ QA.
I followed the testing instructions and created a new adl:periodical set importer on staging with a limit of 3 collections. The importer is done, but I don't see links to collections that I can test. The importer page only links one collection, and it is not a real collection (i.e. it isn't pulling from our OAI feed and it doesn't reflect any of the collection info I expect to see; it appears to be created by the importer and it is named for the set spec). Importer is here: https://adl.s2.adventistdigitallibrary.org/importers/32?locale=en
@KatharineV when in the importer page, click on the "Collection Entries"
Then click on a collection:
Finally, click on the "Collection Link: Collection"
All looks good on staging. I tested the collections in this import. One is stuck "pending," but @jeremyf knows about that collection. It got stuck before.
I ran a test import of 3 periodicals on production, and it completed. The raw metadata looks right with these exceptions:
Parent Collection (part_Of) for Pertandaan Zaman is still incorrectly showing The Southern Watchman, which doesn't match the part_Of raw metadata.
Fields with multiple values are not splitting at the semicolon, so multiple values are displaying as a single metadata field. Example: Signs of the Times has multiple publishers and two subjects, but they failed to split.
These issues have been noted in other tickets, so I'm just restating them here to clarify that this test of 3 periodicals appears to work in general while continuing to have the specific problems that are being worked on overall. Thanks!
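To make the expected semicolon behavior concrete, here is a small plain-Ruby sketch of the split described above. `split_multi_value` is a hypothetical helper, not part of Bulkrax, and the sample values are made up rather than taken from the Signs of the Times record.

```ruby
# Split a raw metadata value on semicolons, trimming whitespace and
# dropping empty pieces. Hypothetical helper for illustration only.
def split_multi_value(raw)
  raw.to_s.split(";").map(&:strip).reject(&:empty?)
end

# Made-up sample value: one raw string should become two separate values.
publishers = split_multi_value("Publisher A; Publisher B ;")
puts publishers.inspect
```

A field that arrives as a single string with two names separated by a semicolon should end up as two metadata values, not one.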
Clarification and update on my comment above:
I ran a test import of 3 periodicals on staging, and Pertandaan Zaman shows the correct textual metadata in the Part of field, but it is physically a part of the wrong Parent Collection.
@KatharineV I fixed the underlying issue regarding assigning a collection to another collection. One challenge is that my read of the logic is that each import will add to the existing relationship (e.g. the collection listing).
I manually removed the relationships and then re-ran the importer. I believe the metadata for Pertandaan Zaman now shows the correct Part of value as well as the correct collection relationship.
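The additive behavior described above can be illustrated with a small plain-Ruby model. This is a sketch under assumptions, not the importer's code: relationships are modeled as a hash of sets, both function names are hypothetical, and "Correct Parent" is a placeholder because the thread does not name the right parent collection.

```ruby
require "set"

# Hypothetical model: child collection title => set of parent titles.

# Additive import: each run adds a parent and never removes an old one.
def import_additive(relationships, child, parent)
  (relationships[child] ||= Set.new) << parent
  relationships
end

# Replacing import: each run resets the child's parents to the new value.
def import_replacing(relationships, child, parent)
  relationships[child] = Set[parent]
  relationships
end

rels = {}
import_additive(rels, "Pertandaan Zaman", "The Southern Watchman") # earlier, wrong run
import_additive(rels, "Pertandaan Zaman", "Correct Parent")        # later, fixed run
puts rels["Pertandaan Zaman"].to_a.inspect # the stale parent is still present
```

Under the additive model a corrected re-import cannot clear the earlier wrong relationship by itself, which is consistent with the relationships having to be removed by hand before the re-run.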
I created a periodical import on ADL staging to run through the testing instructions for this ticket, and it is stuck pending. No entries are showing up. The import was set to bring in the first 3 periodicals again, to confirm that they land in their proper collection relationships. If the importer doesn't move out of pending, I will not be able to test this ticket. I'll update here if things change, but for now I'm assuming something is wrong and I won't get to finish testing...
Edited to add that I also set up an importer on the SDAPI tenant staging environment, just to see if the tenant was the issue. The importers are stuck in both places.
Tried to test this ticket today (2/28/23) and the importer is stuck pending.
I went ahead and edited the importer via the UI. I clicked "Update and Re-Harvest All Items"; that appeared to unstick the importer.
Periodicals are importing to Staging as expected. A test import today (2023-03-13) completed without any problems.
Working as expected on production.
From https://docs.google.com/document/d/1mIOT23UAilSO77pAlXYSWJEHw3YK3BNNQVTKNzd41ao/edit#
Testing Criteria
oai_adl
adl:periodical
Note the counts on the parser will be off. See this PR for reasons
What does reviewing look like? With the changes of this commit, you should be able to see the raw metadata and parsed metadata for each imported collection. What we then want to see is the raw and parsed metadata on the imported collections.
What I Suspect will be the Raw Metadata
```