johnjung / bmrcportal

GNU General Public License v3.0

Repository information in Descriptive Summary #136

Open MomoMoses opened 2 years ago

MomoMoses commented 2 years ago

Again, in the UIC.AFSC finding aid, the info in the Descriptive Summary is: "Repository: University of Illinois at ChicagoSpecial Collections (Richard J. Daley Library) 801 S. Morgan"

Address lines aren't needed, since they appear on the left hand sidebar of the finding aid.

This data appears to have come from these elements: `Records Summary University of Illinois at ChicagoSpecial Collections (Richard J. Daley Library)`

Is this correct? I now see that the Archives page in Wagtail only has one field for "Name" for each Archives, then the Address. If we include subareas like "Special Collections (Richard J. Daley Library)" in addition to the name of the university or parent organization, it gets a bit long when the Name field is used to populate other places in the Portal.

What are all the places where the data in the Archives pages (name and address) are populated?

johnjung commented 2 years ago

This has something to do with https://github.com/johnjung/bmrcportal/issues/126. In that issue, you specified that we should test for a corpname after the repository element in the descriptive summary. If the corpname isn't there, we check to see if there is an address. Currently, the code assumes that the first line in an address is the name of the repository, so it is set to display that line and hide every addressline after the first. However, in the case of https://bmrc.lib.uchicago.edu/portal/view/?id=BMRC.UIC.AFSC.xml, the address starts with "801 S. Morgan." This is why 801 S. Morgan appears here.
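To make the failure mode concrete, here is a minimal sketch of the fallback described above. It is not the portal's actual code: the helper name `repository_display_name` is hypothetical, and it assumes the EAD elements are not namespaced.

```python
import xml.etree.ElementTree as ET


def repository_display_name(repository_xml: str) -> str:
    """Hypothetical helper: pick the text to show for a <repository> element.

    Assumes un-namespaced EAD. Mirrors the behavior described in this thread:
    prefer <corpname>, otherwise treat the first <addressline> as the name.
    """
    repo = ET.fromstring(repository_xml)

    # Preferred source: a corpname inside the repository element.
    corpname = repo.find("corpname")
    if corpname is not None:
        text = "".join(corpname.itertext()).strip()
        if text:
            return text

    # Fallback: assume the first address line is the repository name.
    # Every later addressline is hidden, so only this one is returned.
    first_line = repo.find("address/addressline")
    if first_line is not None:
        return "".join(first_line.itertext()).strip()

    return ""


# An address that starts with a street line exposes the bad assumption.
sample = (
    "<repository><address>"
    "<addressline>801 S. Morgan</addressline>"
    "<addressline>Chicago, IL</addressline>"
    "</address></repository>"
)
print(repository_display_name(sample))  # 801 S. Morgan
```

The fallback only works when every repository puts its name on the first address line, which the UIC.AFSC finding aid shows is not guaranteed.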

If we implement https://github.com/johnjung/bmrcportal/issues/128, then this problem goes away and we can eliminate the code that hides every line in an address except for the first.

Data from Archive objects is currently used in two places. It appears at the top of an archive browse page, and in the sidebar of a finding aid view page. If we implement https://github.com/johnjung/bmrcportal/issues/128 the system will also display some of that data in the descriptive summary for each finding aid.

It's possible to add a subarea for Archive objects, but if we do that, please note that two subareas should not use the same finding aid prefix. So, for example, if you had Archive objects for "The University of Chicago - Special Collections" and "The University of Chicago - Preservation", you shouldn't use "BMRC.UOC" for both of them.
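A check along these lines could catch a shared prefix before it causes trouble. This is only a sketch: `duplicate_prefixes` is a hypothetical helper, and it takes plain `(name, prefix)` pairs rather than real Archive objects. The "BMRC.UOC.PRES" prefix is invented for illustration.

```python
from collections import defaultdict


def duplicate_prefixes(archives):
    """Hypothetical check: return prefixes claimed by more than one archive.

    `archives` is an iterable of (name, finding_aid_prefix) pairs.
    """
    by_prefix = defaultdict(list)
    for name, prefix in archives:
        by_prefix[prefix].append(name)
    # Keep only the prefixes that more than one archive uses.
    return {p: names for p, names in by_prefix.items() if len(names) > 1}


conflicting = [
    ("The University of Chicago - Special Collections", "BMRC.UOC"),
    ("The University of Chicago - Preservation", "BMRC.UOC"),
]
distinct = [
    ("The University of Chicago - Special Collections", "BMRC.UOC"),
    ("The University of Chicago - Preservation", "BMRC.UOC.PRES"),  # invented prefix
]
print(duplicate_prefixes(conflicting))
print(duplicate_prefixes(distinct))
```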

Please let me know what the next steps for this issue are. Do you want to add a subarea field to the Archive object?

MomoMoses commented 2 years ago

On the Archives page, the edit form for each Archives object would have an additional data field for a subarea or department (the label needs to be finalized). Data in that field would be optional.

Update: working on crowdsourcing a good workable label.

MomoMoses commented 2 years ago

Changing the approach to OVERRIDE the data, not overwrite.

Use the repository data from the "Archives" page, NOT the repository element in the finding aid itself. This can be done on the fly with the regularize script (the other option is upon ingest to MarkLogic).

Before running the data through this revised transformation, it would behoove us to run a report as to what the repository elements contain. This should be re-run periodically, especially after a large batch has been loaded. It may indicate that a new Archives object should be created if indicated by a new subarea in the data.
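Such a report could be sketched roughly as follows. The function name `repository_report` is hypothetical, the EAD is assumed to be un-namespaced, and the prefix is assumed to be the first two dot-separated parts of the filename (so "BMRC.UIC.AFSC.xml" gives "BMRC.UIC"). The sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def repository_report(finding_aids):
    """Hypothetical report: count distinct <repository> texts per prefix.

    `finding_aids` maps a filename such as "BMRC.UIC.AFSC.xml" to its
    EAD XML as a string.
    """
    report = defaultdict(Counter)
    for filename, xml_text in finding_aids.items():
        # Assumed convention: prefix = first two dot-separated parts.
        prefix = ".".join(filename.split(".")[:2])
        root = ET.fromstring(xml_text)
        for repo in root.iter("repository"):
            # Collapse whitespace so formatting differences do not
            # show up as separate variants.
            text = " ".join("".join(repo.itertext()).split())
            report[prefix][text] += 1
    return report


docs = {
    "BMRC.UIC.AAA.xml": "<ead><archdesc><did><repository>"
                        "<corpname>Special Collections</corpname>"
                        "</repository></did></archdesc></ead>",
    "BMRC.UIC.BBB.xml": "<ead><archdesc><did><repository>"
                        "<corpname>Special  Collections</corpname>"
                        "</repository></did></archdesc></ead>",
    "BMRC.UIC.CCC.xml": "<ead><archdesc><did><repository>"
                        "<corpname>University Archives</corpname>"
                        "</repository></did></archdesc></ead>",
}
for prefix, variants in repository_report(docs).items():
    print(prefix, dict(variants))
```

A prefix with a variant that matches no existing Archives object would be the signal that a new one may be needed.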

MomoMoses commented 2 years ago

The consensus is to go with a catch-all kind of label/heading: "Department or Unit" for what we've been calling a subarea (which is what it's called in EAD). It should also indicate that it's optional, or at least, not required (no red asterisk). After reviewing the editing page for an Archives object, I agree with your approach of a minimal set of required data fields.

MomoMoses commented 2 years ago

Review of the data shows that many have at least 2 variants of the name in the <publisher> element from the filedesc section. UIC had 4 variants for 2 subareas; Northwestern has 4 subareas with 5 variants. One variant resulted from the organization changing its name. Not all repositories use this element.

This review also highlighted a handful of errors in the data: mismatch with filename indicator. Will fix.

The repository element in the archdesc section has a lot more variation in practices. I did not pursue further investigation. I feel that there is sufficient justification for using the Archives page in Wagtail as the authoritative and current information for each repository and subarea.

johnjung commented 2 years ago

I'm having trouble understanding what I can do to help with this issue. Could you please summarize what steps need to take place to address this, and who is responsible for each step?

MomoMoses commented 2 years ago

Apologies - I see that there is discussion on this in issue #128 as you note. Here is the final word I think from that issue: "Use repo data from "Archives" page NOT <repository> element in the finding aid itself can do on the fly with the regularize script or upon ingest to MarkLogic"

I think in our first solution to this problem with the suppression of addresslines, etc., we assumed the data was less messy than it actually turns out to be. Ergo these two follow-on issues.

Here is the solution:

To populate display of the Archives/Repository name in the Descriptive Summary of the finding aid, use the data that BMRC has maintained in the "Archives" pages on Wagtail. OVERRIDE, but do not change the data in the encoded document. As to whether this should be done on the fly with the "regularize" script or more permanently on ingest to MarkLogic, my concern is with performance hits, and optimizing speedy display.
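As a rough sketch of the override idea, the display name could be looked up by finding aid prefix at render time, leaving the encoded document untouched. Everything here is an assumption for illustration: the `display_repository_name` helper, the dict standing in for the Archives pages, the "department" key for the optional field, and the rule that the prefix is the first two dot-separated parts of the finding aid id.

```python
def display_repository_name(finding_aid_id, archives, fallback=""):
    """Hypothetical override: prefer Archives page data over the EAD text.

    `archives` maps a finding aid prefix to a dict with a "name" and an
    optional "department". `fallback` is whatever was extracted from the
    encoded document, used only when no Archives entry matches.
    """
    # Assumed convention: "BMRC.UIC.AFSC.xml" -> "BMRC.UIC".
    prefix = ".".join(finding_aid_id.split(".")[:2])
    archive = archives.get(prefix)
    if archive is None:
        # Show at least some repository information rather than nothing.
        return fallback
    department = archive.get("department")
    if department:
        return f"{archive['name']}, {department}"
    return archive["name"]


archives = {
    "BMRC.UIC": {
        "name": "University of Illinois at Chicago",
        "department": "Special Collections (Richard J. Daley Library)",
    },
}
print(display_repository_name("BMRC.UIC.AFSC.xml", archives, fallback="801 S. Morgan"))
```

A dictionary lookup like this is cheap, so doing it on the fly should not add much to page render time. Whether the same holds for the real "regularize" script would need to be measured.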

MomoMoses commented 2 years ago

The temporary fix in #126 was not effective. Note to check that we have at least some repository information showing up where it needs to be even if it contains weird address info, etc.