johnjung / bmrcportal

GNU General Public License v3.0

Two Labels for extent element, one oversized! #137

Closed MomoMoses closed 2 years ago

MomoMoses commented 2 years ago

At least one data display anomaly: UIC. Descriptive Summary stuff again. AFSC collection: a weirdly large "Quantity". That's the label attribute for the <extent> element. Both "Size" and "Quantity" are shown, but my understanding was that only the label specified in the data would be shown. If the data has no label, then "Size" would be used.

Originally posted by @MomoMoses in https://github.com/johnjung/bmrcportal/issues/133#issuecomment-1058446234

MomoMoses commented 2 years ago

The physdesc element can contain extent, but doesn't always. regularize adds labels if elements don't have them, but it assumes that elements won't be nested. It needs to check for descendants with labels.

MomoMoses commented 2 years ago

It's possible to have additional transforms specifically for one institution to accommodate idiosyncratic practices.

johnjung commented 2 years ago

Here is a report on the different configurations of descriptive summary physdesc elements:

Group 1) 211 matches, e.g. BMRC.HARSH.MINOR_FRANCES.xml: physdesc HAS a label. It contains a text node.

Group 2) 50 matches, e.g. BMRC.NU.UNIV_THEATRE_PHOTO.xml: physdesc HAS a label. It contains no direct child text nodes, but it does have one extent WITHOUT a label attribute.

Group 3) 21 matches, e.g. BMRC.NU.CITIZENS_FOR65.xml: physdesc HAS a label. It contains no direct child text nodes, but it does have more than one extent, each WITHOUT a label attribute.

Group 4) 1 match, e.g. BMRC.CHM.BLACK-TIMUEL.xml: physdesc has NO label. It contains a text node.

Group 5) 1176 matches, e.g. BMRC.EHC.CR_HRC.xml: physdesc has NO label attribute. It contains no direct child text nodes, but it does have one extent, WITHOUT a label attribute.

Group 6) 64 matches, e.g. BMRC.UIC.ZBIRAL-TELLER_PHOTOS.xml: physdesc has NO label, but a single extent HAS a label.

Group 7) 94 matches, e.g. BMRC.NU.NUBAA.xml: physdesc has NO label. It contains no direct child text nodes, but it does have more than one extent, each WITHOUT a label attribute.
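
A report like the one above could be produced by reducing each physdesc to a small "signature" and counting how often each signature occurs. The following is a minimal Python sketch, not the script that was actually used for this report. It assumes the EAD elements carry no XML namespace, and the names `has_direct_text`, `physdesc_signature`, and `SAMPLES` are hypothetical, as are the sample snippets.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def has_direct_text(elem):
    """True if elem has a non-whitespace text node as a direct child."""
    # In ElementTree, direct text lives in elem.text and in each child's tail.
    chunks = [elem.text] + [child.tail for child in elem]
    return any(chunk and chunk.strip() for chunk in chunks)


def physdesc_signature(physdesc):
    """Summarize a physdesc as (has_label, has_text, extents, labeled_extents)."""
    extents = physdesc.findall("extent")  # direct extent children only
    labeled = [e for e in extents if e.get("label") is not None]
    return (
        physdesc.get("label") is not None,
        has_direct_text(physdesc),
        len(extents),
        len(labeled),
    )


# Hypothetical snippets standing in for physdesc elements from real finding aids.
SAMPLES = [
    '<physdesc label="Size">3 linear feet</physdesc>',
    '<physdesc><extent label="Quantity">2 boxes</extent></physdesc>',
    '<physdesc>\n  <extent>1</extent>\n  <extent>2</extent>\n</physdesc>',
]

counts = Counter(physdesc_signature(ET.fromstring(s)) for s in SAMPLES)
for signature, n in counts.items():
    print(n, signature)
```

Each distinct signature corresponds to one configuration. For example, a physdesc with no label and a single labeled extent, as in Group 6, comes out as `(False, False, 1, 1)`.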

I added templates to regularize.xsl that do the following:

A. If a physdesc has a label, use that. (Covers groups 1, 2, 3)
B. If a physdesc has no label, but it contains an extent with a label, use that. (Covers group 6)
C. If a physdesc has no label, and it contains no extents with labels, use a standard label. (Covers groups 4, 5, 7)

Note that currently there is never a case where a physdesc element contains an extent and both have labels, so we never have to decide which label to use.
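
Rules A to C can be sketched as a single function. This is a Python illustration only: the actual fix lives in the XSLT templates in regularize.xsl, which are not shown in this thread. `choose_label` and `DEFAULT_LABEL` are hypothetical names, the sketch assumes un-namespaced EAD elements, and "Size" as the standard label is taken from the comments in this thread.

```python
import xml.etree.ElementTree as ET

DEFAULT_LABEL = "Size"  # fallback label mentioned in this thread (assumption)


def choose_label(physdesc, default=DEFAULT_LABEL):
    """Pick the display label for a physdesc element, following rules A to C."""
    # Rule A: the physdesc's own label wins.
    own = physdesc.get("label")
    if own:
        return own
    # Rule B: otherwise use the first labeled extent nested inside it.
    for extent in physdesc.iter("extent"):
        label = extent.get("label")
        if label:
            return label
    # Rule C: nothing is labeled, so fall back to the standard label.
    return default


labeled_extent = ET.fromstring(
    '<physdesc><extent label="Quantity">2 boxes</extent></physdesc>'
)
print(choose_label(labeled_extent))
```

Checking the physdesc label first means that if a file ever did label both the physdesc and a nested extent, this sketch would prefer the physdesc label. That ordering is a choice made for the example, not something the thread decides.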

This is now fixed in https://github.com/uchicago-library/bmrc/commit/aa6f46e6d0f12dbb0d886e0a4abd83315bf39a51 and live on the server. Please take a look and let me know if this works. If so, please feel free to close this issue out.

MomoMoses commented 2 years ago

Dang, you're good!

MomoMoses commented 2 years ago

BMRC.HARSH.MINOR_FRANCES.xml (Group 1) physdesc HAS a label. It's "Size." This works great on the finding aid display. Unfortunately, in the SERP display the label "Extent" has no information; it's blank. Which is odd, because as we can see, there IS information. It's just not contained in an <extent> element.

Same result for BMRC.UOC.ACLU-IL.xml (Group 1). Blank on SERP page. Found another example that looks fine on the finding aid display but has a blank for Extent. BMRC.UOC.HP-HISTORICAL.xml

MomoMoses commented 2 years ago

Group 2) 50 matches, e.g. BMRC.NU.UNIV_THEATRE_PHOTO.xml. "Physical Description: 22.00" in the above example is not great. 22 of what? The unit is given in the "type" attribute of the <extent> element. It should be possible to use that to add to the text node, shouldn't it? It's only 50 examples, however, so if time runs out, it's possible to edit them manually. Best to have an automatic solution if possible, for posterity.
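
The suggestion above, appending the "type" attribute to a bare number, could look roughly like this. It is a minimal Python sketch, not part of regularize.xsl. `extent_text` is a hypothetical helper, and the type value "linear feet" is made up for illustration, since the thread does not say what the attribute contains in BMRC.NU.UNIV_THEATRE_PHOTO.xml.

```python
import xml.etree.ElementTree as ET


def extent_text(extent):
    """Return the extent's text, with its type attribute appended as a unit."""
    value = "".join(extent.itertext()).strip()
    unit = (extent.get("type") or "").strip()
    # Skip the unit when it is missing or already spelled out in the text.
    if unit and unit.lower() not in value.lower():
        return f"{value} {unit}"
    return value


# Hypothetical extent: the real file's type value is not given in this thread.
sample = ET.fromstring('<extent type="linear feet">22.00</extent>')
print(extent_text(sample))
```

The check against the existing text is there so that an extent whose text already names the unit is not given it twice.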

MomoMoses commented 2 years ago

Group 4 CHM.BLACK-TIMUEL actually has two physdesc elements; one DOES have a label ("Quantity") and one does not, but has a text node. The default label "Size" has been applied. Which is fine. Both show up btw. However, in the SERP, Extent is blank.

johnjung commented 2 years ago

Yes- for each result snippet on the SERP, the "Extent" will be blank if there is no <extent> element. This doesn't have anything to do with the change I just made- the SERPs were displaying like this before. Because of that, can we track this in a new issue? The issue with BMRC.NU.UNIV_THEATRE_PHOTO.xml is separable too, so could we track that in its own issue? (CHM.BLACK-TIMUEL seems related to BMRC.HARSH.MINOR_FRANCES.)

MomoMoses commented 2 years ago

Two issues were separated out, and given their own issue to track. Closing this one.