edgartdata opened this issue 4 years ago.
@edgartdata Great question. RB has a handful of records with data in non-Latin scripts. @KraigBinkowski, Ref has more.
Here's an example record in Blacklight: https://collections.britishart.yale.edu/catalog/orbis:14851429
The same record in Orbis: https://orbis.library.yale.edu/vwebv/holdingsInfo?bibId=14851429
A couple of observations:
On a positive note: when we romanize non-Latin scripts, the result usually includes special characters or diacritics, and these appear to render well in Blacklight, as in this example: https://collections.britishart.yale.edu/catalog/orbis:1508181
Upstream of BL, however, the LIDO is not handling non-Latin characters (see below). It's good to see, though, that the bib materials do display with the appropriate special characters (e.g., orbis:1508181 above).
<lido:inscriptionsWrap>
<lido:inscriptions lido:type="Inscription">
<lido:inscriptionTranscription>
Inscribed in Marathi in black ink, upper center: "sake 1713 virodhak?tanam sa?vatsare mahe phalguna sudhi 6 g?garam ci?taman ta?ba? navagire 5 | mu | pune pe? narayan"
</lido:inscriptionTranscription>
</lido:inscriptions>
</lido:inscriptionsWrap>
@yulgit1 So is this something for David to work on in COBOAT?
Yes, or possibly the database. But I'd check COBOAT first.
Right. The inscription displays correctly in TMS, so COBOAT must be doing something odd with the script. Will do!
Quick update: David says that "Coboat has always converted the data to UTF8 before building the xml documents and sending them to oaipmh."
@flapka To test: https://libapp.library.yale.edu/OAI_BAC/src/OAIOrbisTool.jsp?verb=GetRecord&identifier=oai:orbis.library.yale.edu:14851429&metadataPrefix=marc21 to see whether Japanese characters display correctly in BL.
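For repeating this test outside the browser, the GetRecord request can be rebuilt from its parts. A minimal Python sketch: the endpoint and identifier pattern are copied from the URL above, `verb`, `identifier` and `metadataPrefix` are standard OAI-PMH arguments, and the helper name `getrecord_url` is just for illustration.

```python
from urllib.parse import urlencode

# Endpoint and identifier pattern copied from the test URL in this thread.
OAI_ENDPOINT = "https://libapp.library.yale.edu/OAI_BAC/src/OAIOrbisTool.jsp"


def getrecord_url(bib_id: str, prefix: str = "marc21") -> str:
    """Build an OAI-PMH GetRecord URL for one Orbis bib record (hypothetical helper)."""
    params = {
        "verb": "GetRecord",
        "identifier": f"oai:orbis.library.yale.edu:{bib_id}",
        "metadataPrefix": prefix,
    }
    # safe=":" leaves the colons in the OAI identifier unescaped, matching the original URL.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':')}"


print(getrecord_url("14851429"))
```

The response bytes would then need to be decoded as UTF-8 (e.g. `urllib.request.urlopen(url).read().decode("utf-8")`) before checking whether the Japanese characters survived.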
@yulgit1 I suggest keeping the initial test simple by choosing one field: arbitrarily, perhaps the XSLT template for "publisher".
To that existing template, could we add an additional xsl:for-each that looks for a MARC 880 field in which subfield $6 contains the string "260" or "264", and then map 880 subfields $a and $b from any 880 that meets these criteria?
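A rough Python analogue of the proposed xsl:for-each, for sanity-checking output outside Oxygen. It assumes the marc21 OAI output is standard MARCXML (namespace `http://www.loc.gov/MARC21/slim`); the function name and the sample record are hypothetical. In MARC 21, an 880's $6 begins with the linked tag (e.g. `264-02/$1`), so this checks the first three characters rather than doing a general "contains" test.

```python
import xml.etree.ElementTree as ET

# Assumes standard MARCXML; adjust if the OAI output uses another namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def alt_publishers(record: ET.Element) -> list[str]:
    """Return $a/$b text from each 880 whose $6 links back to a 260 or 264."""
    results = []
    for field in record.iterfind("marc:datafield[@tag='880']", MARC_NS):
        link = field.findtext("marc:subfield[@code='6']", default="", namespaces=MARC_NS)
        # $6 starts with the linked tag, e.g. "264-02/$1".
        if link[:3] not in ("260", "264"):
            continue
        parts = [
            (sf.text or "").strip()
            for sf in field.iterfind("marc:subfield", MARC_NS)
            if sf.get("code") in ("a", "b")
        ]
        results.append(" ".join(parts))
    return results


# Hypothetical sample: one 880 linked to the title, one linked to the publisher.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="880" ind1="1" ind2="0">
    <subfield code="6">245-01/$1</subfield>
    <subfield code="a">タイトル</subfield>
  </datafield>
  <datafield tag="880" ind1=" " ind2="1">
    <subfield code="6">264-02/$1</subfield>
    <subfield code="a">東京 :</subfield>
    <subfield code="b">出版社,</subfield>
    <subfield code="c">2020.</subfield>
  </datafield>
</record>"""

record = ET.fromstring(SAMPLE)
print(alt_publishers(record))
```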
@flapka Tested in Oxygen using the example above (bibId=14851429) with an 'alt_rep' field.
Thanks @yulgit1. That looks right. For further testing, is the next step to apply a similar mapping in all of the fields in which we'd like to provide the original script, where applicable? If so, I think these are all the XSLT templates in need of modification:
NB: All the above fields are for transcribed data (inscriptions) or notes. My sense is that we may not want variant-script data in our faceted fields; happy to discuss in a future meeting.
@flapka Before proceeding to each of these fields, just want to confirm that publisher looks right.
Thanks @yulgit1. Yes, I think the parallel script in that image appears precisely as desired.
@yulgit1 Here are example records that will illustrate usage of non-Latin scripts in the fields named above:
https://collections.britishart.yale.edu/catalog/orbis:14851429
https://collections.britishart.yale.edu/catalog/orbis:13087663
https://collections.britishart.yale.edu/catalog/orbis:7863050
I find no examples with a contents note (505) in a non-Latin script; they may not exist in our catalog.
@flapka We are not currently displaying the description field. Should we display it (the 5xx description field, its 880 link, or both)?
@yulgit1 Yes, "description" should definitely display (and its parallel script 880, where applicable). Thanks for catching this!
@flapka My mistake, there's some aliasing going on: "Description" is mapped to "Notes".
@yulgit1 Oh good, and that should have been obvious to me too.
@flapka More questions: which subfields should be displayed for title, title_alt, edition, and description? For now I'm only using the first $a.
title (MARC 245), title_alt (246), edition (250), publisher (260ab / 264ab), publishDate (260c), description (5xx), contents (505)
Also, for the MARC publisher: what's displayed on the item page as publisher is actually a concatenation of publisher (260ab) and publishDate (260c). publishDate is used on its own in the results-page listings. For the 880 link I map 880abc to altrep_publisher, so that it displays alongside the concatenated publisher. I don't think there's a need for a separate 880 publish date, but let me know if I'm wrong and what it would be used for.
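A sketch of the publisher handling described above, using a hypothetical in-memory representation of a MARC field as (code, value) pairs. The single-space separator and the `publisher_display` key are assumptions; only `publisher`, `publishDate` and `altrep_publisher` come from the discussion.

```python
from typing import Optional

# Hypothetical representation: a MARC field as ordered (subfield code, value) pairs.
Field = list[tuple[str, str]]


def join_subfields(field: Field, codes: str) -> str:
    """Join the values of the listed single-character subfield codes, in field order."""
    return " ".join(value.strip() for code, value in field if code in codes)


def publisher_fields(f260: Field, f880: Optional[Field]) -> dict[str, str]:
    doc = {
        "publisher": join_subfields(f260, "ab"),   # 260 $a $b
        "publishDate": join_subfields(f260, "c"),  # 260 $c, used alone on results pages
    }
    # Item page: publisher and date shown as one string (separator is an assumption).
    doc["publisher_display"] = f"{doc['publisher']} {doc['publishDate']}".strip()
    if f880 is not None:
        # Parallel script from 880 $a $b $c; $6 is not in the list, so it is skipped.
        doc["altrep_publisher"] = join_subfields(f880, "abc")
    return doc


f260 = [("a", "London :"), ("b", "Printed for the author,"), ("c", "1790.")]
f880 = [("6", "260-02/$1"), ("a", "東京 :"), ("b", "出版社,"), ("c", "2020.")]
print(publisher_fields(f260, f880))
```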
@yulgit1 I think you're right on the question of publish date
The parallel scripts (from 880) should map from the same subfields as their parallel fields, i.e.:
The other goal is to add @code!='6' (so the $6 linkage subfield is never displayed), either universally or, if it has to be applied template by template, to the following: author, title, publisher, edition, description, contents, author_additional, title_alt, topic, topic_subjectActor, genre, object_name, geographic.
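What @code!='6' guards against, sketched in Python with the same kind of hypothetical (code, value) representation: a romanized field linked to an 880 carries `$6 880-01`, and without the filter that linkage text leaks into the display string. The sample author name is made up.

```python
# Hypothetical representation: a MARC field as ordered (subfield code, value) pairs.
Field = list[tuple[str, str]]

# $6 holds linkage such as "880-01"; it is for linking fields, not for display.
NON_DISPLAY = frozenset({"6"})


def display_text(field: Field, exclude: frozenset = NON_DISPLAY) -> str:
    """Join subfield values for display, skipping excluded codes (like @code!='6' in XSLT)."""
    return " ".join(value.strip() for code, value in field if code not in exclude)


# Romanized 100 field linked to its 880 parallel (sample data).
creator_100 = [("6", "880-01"), ("a", "Example, Author,"), ("d", "1900-1980.")]

print(display_text(creator_100, exclude=frozenset()))  # unfiltered: begins with "880-01"
print(display_text(creator_100))                       # filtered
```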
Resolved; indexing will run tonight:
https://github.com/ycba-cia/blacklight-collections2/commit/bf447b0434ae6385bd11e34bd15d6c5b4066e9b4
https://git.yale.edu/ermadmix/ycba_xslts/commit/2a005a019da983e9d83d4a75f37fe1e7f1ce9081
https://git.yale.edu/ermadmix/ycba_xslts/commit/2401f2bd145a013dc55f91c1c84c0ac3e984c95a
https://git.yale.edu/ermadmix/ycba_xslts/commit/74f4d7af796278aa4121ae73199faed3254d9197
https://git.yale.edu/ermadmix/ycba_xslts/commit/f3a3c3bfec2c990f88d410883faf4464b67e7357
https://git.yale.edu/ermadmix/ycba_xslts/commit/eff33daae73f0b10d909172e799ead13fa2fa732
https://git.yale.edu/ermadmix/ycba_xslts/commit/c71f6f36e605a8e7ab5c260c2c76876c2f5df115
https://git.yale.edu/ermadmix/ycba_xslts/commit/909bf5ecaa51daec16db31f576ab87d5f64ce489
https://git.yale.edu/ermadmix/ycba_xslts/commit/2ccc8296ab2223b1e65abaadd8fa88941f1c15c7
https://git.yale.edu/ermadmix/ycba_xslts/commit/6698c1f7ebc775a822237f34d8f8d9fcf3527334
https://git.yale.edu/ermadmix/ycba_xslts/commit/9b8f534ed3743cdea7f3f1ef5801b5f6e462eedc
Indexing ran last night but appears not to have used the new XSLT changes. While troubleshooting one record just now, the correct new XSLT was used, so I'm not sure what happened last night. Will check again tomorrow.
@yulgit1 Not sure if this is related, but it looks like Blacklight is incorrectly rendering the dash between dates in artists' names, as in this example: https://collections.britishart.yale.edu/catalog/tms:15227
James Bruce, 1730–1794
It is related: it's a side effect of ongoing work to enable UTF-8 characters in LIDO. David has reverted the changes that caused the unintended characters, and the revert should be picked up tomorrow. From an email from Rob:
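The garbled dash itself isn't preserved in this thread, so this is an assumption about the failure mode rather than a diagnosis: a common way an en dash gets mangled is when UTF-8 bytes are decoded as Windows-1252 somewhere in the pipeline. A minimal Python illustration using the James Bruce example:

```python
original = "James Bruce, 1730\u20131794"  # U+2013 EN DASH between the dates

# The en dash is three bytes in UTF-8: E2 80 93.
utf8_bytes = original.encode("utf-8")

# Misreading those bytes as Windows-1252 turns one dash into three characters.
mangled = utf8_bytes.decode("cp1252")
print(mangled)

# Reversing the misread recovers the text, but only because no bytes were lost.
repaired = mangled.encode("cp1252").decode("utf-8")
print(repaired == original)
```

The repair step is only there to make the round trip visible; the durable fix is for every stage to write and read UTF-8 consistently.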
The hiccup with the encoding has been reverted: https://collections.britishart.yale.edu/catalog/tms:15227
And the MARC non-Latin is now indexed and displayed: https://collections.britishart.yale.edu/catalog/orbis:14851429 https://collections.britishart.yale.edu/catalog/orbis:13087663 https://collections.britishart.yale.edu/catalog/orbis:7863050
Still outstanding is to render special characters from LIDO. David is working on it: https://collections.britishart.yale.edu/catalog/tms:17382
@yulgit1 The 3 MARC non-Latin examples look great.
I see only one small issue: the creator field in https://collections.britishart.yale.edu/catalog/orbis:7863050 begins with 880-01, suggesting that @code!='6' might be missing from the corresponding XSLT template?
Many thanks Eric!
Sorry about that. The creator display is routed indirectly through the auth_author_display_ss field, which I'd missed. It is resolved:
https://git.yale.edu/ermadmix/ycba_xslts/commit/0c1497fe4950e5f176d4029936223e54ba921d6a
Will show up in next indexing.
The issue is resolved for RB and the Reference Library. Rob L is checking on this for the art collection.
CogApp is looking into this and will hopefully be in touch in January.
@robl Not an urgent issue but I am circling back to it to see if you have an update. I can also reach out to Tristan Roddis myself if that's helpful (he's my MCN IIIF SIG co-chair now :) )
@edgartdata, thanks for checking. To clarify, just for the record: Cogapp as a firm would not have been working on this (COBOAT is an unsupported legacy product from long ago). I had reached out individually to Ben there, and he had kindly hoped he might be able to volunteer some personal time to this over his winter holidays. I haven't heard back, which leads me to believe he may not have had time to dig into it; I can check discreetly.
@yulgit1 Blacklight does not yet render non-Latin scripts well. @flapka, do you have records that need to display non-Latin scripts?
https://collections.britishart.yale.edu/catalog/tms:17382 shows ??? for some Marathi characters in the inscription below.
This is the actual inscription: Inscribed in Marathi in black ink, upper center: "sake 1713 vīrodhakṛtanām saṃvatsare māhe phālguna sudhi 6 gȧgārām cīṃtāman tāṃbaṭ navagīre 5 | mu | pune peṭ nārāyan"
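A simplified illustration of how "?" ends up in place of characters: encoding text into a single-byte charset with `errors="replace"` substitutes "?" for anything the charset lacks, while a UTF-8 round trip is lossless. This is a model, not the actual COBOAT behavior: the LIDO snippet above dropped the macrons on ā and ī but replaced ṛ, ṃ, ṭ and ȧ with "?", so the real conversion evidently differs in detail. The helper name `unencodable` is hypothetical.

```python
# First two words of the inscription, written with escapes to avoid normalization issues:
# "vīrodhakṛtanām saṃvatsare"
actual = "v\u012brodhak\u1e5btan\u0101m sa\u1e43vatsare"


def unencodable(text: str, charset: str) -> list[str]:
    """Unique characters in text (in order of first appearance) that charset cannot represent."""
    missing = []
    for ch in dict.fromkeys(text):
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# Lossy: each character missing from Windows-1252 becomes "?".
lossy = actual.encode("cp1252", errors="replace").decode("cp1252")
print(lossy)
print(unencodable(actual, "cp1252"))

# Lossless: UTF-8 can represent every character.
print(actual.encode("utf-8").decode("utf-8") == actual)
```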