ycba-cia / blacklight-collections2


Non-Latin scripts #268

Open edgartdata opened 4 years ago

edgartdata commented 4 years ago

@yulgit1 Blacklight does not currently render non-Latin scripts well. @flapka do you have records that need to display non-Latin scripts?

https://collections.britishart.yale.edu/catalog/tms:17382 shows ??? for some Marathi characters in the inscription below.

This is the actual inscription: Inscribed in Marathi in black ink, upper center: "sake 1713 vīrodhakṛtanām saṃvatsare māhe phālguna sudhi 6 gȧgārām cīṃtāman tāṃbaṭ navagīre 5 | mu | pune peṭ nārāyan"

non Latin scripts not rendered in BL

flapka commented 4 years ago

@edgartdata Great question. RB has a handful of records with data in non-Latin scripts. @KraigBinkowski Ref has more.

Here's an example record in Blacklight: https://collections.britishart.yale.edu/catalog/orbis:14851429 The same record in Orbis: https://orbis.library.yale.edu/vwebv/holdingsInfo?bibId=14851429

A couple of observations:

flapka commented 4 years ago

On a positive note: when we romanize non-Latin script, the end result usually includes special characters or diacritics, and these appear to render well in Blacklight, as with this example: https://collections.britishart.yale.edu/catalog/orbis:1508181

yulgit1 commented 4 years ago

Upstream of BL, the LIDO is not handling the non-Latin characters; see below. Good to see, though, that the bib materials do display with appropriate special characters (ex: the Orbis:1508181 above).

<lido:inscriptionsWrap>
<lido:inscriptions lido:type="Inscription">
<lido:inscriptionTranscription>
Inscribed in Marathi in black ink, upper center: "sake 1713 virodhak?tanam sa?vatsare mahe phalguna sudhi 6 g?garam ci?taman ta?ba? navagire 5 | mu | pune pe? narayan"
</lido:inscriptionTranscription>
</lido:inscriptions>
</lido:inscriptionsWrap>

edgartdata commented 4 years ago

@yulgit1 so something for David to work on in COBOAT?

yulgit1 commented 4 years ago

Yes, or possibly the database. But I'd check COBOAT first.

edgartdata commented 4 years ago

Right. The inscription shows fine in TMS, so COBOAT must be doing something odd with the script. Will do!
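For illustration only, here is a minimal Python sketch of how a lossy encoding step somewhere in a pipeline can turn diacritics into "?". The choice of Latin-1 is an assumption made for the example; the thread does not establish which stage or charset is responsible. The LIDO output above also keeps plain "i" and "a" where the source has "ī" and "ā", which hints at a best-fit conversion rather than the exact replacement shown here.

```python
import unicodedata

# Fragment of the transliterated inscription as it appears in TMS.
# NFC normalization makes sure each letter+diacritic is a single code point.
original = unicodedata.normalize("NFC", "vīrodhakṛtanām saṃvatsare")

# Assumed lossy step: encode to a legacy single-byte charset (Latin-1 here),
# replacing every character the charset cannot represent with "?".
lossy = original.encode("latin-1", errors="replace").decode("latin-1")
print(lossy)

# A UTF-8 round trip, by contrast, preserves every character.
round_trip = original.encode("utf-8").decode("utf-8")
print(round_trip == original)
```

Comparing the text at each hand-off (TMS, COBOAT output, OAI-PMH response, Solr document) in this way would show where the first "?" appears.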

edgartdata commented 4 years ago

Quick update: David says that "Coboat has always converted the data to UTF8 before building the xml documents and sending them to oaipmh."

edgartdata commented 4 years ago

@flapka to test https://libapp.library.yale.edu/OAI_BAC/src/OAIOrbisTool.jsp?verb=GetRecord&identifier=oai:orbis.library.yale.edu:14851429&metadataPrefix=marc21 to see if Japanese characters display in BL correctly.

flapka commented 4 years ago

@yulgit1 I suggest keeping the initial test simple, choosing one field -- arbitrarily, perhaps the XSLT template for "publisher".

To that existing template, could we add an additional xsl:for-each that looks for a MARC 880 field in which subfield $6 contains the string "260" or "264" -- then map 880 subfields $a and $b from any 880 that meets these criteria?
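The selection logic proposed above can be sketched outside XSLT. The following Python example is a rough illustration, not the project's actual template: the function name `parallel_publisher` and the sample record are hypothetical, and only the Japanese publisher string is taken from this thread. It checks that subfield $6 starts with "260" or "264" (slightly stricter than "contains", on the assumption that the linked tag comes first in $6) and then collects $a and $b.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical MARCXML fragment: one 880 linked to 264, one linked to 245.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="880" ind1="1" ind2="0">
    <subfield code="6">245-02/$1</subfield>
    <subfield code="a">(hypothetical title in original script)</subfield>
  </datafield>
  <datafield tag="880" ind1=" " ind2="1">
    <subfield code="6">264-03/$1</subfield>
    <subfield code="a">東京都千代田区 :</subfield>
    <subfield code="b">勉誠出版,</subfield>
  </datafield>
</record>"""


def parallel_publisher(record):
    """Return $a/$b text from every 880 whose $6 points at a 260 or 264."""
    results = []
    for field in record.findall("marc:datafield[@tag='880']", MARC_NS):
        linkage = field.findtext(
            "marc:subfield[@code='6']", default="", namespaces=MARC_NS
        )
        # $6 in an 880 begins with the tag of the field it parallels.
        if linkage[:3] not in ("260", "264"):
            continue
        parts = [
            sf.text.strip()
            for sf in field.findall("marc:subfield", MARC_NS)
            if sf.get("code") in ("a", "b") and sf.text
        ]
        results.append(" ".join(parts))
    return results


record = ET.fromstring(SAMPLE)
print(parallel_publisher(record))
```

The 880 linked to 245 is skipped, so only the publisher statement in the original script is returned.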

yulgit1 commented 4 years ago

@flapka In Oxygen, using the above example (bibid=14851429), the 'alt_rep' field comes out as:

東京都千代田区 :勉誠出版,

flapka commented 4 years ago

Thanks @yulgit1. That looks proper. For further testing, is the next step to apply a similar mapping in all of the fields in which we'd like to provide the original script, where applicable? If so, I think these are all the XSLT templates in need of modification:

NB: All the above fields are for transcribed data (inscriptions) or notes. My sense is that we may not want variant-script data in our faceted fields; happy to discuss in a future meeting.

yulgit1 commented 4 years ago

@flapka Before proceeding to each of these fields, just want to confirm that publisher looks right.

Screen Shot 2020-09-28 at 11 58 55 AM

flapka commented 4 years ago

Thanks @yulgit1. Yes, I think the parallel script in that image appears precisely as desired.

flapka commented 4 years ago

@yulgit1 Here are example records that will illustrate usage of non-Latin scripts in the fields named above:

https://collections.britishart.yale.edu/catalog/orbis:14851429

https://collections.britishart.yale.edu/catalog/orbis:13087663

https://collections.britishart.yale.edu/catalog/orbis:7863050

I find no examples with a contents note (505) in a non-Latin script; they may not exist in our catalog.

yulgit1 commented 4 years ago

@flapka - we are not currently displaying the description field. Should we (either the description field 5xx, its 880 link, or both)?

flapka commented 4 years ago

@yulgit1 Yes, "description" should definitely display (and its parallel script 880, where applicable). Thanks for catching this!

yulgit1 commented 4 years ago

@flapka - my mistake, there's some aliasing going on. "Description" is going to "Notes".

flapka commented 4 years ago

@yulgit1 Oh good, and that should have been obvious to me too.

yulgit1 commented 4 years ago

@flapka More questions: which subfields should be displayed for title, title_alt, edition, and description? As of now I'm just using the first 'a'.

title (MARC 245)
title_alt (246)
edition (250)
publisher (260ab / 264ab)
publishDate (260c)
description (5xx)
contents (505)

Also, for MARC publisher, what is displayed on the item page as publisher is actually a concatenation of publisher (260ab) and publishDate (260c). The publishDate is used separately in the results page listings. For the 880 link I have 880abc for the altrep_publisher (to be displayed with the publisher, i.e. the concatenation). I don't think there's a need for a separate 880 publish date, but let me know if I'm wrong and what it would be used for.

flapka commented 4 years ago

@yulgit1 I think you're right on the question of publish date

The parallel-scripts (from 880) ought to map from the same subfields of their parallel fields, i.e.:

The other goal is to add @code!='6' universally, or if we need to apply it template-by-template, to the following: author, title, publisher, edition, description, contents, author_additional, title_alt, topic, topic_subjectActor, genre, object_name, geographic,
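As a rough illustration of what the @code!='6' filter achieves, here is a small Python sketch. The function name `display_value` and the sample subfields are hypothetical; the point is only that the linkage subfield $6 (for example "880-01") is dropped before the remaining subfields are joined for display.

```python
def display_value(subfields):
    """Join subfield text for display, skipping the $6 linkage subfield.

    `subfields` is a list of (code, text) pairs in record order.
    """
    return " ".join(text for code, text in subfields if code != "6")


# Hypothetical author field whose first subfield is the 880 linkage.
author = [("6", "880-01"), ("a", "Example, Author,"), ("d", "1900-1999")]

unfiltered = " ".join(text for _, text in author)
print(unfiltered)             # linkage value leaks into the display string
print(display_value(author))  # linkage value removed
```

Without the filter, the display string starts with the linkage value, which is the symptom to look for when a template has been missed.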

yulgit1 commented 4 years ago

resolved, indexing to run tonight:

https://github.com/ycba-cia/blacklight-collections2/commit/bf447b0434ae6385bd11e34bd15d6c5b4066e9b4
https://git.yale.edu/ermadmix/ycba_xslts/commit/2a005a019da983e9d83d4a75f37fe1e7f1ce9081
https://git.yale.edu/ermadmix/ycba_xslts/commit/2401f2bd145a013dc55f91c1c84c0ac3e984c95a
https://git.yale.edu/ermadmix/ycba_xslts/commit/74f4d7af796278aa4121ae73199faed3254d9197
https://git.yale.edu/ermadmix/ycba_xslts/commit/f3a3c3bfec2c990f88d410883faf4464b67e7357
https://git.yale.edu/ermadmix/ycba_xslts/commit/eff33daae73f0b10d909172e799ead13fa2fa732
https://git.yale.edu/ermadmix/ycba_xslts/commit/c71f6f36e605a8e7ab5c260c2c76876c2f5df115
https://git.yale.edu/ermadmix/ycba_xslts/commit/909bf5ecaa51daec16db31f576ab87d5f64ce489
https://git.yale.edu/ermadmix/ycba_xslts/commit/2ccc8296ab2223b1e65abaadd8fa88941f1c15c7
https://git.yale.edu/ermadmix/ycba_xslts/commit/6698c1f7ebc775a822237f34d8f8d9fcf3527334
https://git.yale.edu/ermadmix/ycba_xslts/commit/9b8f534ed3743cdea7f3f1ef5801b5f6e462eedc

yulgit1 commented 4 years ago

Indexing ran last night, but appears not to have used the new XSLT changes. When I troubleshot one record just now, it used the correct new XSLT, so I'm not sure what happened last night. Will see tomorrow.

flapka commented 4 years ago

@yulgit1 Not sure if this is related, but: It looks like Blacklight is incorrectly rendering the dash between dates in artists' names, as in this example: https://collections.britishart.yale.edu/catalog/tms:15227

James Bruce, 1730–1794
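The thread does not record what was actually displayed in place of the dash. As one common possibility, assumed here for illustration, UTF-8 bytes that are decoded as Windows-1252 turn an en dash into a three-character sequence. A minimal Python sketch:

```python
# The artist name with an en dash (U+2013) between the dates.
name = "James Bruce, 1730\u20131794"

# Assumed failure mode: the UTF-8 bytes are read back as Windows-1252.
garbled = name.encode("utf-8").decode("cp1252")
print(garbled)

# Reversing the wrong decode recovers the original text, which confirms
# that the damage came from a charset mismatch rather than data loss.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == name)
```

If the garbled form on the site matches this pattern, the bytes are intact and only the declared or assumed charset is wrong.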

yulgit1 commented 4 years ago

It is related. There is ongoing work to enable UTF-8 characters in LIDO. David has reverted the changes that caused the unintended characters; the revert should be picked up tomorrow. From an email from Rob:

image

yulgit1 commented 4 years ago

The hiccup with the encoding has been reverted: https://collections.britishart.yale.edu/catalog/tms:15227

And the MARC non-Latin is now indexed and displayed: https://collections.britishart.yale.edu/catalog/orbis:14851429 https://collections.britishart.yale.edu/catalog/orbis:13087663 https://collections.britishart.yale.edu/catalog/orbis:7863050

Still outstanding is to render special characters from LIDO. David is working on it: https://collections.britishart.yale.edu/catalog/tms:17382

flapka commented 4 years ago

@yulgit1 The 3 MARC non-Latin examples look great.

I see only one small issue: The creator field in https://collections.britishart.yale.edu/catalog/orbis:7863050 begins with 880-01 -- suggesting that @code!='6' might be missing from the corresponding XSLT template?

Many thanks Eric!

yulgit1 commented 4 years ago

Sorry about that. There is indirection to the auth_author_display_ss field. It is resolved:

https://git.yale.edu/ermadmix/ycba_xslts/commit/0c1497fe4950e5f176d4029936223e54ba921d6a

Will show up in next indexing.

edgartdata commented 4 years ago

Issue is resolved for RB and Reference Library. Rob L is checking on this for the art collection.

edgartdata commented 3 years ago

Cogapp is looking into this and will be in contact in January, hopefully.

edgartdata commented 3 years ago

@robl Not an urgent issue but I am circling back to it to see if you have an update. I can also reach out to Tristan Roddis myself if that's helpful (he's my MCN IIIF SIG co-chair now :) )

robl commented 3 years ago

@edgartdata, thanks for checking. To clarify here just for the record, this would not have been Cogapp as a firm working on this (since COBOAT is an unsupported, long-ago legacy product). I had reached out individually to Ben there, and he had kindly been hoping he might be able to volunteer some personal time to this over his winter holidays. I have not heard back, which leads me to believe that he may not have had time to dig into this; I can check discreetly to see.