pulibrary / figgy

Valkyrie-based digital repository backend.

Language Field in figgy not pulling/viewing correct languages from MARC record #6170

Open kelea99 opened 9 months ago

kelea99 commented 9 months ago

Summary

Reported by Minjie Chen. Figgy/DPUL is not showing the correct languages for an item record in the "Language" field. Per a Slack conversation, @tpendragon believes the Language field is pulled from the MARC codes in 008[35-37]:041a:041d. Minjie confirmed that pulling language data from 008/35-37, 041a, and 041d sounds straightforward, but that is not what Figgy displays, which is English only.
Records originally reported by Minjie: Figgy and DPUL identify them as being in English only.

Test we did: We chose an item that was originally displaying its languages correctly, in this case Swedish and Delaware/Munsee (Figgy showed Swedish/Delaware before our test).

[from #digital_library] I thought I might as well try something uploaded in Figgy that is recent (post-Alma migration, ingested 2022) and multilingual in Munsee/Delaware. I found an item in Figgy that reflects the same information as Orangelight: https://figgy.princeton.edu/catalog/bb13ea71-acaf-4dc4-904d-1e01b8c1cda1

Looking at the staff view of the languages (https://catalog.princeton.edu/catalog/9925251263506421/staff_view), I see two 041 fields and the 008 field:

041 1 a| del a| swe k| swe h| ger
041 7 a| swe a| umu 2| iso639-3
008 860212s1696 sw bi 000 0 del^^
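To make the expected result concrete, here is a minimal Python sketch of the extraction rule described above (008/35-37, then 041 $a and $d). It is not Figgy's or Bibdata's actual code: the function name `language_codes`, the tuple layout for fields, and the padded 008 string are all assumptions for illustration. The 008 value is rebuilt by padding because the staff view text above collapses the field's fixed-width spacing.

```python
from typing import List, Tuple

# Hypothetical model of a 041 field: (indicators as displayed, [(subfield code, value), ...]).
Field041 = Tuple[str, List[Tuple[str, str]]]


def language_codes(field_008: str, fields_041: List[Field041]) -> List[str]:
    """Collect codes from 008/35-37, then 041 $a and $d, keeping first occurrences only."""
    codes: List[str] = []
    # 008 is fixed-length; character positions 35-37 hold the language code.
    if len(field_008) >= 38:
        codes.append(field_008[35:38])
    for _indicators, subfields in fields_041:
        for code, value in subfields:
            if code in ("a", "d"):
                codes.append(value)
    unique: List[str] = []
    for candidate in codes:
        candidate = candidate.strip()
        if candidate and candidate not in unique:
            unique.append(candidate)
    return unique


# Illustrative 008: padded so that "del" sits at positions 35-37 (assumed layout).
FIELD_008 = "860212s1696".ljust(35) + "del" + "  "
# The two 041 fields from the staff view, indicators copied as displayed.
FIELDS_041: List[Field041] = [
    ("1", [("a", "del"), ("a", "swe"), ("k", "swe"), ("h", "ger")]),
    ("7", [("a", "swe"), ("a", "umu"), ("2", "iso639-3")]),
]

print(language_codes(FIELD_008, FIELDS_041))
```

Under these assumptions the rule yields only codes that are present in the record, so an "eng" result would have to come from somewhere other than these three sources.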

We refreshed the metadata in Figgy to see what it would do, presuming it would show only one language. Something interesting happened: the languages did change, but to English only, which appears nowhere in the record. Minjie confirmed that "eng" does not appear anywhere in the language codes of the Swedish-Munsee bilingual book.

Questions:

Impact

Please include hard deadlines, if the exhibit is part of an event, the issue is stopping work, etc.

I want to test this in staging, but it would be good to know whether this affects all MARC ingests whenever items are refreshed or newly ingested. I'll ask @tpendragon today to work in staging.

Priority recommendation

Sudden Priority Justification

Required if "asap" or "within the next 3 weeks" is checked. Add "Sudden Priority" and "Maintenance/Research" labels.

If this issue affects all newly ingested MARC records or all refreshed metadata in Figgy, it affects the faceting for all items in DPUL. Searches would be faulty, perhaps wrongly showing items as being in a Western language.

*Anu, the individual asking about the languages, would need the languages fixed by April 2023. I still think we should look into it sooner, within the next 3 weeks, if we could.

Expected behaviour

Actual behaviour

Steps to reproduce behaviour

Screenshots

escowles commented 9 months ago

I know we made a number of changes in language display functionality in Orangelight/Bibdata, which is probably part of this. See https://github.com/pulibrary/bibdata/issues/2069 which is referenced by a few PRs.

kelea99 commented 9 months ago

OK, so from what I see, Minjie's thought that Figgy is not able to interpret the language code when it is taken from ISO 639 is possible/probable?

escowles commented 9 months ago

Yes, that is my guess — maybe if there are only ISO 639-3 languages, then there's a problem parsing them?

minjiec commented 9 months ago

Yes, I too suspect that ISO 639-3 language codes are tripping Figgy up.

Here is the context: ISO 639-3 language codes were applied only recently to a small number of our bibs like these. ISO 639-3 is much larger than the "MARC Code List for Languages", so it can match world languages, including indigenous languages and dialects, at greater granularity.
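As a rough illustration of the hypothesis in this thread, the Python sketch below resolves 041 codes against a different table depending on the source named in $2, and surfaces unrecognized codes instead of substituting a default. Everything here is assumed for illustration: the function name `labels_for_041`, the two tiny lookup tables, and the "Unknown language code" message. It does not describe how Figgy or Bibdata actually resolve languages.

```python
from typing import Dict, List, Tuple

# Tiny illustrative subsets of the two code lists (not the full tables).
MARC_LABELS: Dict[str, str] = {"del": "Delaware", "swe": "Swedish", "ger": "German"}
ISO_639_3_LABELS: Dict[str, str] = {"swe": "Swedish", "umu": "Munsee"}


def labels_for_041(subfields: List[Tuple[str, str]]) -> List[str]:
    """Resolve $a/$d codes using the code list named in $2, if one is present."""
    source = next((value for code, value in subfields if code == "2"), None)
    table = ISO_639_3_LABELS if source == "iso639-3" else MARC_LABELS
    labels: List[str] = []
    for code, value in subfields:
        if code not in ("a", "d"):
            continue
        # Report unrecognized codes rather than silently falling back to another language.
        labels.append(table.get(value, f"Unknown language code: {value}"))
    return labels


# The 041 field that carries $2 iso639-3 in the record above.
print(labels_for_041([("a", "swe"), ("a", "umu"), ("2", "iso639-3")]))
# The same codes without $2 are looked up in the MARC list, where "umu" is absent.
print(labels_for_041([("a", "swe"), ("a", "umu")]))
```

Flagging unknown codes explicitly would make a mismatch between the two code lists visible in the Language field, rather than hiding it behind an unrelated value.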

kelea99 commented 9 months ago

Confirmation from Anu regarding priority: It’s not time-sensitive – by April [2023] or so, would be ideal.