Open hornc opened 4 months ago
@hornc can you propose a priority for this based on your use cases? Is this happening at a large scale (e.g. how many records being affected)? Is this blocking one of our systems/processes? This would help us prioritize accordingly
@scottbarnes can you assign this issue to me?
I have assigned this to you, @AbhinavKRN. Please ask any questions if you get stuck anywhere.
Sure @scottbarnes on it.
So, I think this is a relatively low priority issue because I have a bot task that runs weekly to correct deprecated language codes to their current codes (if one exists).
To do this properly, we might want to think a bit about what is supposed to happen in the various cases.
What should happen in the following cases:
/languages/eth code?
/languages/esk code?
I was hoping someone would find and link the related "duplicate languages in dropdowns" issue, as it has similar requirements for extending the language code model, which I think is necessary to add this functionality.
Optional language fields we might need to add:
deprecated: /type/boolean
deprecated_note: /type/string (a human-readable description of why this code is deprecated, pointing to the preferred alternative if there is one, e.g. "use a more specific code" (not automatable) or "use a different code")
current: /type/language (the current language to use instead, if this code is deprecated and there is an automatic preferred replacement)
Note: some deprecated codes may not have a clear single value for current.
I'm not completely happy with the "current" terminology, but I can't think of a better term at the moment. Anyone have any ideas for better naming?
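To make the proposal concrete, here is a minimal sketch of how the three optional fields could behave. It is not the actual /type/language model: the `Language` class, the `resolve` helper, and the sample data are all hypothetical, and the eth-to-gez replacement is illustrative and should be checked against the MARC code list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Language:
    """Hypothetical stand-in for a /type/language record."""
    key: str                               # e.g. "/languages/eth"
    name: str
    deprecated: bool = False               # proposed /type/boolean field
    deprecated_note: Optional[str] = None  # proposed /type/string field
    current: Optional[str] = None          # proposed /type/language key


def resolve(key: str, languages: dict) -> Optional[str]:
    """Return the key to store, or None if a human has to pick one."""
    lang = languages[key]
    if not lang.deprecated:
        return key
    # None here means "deprecated, but no automatic replacement".
    return lang.current


# Illustrative data only; verify replacements before relying on them.
LANGUAGES = {
    lang.key: lang
    for lang in [
        Language("/languages/eng", "English"),
        Language("/languages/gez", "Ethiopic"),
        Language(
            "/languages/eth",
            "Ethiopic",
            deprecated=True,
            deprecated_note="Use a different code.",
            current="/languages/gez",
        ),
        Language(
            "/languages/esk",
            "Eskimo languages",
            deprecated=True,
            deprecated_note="Use a more specific code.",
        ),
    ]
}
```

With this shape, `resolve("/languages/eth", LANGUAGES)` returns the replacement key, while `resolve("/languages/esk", LANGUAGES)` returns `None`, which covers the "no clear single value for current" case.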
I think #8145 was perhaps the issue I remember, which touches on duplicate names. Is there a clearer one?
@cdrini having #8160 merged would bring us up to date with some of the previous language code issues that have already been raised, discussed, and addressed, so we can build on them here. Is there something blocking the merge of #8160?
@hornc, I had hoped we could discuss this during the Monday ABC call, but somehow it was missed during triage. I added this to the agenda for the coming week.
Howdy! Stumbled on this thanks to @RayBB; taking a look at #8160.
@hornc merged! Although I will note I'm not too sure why #8160 would help with deprecated languages 🤔 But leaving that up to you!
I just found this code that already translates deprecated language codes: https://github.com/internetarchive/openlibrary/blob/447142086b90648207a558a3b0ed495acb6f168d/openlibrary/catalog/marc/parse.py#L288-L317
I had been thinking this (and the related removal of deprecated language codes from the edition edit dropdown) required an update to the /type/language model. It looks like this could be fixed in code using the existing method.
It looks like MARC imports use the hardcoded deprecated language code tables in openlibrary/openlibrary/catalog/marc/parse.py, but imports from other sources do not.
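One way to close that gap would be to run every import source through the same normalization step. The sketch below is only an illustration of the idea: `DEPRECATED_TO_CURRENT` and `normalize_language_codes` are hypothetical names, not the tables or functions in parse.py, and the single mapping entry is an example to be verified.

```python
# Hypothetical mapping of deprecated code -> current code.
# The real tables live in parse.py and may be shaped differently.
DEPRECATED_TO_CURRENT = {
    "eth": "gez",  # illustrative entry; verify against the MARC code list
}


def normalize_language_codes(codes, mapping=DEPRECATED_TO_CURRENT):
    """Swap deprecated codes for current ones, dropping duplicates.

    Order of first appearance is preserved so the primary language
    stays first.
    """
    seen = set()
    result = []
    for code in codes:
        code = mapping.get(code, code)
        if code not in seen:
            seen.add(code)
            result.append(code)
    return result
```

Calling this from the shared import path, rather than only from the MARC parser, would mean a record arriving with both `eth` and `gez` ends up with a single `gez` entry.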
@AbhinavKRN, are you still interested in working on this issue? If not I will open it back for others who may wish to work on it.
Problem
https://openlibrary.org/books/OL51818714M/Yederasiw_Mastawesha is a recently imported item that picked up the deprecated Ethiopian language code (the metadata has since been updated). It looks like the language lookups that convert a language name to a code use a list of codes that includes deprecated duplicates, so the resulting code may be the deprecated one (probably arbitrarily, depending on which is listed first?).
How to fix: The Name -> code lookup list should only contain current item codes.
This relates to the 'duplicates in the language drop down list' issue that I thought I saw recently, but cannot find it now. The dropdown and import translation list should both only contain current language codes.
Perhaps the language code config should have a deprecated parameter, and these can be excluded as needed.
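A rough sketch of what excluding deprecated entries from the name -> code lookup could look like. The record shape (`code`, `name`, `deprecated`) and the `build_name_lookup` helper are assumptions for illustration, not the existing config format.

```python
def build_name_lookup(languages):
    """Map lowercased language name -> code, skipping deprecated codes."""
    lookup = {}
    for lang in languages:
        if lang.get("deprecated", False):
            continue  # deprecated codes never enter the lookup
        lookup[lang["name"].lower()] = lang["code"]
    return lookup


# Illustrative records: two codes share one name, one is deprecated.
SAMPLE_LANGUAGES = [
    {"code": "eth", "name": "Ethiopic", "deprecated": True},
    {"code": "gez", "name": "Ethiopic"},
    {"code": "eng", "name": "English"},
]

NAME_TO_CODE = build_name_lookup(SAMPLE_LANGUAGES)
```

Because deprecated entries are filtered out before the dict is built, the result no longer depends on which duplicate happens to be listed first. The same filtered list could feed the edit dropdown.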
Relates to #9002, in that the example shows that at least BWB-sourced imports are using language lookups.
The specific code to change is: https://github.com/internetarchive/openlibrary/pull/9488/files