hornc opened this issue 4 years ago (status: Open)
Oh! I just had to do a bunch of language code nonsense for bookreader :P https://github.com/internetarchive/bookreader/pull/150
I think it needs to be in ISO 639-2/B (see https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes). Proof: https://openlibrary.org/languages/fre (the ISO 639-3 code would be fra). Although, looking at the list of MARC languages you posted, it looks like they follow their own standard 😅
MARC languages are definitely different. They are defined here: https://www.loc.gov/marc/languages/
Regarding the relationship between the two, they say:
RELATIONSHIP TO ISO 639-2
ISO 639-2 (Codes for the representation of names of languages-- Part 2: alpha-3 code) was based on the MARC Code List for Languages and published in 1998. In the 22 cases where the ISO 639-2 list has two alternative codes, the bibliographic code is the same as the MARC code. Language names in ISO 639-2 are not necessarily the same as those in MARC, particularly because of the practice of correlating the MARC language names with those used in Library of Congress Subject Headings. The MARC list includes references for unused forms of language names, while the ISO list has in some cases included alternative name forms, but many are lacking, since this practice of supplying alternate forms has only recently been implemented. In addition the MARC documentation includes a list of individual languages under collective codes or language groups, while the ISO list only includes the group codes themselves. The Library of Congress is maintenance agency for both lists, and the two are kept compatible in terms of code additions and deletions.
The edition edit form already knows how to autocomplete languages and convert them to their associated codes. Have you looked at whatever API powers that? It seems like it should be possible to reuse it.
Note also that the codes have changed over time, so we probably also need to handle historical codes that were in use when the catalog record was created.
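A minimal sketch of how historical codes could be normalized on import. The mapping is a small hand-picked sample of discontinued MARC codes and the codes that replaced them, as I read the MARC list (e.g. scc → srp for Serbian, scr → hrv for Croatian); it is not the full list and should be checked against https://www.loc.gov/marc/languages/ before use. `OBSOLETE_MARC_CODES` and `normalize_obsolete_code` are hypothetical names.

```python
# Hypothetical sample of discontinued MARC language codes mapped to their
# current replacements. Verify against the MARC Code List for Languages.
OBSOLETE_MARC_CODES = {
    "scc": "srp",  # Serbian
    "scr": "hrv",  # Croatian
    "tag": "tgl",  # Tagalog
    "esp": "epo",  # Esperanto
    "far": "fao",  # Faroese
    "iri": "gle",  # Irish
}


def normalize_obsolete_code(code: str) -> str:
    """Return the current code for `code`, replacing known obsolete codes.

    Codes not in the table are returned unchanged (after trimming/lowercasing).
    """
    code = code.strip().lower()
    return OBSOLETE_MARC_CODES.get(code, code)


if __name__ == "__main__":
    for old in ["scc", "Tag", "fre"]:
        print(old, "->", normalize_obsolete_code(old))
```

Leaving unknown codes untouched means current codes pass straight through, so the table only needs to list the retired ones.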
I keep being reminded of this. It's probably linked to from the other URLs above too, but here is the list of codes: http://www.loc.gov/marc/languages/language_code.html
The code list at https://www.loc.gov/standards/iso639-2/php/code_list.php has both the /B and /T forms, so it is helpful for crosswalking the MARC or ISO 639-2/B form (fre) to the ISO 639-2/T form (fra), which most other code systems use.
Apparently the /B code is derived from the name of the language in English, while the /T code is derived from the name of the language in the language itself (i.e. French vs. français).
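A sketch of that crosswalk as a lookup table. In the current ISO 639-2 list only 20 languages have distinct /B and /T codes (the LoC text quoted above mentions 22, which was the count at the time); every other language uses the same code in both forms, so anything not in the table passes through unchanged. The function names are hypothetical.

```python
# ISO 639-2/B codes (the same as MARC) mapped to their ISO 639-2/T forms.
# Only these 20 languages differ; all other codes are identical in B and T.
B_TO_T = {
    "alb": "sqi", "arm": "hye", "baq": "eus", "bur": "mya", "chi": "zho",
    "cze": "ces", "dut": "nld", "fre": "fra", "geo": "kat", "ger": "deu",
    "gre": "ell", "ice": "isl", "mac": "mkd", "mao": "mri", "may": "msa",
    "per": "fas", "rum": "ron", "slo": "slk", "tib": "bod", "wel": "cym",
}
# Reverse direction, for turning T codes from other systems back into MARC/B.
T_TO_B = {t: b for b, t in B_TO_T.items()}


def to_terminology(code: str) -> str:
    """Convert an ISO 639-2/B (or MARC) code to its ISO 639-2/T form."""
    return B_TO_T.get(code, code)


def to_bibliographic(code: str) -> str:
    """Convert an ISO 639-2/T code back to the /B (MARC-compatible) form."""
    return T_TO_B.get(code, code)
```

For example, `to_terminology("fre")` gives `"fra"`, `to_bibliographic("deu")` gives `"ger"`, and `"eng"` is the same in both directions.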
Is your feature request related to a problem? Please describe.
In https://github.com/internetarchive/openlibrary/blob/master/openlibrary/catalog/add_book/load_book.py, build_query(rec) expects languages to be 3-letter codes. (I originally wrote ISO 639-3 language codes; correction: these are MARC21 language codes, https://www.loc.gov/marc/languages/language_code.html, which are similar to, but do differ from, the ISO standard.)
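A rough sketch of the expectation described here, not the actual load_book.py code: it assumes the import record looks like `{"languages": ["fre"]}` and that each code becomes a reference to a language page like https://openlibrary.org/languages/fre. The `language_keys` helper and the `{"key": "/languages/<code>"}` output shape are assumptions for illustration. Anything that is not three lowercase letters (such as a name like "French") is rejected, so a caller could fall back to a name lookup.

```python
from __future__ import annotations

import re

# MARC language codes are three lowercase ASCII letters, e.g. "fre", "eng".
MARC_CODE_RE = re.compile(r"[a-z]{3}")


def language_keys(rec: dict) -> list[dict]:
    """Turn rec["languages"] (assumed MARC codes) into language references.

    Hypothetical helper. Raises ValueError for values that do not look like
    a MARC code, such as a language name.
    """
    keys = []
    for lang in rec.get("languages", []):
        # fullmatch so that "fre\n" or "fren" are not accepted
        if not MARC_CODE_RE.fullmatch(lang):
            raise ValueError(f"not a MARC language code: {lang!r}")
        keys.append({"key": f"/languages/{lang}"})
    return keys
```

Note that this only checks the shape of the code; whether "xyz" is a real language still has to be checked against the stored language records.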
There should be a facility to look up the code from the language name by querying https://openlibrary.org/languages.
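A sketch of that lookup, assuming the language records behind https://openlibrary.org/languages each carry a `name` and a `code` (as the /languages/fre page suggests). Fetching the records is left out so the example runs offline; `SAMPLE_LANGUAGES`, `build_name_index` and `code_for_name` are hypothetical. Names are matched case-insensitively with whitespace collapsed.

```python
from __future__ import annotations

# Hypothetical sample of stored language records; in practice these would be
# loaded from the site's existing language objects.
SAMPLE_LANGUAGES = [
    {"key": "/languages/eng", "code": "eng", "name": "English"},
    {"key": "/languages/fre", "code": "fre", "name": "French"},
    {"key": "/languages/ger", "code": "ger", "name": "German"},
]


def _normalize_name(name: str) -> str:
    """Collapse internal whitespace and casefold for case-insensitive matching."""
    return " ".join(name.split()).casefold()


def build_name_index(records: list[dict]) -> dict[str, str]:
    """Map each normalized language name to its MARC code."""
    return {_normalize_name(r["name"]): r["code"] for r in records}


def code_for_name(name: str, index: dict[str, str]) -> str | None:
    """Return the code for a language name, or None if it is unknown."""
    return index.get(_normalize_name(name))
```

Usage: `index = build_name_index(SAMPLE_LANGUAGES)` and then `code_for_name("  french ", index)` returns `"fre"`. Building the index once and reusing it avoids a request per imported record.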
Specifically, https://github.com/internetarchive/openlibrary/blob/f30611af14d5acc48e19cb216bbfafac37ec4ce4/openlibrary/core/vendors.py#L167-L177 gets the language as a name rather than a code.
By default, MARC records already use the 3-character codes: https://www.loc.gov/marc/bibliographic/bd041.html
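Besides field 041 linked above, the MARC 21 fixed-length field 008 also records a language code in character positions 35-37. A minimal sketch of reading it from a raw 008 string, without assuming any particular MARC library; `language_from_008` is a hypothetical name. Blank positions and the fill character `|||` ("no attempt to code") are treated as missing.

```python
from __future__ import annotations


def language_from_008(field_008: str) -> str | None:
    """Return the language code from 008/35-37, or None if absent or uncoded."""
    code = field_008[35:38]
    # Too-short field, all blanks, or fill characters: no usable code.
    if len(code) != 3 or not code.strip() or code == "|||":
        return None
    return code
```

The result is still a MARC code, so it can go through the same obsolete-code and /B-to-/T handling discussed above.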
It would be nice if the import system were flexible enough to support both forms (language names and codes) and to convert one to the other using the existing language records we store.
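A sketch of an import-side normalizer that accepts either form, under the assumption that the set of known codes and a name-to-code index can both be built from the stored language records. The tiny tables and `normalize_language` are hypothetical stand-ins. Codes are tried first because they are exact; names are the fallback.

```python
# Hypothetical stand-ins for data built from the stored language records.
KNOWN_CODES = {"eng", "fre", "ger", "spa"}
NAME_TO_CODE = {"english": "eng", "french": "fre", "german": "ger", "spanish": "spa"}


def normalize_language(value: str) -> str:
    """Accept either a MARC code ("fre") or a language name ("French").

    Returns the MARC code, or raises ValueError if neither form is recognized.
    """
    cleaned = " ".join(value.split())
    # Exact code match first (case-insensitive, e.g. "FRE" -> "fre").
    if cleaned.lower() in KNOWN_CODES:
        return cleaned.lower()
    # Otherwise treat the value as a language name.
    code = NAME_TO_CODE.get(cleaned.casefold())
    if code is None:
        raise ValueError(f"unknown language: {value!r}")
    return code
```

For example, `[normalize_language(v) for v in ["fre", "French", " english "]]` gives `["fre", "fre", "eng"]`. Raising on unknown values, rather than guessing, keeps bad data out of the catalog and makes the failure visible at import time.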