rohit-dua / BUB

BUB : Book Uploader Bot
http://tools.wmflabs.org/bub/

Convert ISO language codes to Library of Congress codes #24

Open nemobis opened 10 years ago

nemobis commented 10 years ago

https://archive.org/post/1021681/bookop-threw-exception-google-language-code-rm-not-found-in-lookup-table-check-for-update :

generally we prefer the library of congress code. you can see them at http://www.loc.gov/marc/languages/language_code.html

For instance, not rm (ISO 639-1) or roh (ISO 639-3), but roa. Sounds quite silly to degrade standard codes to something non-standard, but whatever. :-) If the full English names are the same for ISO/Unicode and LOC, maybe it's easier to use those.

nemobis commented 9 years ago

http://www.loc.gov/marc/ , http://www.loc.gov/standards/iso639-2/ and friends, as well as unicode.org and python.org, are totally silent on tools for mapping MARC 21 language codes to ISO 639-3 or even ISO 639-2.

http://www.loc.gov/marc/languages/introduction.pdf said:

ISO 639-2 (Codes for the representation of names of languages--Part 2: alpha-3 code) was based on the MARC Code List for Languages and published in 1998. In the 22 cases where the ISO 639-2 list has two alternative codes, the bibliographic code is the same as the MARC code. Language names in ISO 639-2 are not necessarily the same as those in MARC, particularly because of the practice of correlating the MARC language names with those used in Library of Congress Subject Headings.

grr
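
Going by the quoted passage, one partial workaround might be to normalize ISO 639-2 terminology codes to their bibliographic counterparts, since those are said to match the MARC codes. Below is a minimal sketch of that idea, not anything BUB actually contains: the names `ISO639_2_T_TO_B` and `to_bibliographic_code` are hypothetical, the table is only a small subset of the pairs, and returning any other code unchanged assumes it is already acceptable as a MARC code, which is not guaranteed.

```python
# Hypothetical helper, not part of BUB.
# Partial table of ISO 639-2 terminology (T) codes and their bibliographic (B)
# counterparts. Illustrative subset only; the full list has more pairs.
ISO639_2_T_TO_B = {
    "ces": "cze",  # Czech
    "deu": "ger",  # German
    "ell": "gre",  # Modern Greek
    "fra": "fre",  # French
    "nld": "dut",  # Dutch
    "zho": "chi",  # Chinese
}


def to_bibliographic_code(code):
    """Return the bibliographic form of a three-letter ISO 639-2 code.

    Codes missing from the table are returned unchanged (lowercased), on the
    assumption that they have no separate bibliographic variant.
    """
    normalized = code.strip().lower()
    if len(normalized) != 3 or not normalized.isalpha():
        raise ValueError("expected a three-letter code, got %r" % (code,))
    return ISO639_2_T_TO_B.get(normalized, normalized)


if __name__ == "__main__":
    for sample in ("fra", "DEU", "eng"):
        print(sample, "->", to_bibliographic_code(sample))
```

This does not cover two-letter ISO 639-1 input such as rm, which would still need a separate lookup table.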

nemobis commented 9 years ago

I filed http://unicode.org/cldr/trac/ticket/8106