sul-dlss / mods_display

MODS Display is a gem to centralize the display logic of MODS metadata.

Duplicate country codes #15

Closed dazza-codes closed 2 years ago

dazza-codes commented 9 years ago

https://github.com/sul-dlss/mods_display/blob/master/lib/mods_display/country_codes.rb#L11-L12

  def country_codes
    {
      "aa" => "Albania",
      # snipped
      "ai" => "Anguilla",
      "ai" => "Armenia (Republic)", # duplicate key: this entry replaces the one above
      # snipped
    }
  end
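A Ruby Hash literal keeps only the last value for a repeated key, so the duplicate is easy to miss once the hash is built. A minimal sketch of spotting such repeats, assuming the entries are available as code/name pairs before they are turned into a Hash (the `pairs` data and the `duplicate_codes` helper are hypothetical, not part of the gem):

```ruby
# Pairs as they appear in the source, before a Hash collapses them.
pairs = [
  ["aa", "Albania"],
  ["ai", "Anguilla"],
  ["ai", "Armenia (Republic)"]
]

# Hypothetical helper: map each repeated code to every name assigned to it.
def duplicate_codes(pairs)
  pairs.group_by(&:first)
       .select { |_code, entries| entries.size > 1 }
       .transform_values { |entries| entries.map(&:last) }
end

duplicate_codes(pairs) # {"ai" => ["Anguilla", "Armenia (Republic)"]}
pairs.to_h["ai"]       # "Armenia (Republic)", the later entry wins
```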
dazza-codes commented 9 years ago

Maybe this can be resolved using tzinfo?

countries = TZInfo::Country.all
country_codes = countries.map {|c| [c.code, c.name] }.to_h
dazza-codes commented 9 years ago

BTW, the data contains Australian states, which are not countries, e.g.

    "xoa" => "Northern Territory",
    "xra" => "South Australia",

A state (like CA) is not a country.

dazza-codes commented 9 years ago

For completeness, the proposed solution in PR #16 is:

countries = TZInfo::Country.all
countries.map {|c| [c.code, c.name] }.to_h
=> {"AD"=>"Andorra",
 "AE"=>"United Arab Emirates",
 "AF"=>"Afghanistan",
 "AG"=>"Antigua & Barbuda",
 "AI"=>"Anguilla",
 "AL"=>"Albania",
 "AM"=>"Armenia",
 "AO"=>"Angola",
 "AQ"=>"Antarctica",
 "AR"=>"Argentina",
 "AS"=>"Samoa (American)",
 "AT"=>"Austria",
 "AU"=>"Australia",
 "AW"=>"Aruba",
 "AX"=>"Åland Islands",
 "AZ"=>"Azerbaijan",
 "BA"=>"Bosnia & Herzegovina",
 "BB"=>"Barbados",
 "BD"=>"Bangladesh",
 "BE"=>"Belgium",
 "BF"=>"Burkina Faso",
 "BG"=>"Bulgaria",
 "BH"=>"Bahrain",
 "BI"=>"Burundi",
 "BJ"=>"Benin",
 "BL"=>"St Barthelemy",
 "BM"=>"Bermuda",
 "BN"=>"Brunei",
 "BO"=>"Bolivia",
 "BQ"=>"Caribbean Netherlands",
 "BR"=>"Brazil",
 "BS"=>"Bahamas",
 "BT"=>"Bhutan",
 "BV"=>"Bouvet Island",
 "BW"=>"Botswana",
 "BY"=>"Belarus",
 "BZ"=>"Belize",
 "CA"=>"Canada",
 "CC"=>"Cocos (Keeling) Islands",
 "CD"=>"Congo (Dem. Rep.)",
 "CF"=>"Central African Rep.",
 "CG"=>"Congo (Rep.)",
 "CH"=>"Switzerland",
 "CI"=>"Côte d'Ivoire",
 "CK"=>"Cook Islands",
 "CL"=>"Chile",
 "CM"=>"Cameroon",
 "CN"=>"China",
 "CO"=>"Colombia",
 "CR"=>"Costa Rica",
 "CU"=>"Cuba",
 "CV"=>"Cape Verde",
 "CW"=>"Curacao",
 "CX"=>"Christmas Island",
 "CY"=>"Cyprus",
 "CZ"=>"Czech Republic",
 "DE"=>"Germany",
 "DJ"=>"Djibouti",
 "DK"=>"Denmark",
 "DM"=>"Dominica",
 "DO"=>"Dominican Republic",
 "DZ"=>"Algeria",
 "EC"=>"Ecuador",
 "EE"=>"Estonia",
 "EG"=>"Egypt",
 "EH"=>"Western Sahara",
 "ER"=>"Eritrea",
 "ES"=>"Spain",
 "ET"=>"Ethiopia",
 "FI"=>"Finland",
 "FJ"=>"Fiji",
 "FK"=>"Falkland Islands",
 "FM"=>"Micronesia",
 "FO"=>"Faroe Islands",
 "FR"=>"France",
 "GA"=>"Gabon",
 "GB"=>"Britain (UK)",
 "GD"=>"Grenada",
 "GE"=>"Georgia",
 "GF"=>"French Guiana",
 "GG"=>"Guernsey",
 "GH"=>"Ghana",
 "GI"=>"Gibraltar",
 "GL"=>"Greenland",
 "GM"=>"Gambia",
 "GN"=>"Guinea",
 "GP"=>"Guadeloupe",
 "GQ"=>"Equatorial Guinea",
 "GR"=>"Greece",
 "GS"=>"South Georgia & the South Sandwich Islands",
 "GT"=>"Guatemala",
 "GU"=>"Guam",
 "GW"=>"Guinea-Bissau",
 "GY"=>"Guyana",
 "HK"=>"Hong Kong",
 "HM"=>"Heard Island & McDonald Islands",
 "HN"=>"Honduras",
 "HR"=>"Croatia",
 "HT"=>"Haiti",
 "HU"=>"Hungary",
 "ID"=>"Indonesia",
 "IE"=>"Ireland",
 "IL"=>"Israel",
 "IM"=>"Isle of Man",
 "IN"=>"India",
 "IO"=>"British Indian Ocean Territory",
 "IQ"=>"Iraq",
 "IR"=>"Iran",
 "IS"=>"Iceland",
 "IT"=>"Italy",
 "JE"=>"Jersey",
 "JM"=>"Jamaica",
 "JO"=>"Jordan",
 "JP"=>"Japan",
 "KE"=>"Kenya",
 "KG"=>"Kyrgyzstan",
 "KH"=>"Cambodia",
 "KI"=>"Kiribati",
 "KM"=>"Comoros",
 "KN"=>"St Kitts & Nevis",
 "KP"=>"Korea (North)",
 "KR"=>"Korea (South)",
 "KW"=>"Kuwait",
 "KY"=>"Cayman Islands",
 "KZ"=>"Kazakhstan",
 "LA"=>"Laos",
 "LB"=>"Lebanon",
 "LC"=>"St Lucia",
 "LI"=>"Liechtenstein",
 "LK"=>"Sri Lanka",
 "LR"=>"Liberia",
 "LS"=>"Lesotho",
 "LT"=>"Lithuania",
 "LU"=>"Luxembourg",
 "LV"=>"Latvia",
 "LY"=>"Libya",
 "MA"=>"Morocco",
 "MC"=>"Monaco",
 "MD"=>"Moldova",
 "ME"=>"Montenegro",
 "MF"=>"St Martin (French part)",
 "MG"=>"Madagascar",
 "MH"=>"Marshall Islands",
 "MK"=>"Macedonia",
 "ML"=>"Mali",
 "MM"=>"Myanmar (Burma)",
 "MN"=>"Mongolia",
 "MO"=>"Macau",
 "MP"=>"Northern Mariana Islands",
 "MQ"=>"Martinique",
 "MR"=>"Mauritania",
 "MS"=>"Montserrat",
 "MT"=>"Malta",
 "MU"=>"Mauritius",
 "MV"=>"Maldives",
 "MW"=>"Malawi",
 "MX"=>"Mexico",
 "MY"=>"Malaysia",
 "MZ"=>"Mozambique",
 "NA"=>"Namibia",
 "NC"=>"New Caledonia",
 "NE"=>"Niger",
 "NF"=>"Norfolk Island",
 "NG"=>"Nigeria",
 "NI"=>"Nicaragua",
 "NL"=>"Netherlands",
 "NO"=>"Norway",
 "NP"=>"Nepal",
 "NR"=>"Nauru",
 "NU"=>"Niue",
 "NZ"=>"New Zealand",
 "OM"=>"Oman",
 "PA"=>"Panama",
 "PE"=>"Peru",
 "PF"=>"French Polynesia",
 "PG"=>"Papua New Guinea",
 "PH"=>"Philippines",
 "PK"=>"Pakistan",
 "PL"=>"Poland",
 "PM"=>"St Pierre & Miquelon",
 "PN"=>"Pitcairn",
 "PR"=>"Puerto Rico",
 "PS"=>"Palestine",
 "PT"=>"Portugal",
 "PW"=>"Palau",
 "PY"=>"Paraguay",
 "QA"=>"Qatar",
 "RE"=>"Réunion",
 "RO"=>"Romania",
 "RS"=>"Serbia",
 "RU"=>"Russia",
 "RW"=>"Rwanda",
 "SA"=>"Saudi Arabia",
 "SB"=>"Solomon Islands",
 "SC"=>"Seychelles",
 "SD"=>"Sudan",
 "SE"=>"Sweden",
 "SG"=>"Singapore",
 "SH"=>"St Helena",
 "SI"=>"Slovenia",
 "SJ"=>"Svalbard & Jan Mayen",
 "SK"=>"Slovakia",
 "SL"=>"Sierra Leone",
 "SM"=>"San Marino",
 "SN"=>"Senegal",
 "SO"=>"Somalia",
 "SR"=>"Suriname",
 "SS"=>"South Sudan",
 "ST"=>"Sao Tome & Principe",
 "SV"=>"El Salvador",
 "SX"=>"St Maarten (Dutch part)",
 "SY"=>"Syria",
 "SZ"=>"Swaziland",
 "TC"=>"Turks & Caicos Is",
 "TD"=>"Chad",
 "TF"=>"French Southern & Antarctic Lands",
 "TG"=>"Togo",
 "TH"=>"Thailand",
 "TJ"=>"Tajikistan",
 "TK"=>"Tokelau",
 "TL"=>"East Timor",
 "TM"=>"Turkmenistan",
 "TN"=>"Tunisia",
 "TO"=>"Tonga",
 "TR"=>"Turkey",
 "TT"=>"Trinidad & Tobago",
 "TV"=>"Tuvalu",
 "TW"=>"Taiwan",
 "TZ"=>"Tanzania",
 "UA"=>"Ukraine",
 "UG"=>"Uganda",
 "UM"=>"US minor outlying islands",
 "US"=>"United States",
 "UY"=>"Uruguay",
 "UZ"=>"Uzbekistan",
 "VA"=>"Vatican City",
 "VC"=>"St Vincent",
 "VE"=>"Venezuela",
 "VG"=>"Virgin Islands (UK)",
 "VI"=>"Virgin Islands (US)",
 "VN"=>"Vietnam",
 "VU"=>"Vanuatu",
 "WF"=>"Wallis & Futuna",
 "WS"=>"Samoa (western)",
 "YE"=>"Yemen",
 "YT"=>"Mayotte",
 "ZA"=>"South Africa",
 "ZM"=>"Zambia",
 "ZW"=>"Zimbabwe"}
jkeck commented 9 years ago

We're using the LOC Country Code Listing as was described in the initial ticket for supporting the encoded placeTerms.

I'm not sure we can unilaterally swap this out for TZInfo. @LynnMcRae do you have anything to weigh in on this?

The duplicate ai code for Anguilla should definitely be removed though since it has been discontinued.

LynnMcRae commented 9 years ago

Here's the proper answer ... the MODS designates the authority to use in the authority attribute. From http://www.loc.gov/standards/mods/userguide/origininfo.html under :

This attribute may be used with the following values:

marccountry – This source code is used for the MARC country codes. See the MARC Code List for Countries (http://www.loc.gov/marc/countries/) for a listing of the codes and place names.

iso3166 – This source code is used with country codes from ISO 3166. See the ISO 3166 Code Lists (http://www.iso.org/iso/country_codes/country_codes) for a listing of the codes.

You'll be tickled, I'm sure, to know that MARC itself mixes authorities, using marccountry for header 008/15-17 and iso3166 for 044 $c.

I would have bet good money that the lists conformed, but they swap the Anguilla/Armenia assignments: MARC has Anguilla/AM and Armenia/AI; ISO 3166-1 has Anguilla/AI and Armenia/AM. Both sources had changes to these codes between 1988 and 1992 ... hard to see it as anything but a grand snafu, given there's no "M" in Anguilla.

As to the duplicates that started this issue... code ai is marked as discontinued for Anguilla, so it should not be in the list shown, along with any other discontinued codes as designated at http://www.loc.gov/marc/countries/countries_code.html
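To make the authority-based decoding concrete, here is a rough sketch of choosing the lookup table from the declared authority. The two tables are tiny excerpts limited to the Anguilla/Armenia pair discussed above, and `decode_place_term` is a hypothetical helper, not the gem's API:

```ruby
# Hypothetical excerpts; real tables would carry the full LOC and ISO lists.
MARC_COUNTRIES = {
  "ai" => "Armenia (Republic)",
  "am" => "Anguilla"
}.freeze

ISO3166_COUNTRIES = {
  "AI" => "Anguilla",
  "AM" => "Armenia"
}.freeze

# Hypothetical helper: decode a placeTerm code using the vocabulary named in
# its authority attribute. Returns nil for unknown codes or authorities.
def decode_place_term(code, authority)
  case authority
  when "marccountry" then MARC_COUNTRIES[code.to_s.strip.downcase]
  when "iso3166"     then ISO3166_COUNTRIES[code.to_s.strip.upcase]
  end
end

decode_place_term("ai", "marccountry") # "Armenia (Republic)"
decode_place_term("ai", "iso3166")     # "Anguilla"
```

The same two letters decode to different places depending on the authority, which is why a single merged table cannot serve both vocabularies.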

LynnMcRae commented 9 years ago

I'd like to update the MODS Display Rules to acknowledge the use of both authorities, assuming that's the outcome of this ticket.

atz commented 9 years ago

Sure, reissue an already used code to another country... what could go wrong?

For most of the deprecated codes, we still need to interpret data that may have already been written with them, but for the duplicate in this obscure case, we should just comment it out.
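One way to sketch that approach: keep discontinued codes in a separate table that is consulted only when the current table has no entry, so a reissued code such as ai always resolves to its current meaning. The constant names and the `lookup_country` helper are hypothetical, and the single discontinued entry is an illustrative assumption that should be checked against the LOC list:

```ruby
# Hypothetical excerpt of current MARC country codes.
CURRENT_MARC_CODES = {
  "ai" => "Armenia (Republic)",
  "am" => "Anguilla"
}.freeze

# Illustrative discontinued entry (assumed; verify against the LOC list).
# The discontinued "ai" for Anguilla is deliberately left out because the
# code has been reissued.
DISCONTINUED_MARC_CODES = {
  "cs" => "Czechoslovakia"
}.freeze

# Hypothetical helper: prefer the current meaning, then fall back to the
# discontinued table so older records still decode. Returns nil if unknown.
def lookup_country(code)
  key = code.to_s.downcase
  CURRENT_MARC_CODES.fetch(key) { DISCONTINUED_MARC_CODES[key] }
end

lookup_country("ai") # "Armenia (Republic)"
lookup_country("cs") # "Czechoslovakia"
```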

dazza-codes commented 9 years ago

Lynn's comments help to explain a few things. Given that MARC is for machines and machines adopt tzinfo [1], it would make sense for MARC/MODS to use tzinfo. OK, hopefully that's it from me and I'll leave the details to the library experts.

I've reconsidered an earlier comment about Australian states, i.e.

Before Australia became a Federation, these states may have been countries and I suppose there are MARC catalog data from those days when they were countries. So, I can understand why MARC may have to maintain codes for 'all time', although that suggests the system needs a temporal trait in the code.

[1] http://www.w3.org/TR/timezone/#tzids

LynnMcRae commented 9 years ago

Keep in mind that both MARC and MODS are basically agnostic about code sets. They are just schemas that allow capturing metadata in a structured form. Cataloging standards like AACR2 and RDA determine the actual content. Even the use of marccountry, pretty much baked into the 008/header field of MARC, is mainly a standardization of US practice on a pre-ISO-3166 vocabulary. That's why it falls within the purview of the MODS Display rules to do the right decoding based on the declared vocabulary. I don't know about tzinfo per se except what I got from the web, but if it's a convenient and accurate tool to decode the ISO-3166 2-character codes, then I imagine it's a fine choice for that.