opensupplyhub / open-apparel-registry

An application for searching, matching, uploading factories.
MIT License

OS Hub: Parsing Issue for countries that should be recognized #2053

Closed mariel-oar closed 2 years ago

mariel-oar commented 2 years ago

Overview

Items uploaded in the lists below resulted in parsing errors, and the error messages pointed to the country value as the problem; however, all of the countries flagged appear to be valid country values.

Expected Behavior

United Kingdom, Dominican Republic, Russian Federation and Hong Kong SAR should all have a country code match.

Actual Behavior

The lists below each returned a parsing error related to the country value. In the case of Hong Kong SAR and United Kingdom, other instances of these country values were uploaded in the same list without causing errors; in the case of the Dominican Republic (DR), that is the only DR facility on the list.

Lists with parsing error due to country:
- United Kingdom - https://076e8b512ec74bb6832725db.openapparel.org/lists/1727
- Hong Kong SAR - https://076e8b512ec74bb6832725db.openapparel.org/lists/1725
- Dominican Republic - https://076e8b512ec74bb6832725db.openapparel.org/lists/1723
- Russian Federation - https://076e8b512ec74bb6832725db.openapparel.org/lists/1724

All of these terms return a match in the ISO online country lookup: https://www.iso.org/obp/ui/#search

mariel-oar commented 2 years ago

@obrienad @jwalgran, we are worried that this could turn into a bigger issue at launch. Can you spend a point investigating this one next sprint?

Klaus has a more flexible country list (ISO2 and ISO3, plus allowing for some grammar/punctuation variation). That might not be what is causing this issue, but we think it would be good to allow for more variation in uploaded country values. @KlausGPaul, could you attach your list here?

cc: @vrwOAR

KlausGPaul commented 2 years ago

While this may not fully address the issue described, the attached country.py would be a code change for a much broader and more forgiving country name mapping. Its sources include

It also contains ISO 3166-1 alpha-3 (three-character) country codes.

  "cze": "CZ",
  "czech": "CZ",
  "czech republic": "CZ",
  "czechia": "CZ",
  "czechia (czech republic)": "CZ",
  "czechia, czech republic": "CZ",

countries.amended.py.md
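To illustrate how a lowercase-keyed mapping like the excerpt above might be consulted, here is a minimal sketch. The dictionary holds only the entries quoted in this comment, and `COUNTRY_LOOKUP` and `lookup_country_code` are hypothetical names chosen for this example, not identifiers confirmed to exist in the repository.

```python
# Excerpt only: these are the entries quoted in the comment above.
# COUNTRY_LOOKUP is a hypothetical name for illustration.
COUNTRY_LOOKUP = {
    "cze": "CZ",
    "czech": "CZ",
    "czech republic": "CZ",
    "czechia": "CZ",
    "czechia (czech republic)": "CZ",
    "czechia, czech republic": "CZ",
}

# Two-letter codes that appear as values in the mapping.
KNOWN_CODES = set(COUNTRY_LOOKUP.values())


def lookup_country_code(value):
    """Hypothetical helper: return a two-letter code, or None if unknown."""
    key = value.strip().lower()
    # Accept a two-letter code that is already in the mapping's values.
    if key.upper() in KNOWN_CODES:
        return key.upper()
    # Otherwise fall back to the name / three-letter-code mapping.
    return COUNTRY_LOOKUP.get(key)


print(lookup_country_code("CZE"))
print(lookup_country_code("Czechia (Czech Republic)"))
```

Because every key is lowercase, lowercasing the uploaded value before the lookup lets names, spelling variants, and three-letter codes all share one dictionary.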

TaiWilkin commented 2 years ago

Except for "Hong Kong SAR", each of these items had a newline character in the middle of the country name. Example:

"Could not find a country code for \"United\nKingdom\"."

Hong Kong SAR is the one case where it's not in our list.
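One way to guard against the embedded-newline case is to collapse all whitespace in the uploaded value before the lookup. The sketch below is an assumption about how that could look, not the fix that was actually merged; `normalize_country_value`, `find_country_code`, and the four-entry `SAMPLE_LOOKUP` are hypothetical and exist only for this example. The error text mirrors the message quoted above.

```python
# Illustrative entries only; the real list lives in countries.py.
SAMPLE_LOOKUP = {
    "united kingdom": "GB",
    "dominican republic": "DO",
    "russian federation": "RU",
    "hong kong sar": "HK",
}


def normalize_country_value(raw):
    """Collapse newlines, tabs, and repeated spaces into single spaces."""
    # str.split() with no arguments splits on any run of whitespace.
    return " ".join(raw.split())


def find_country_code(raw, lookup):
    """Hypothetical helper: normalize the value, then look it up."""
    cleaned = normalize_country_value(raw)
    code = lookup.get(cleaned.lower())
    if code is None:
        raise ValueError(
            'Could not find a country code for "{}".'.format(cleaned)
        )
    return code


print(find_country_code("United\nKingdom", SAMPLE_LOOKUP))
```

Normalizing before the lookup means a cell containing a line break is treated the same as the single-line value.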

Recommended actions:

mariel-oar commented 2 years ago

thanks for the info @TaiWilkin, cc: @KlausGPaul

mstone121 commented 2 years ago

@KlausGPaul FYI: I added 'hong kong sar' to countries.py along with your amended list in https://github.com/open-apparel-registry/open-apparel-registry/pull/2195.