Open yolile opened 1 year ago
We can perhaps have two streams:
Input: Constrain user input, i.e. come up with an authoritative list of buyers, and ideally assign them identifiers so that if a name changes (which occurs frequently in Canada, for example), the identifier remains the same. (Canada actually has a Federal Identity Program, so that the civil service can use one consistent name internally, independent of whatever name the current governing party prefers.)
Mapping: If input can't be constrained, apply normalizations so that small differences in names don't result in new identifiers, e.g. lowercase, normalize spaces, remove articles, prepositions and conjunctions (I've observed lots of variations around these), substitute common abbreviations and typos (min. -> ministry, etc.). Recommend that they produce a full list of names, so that they can identify other things to normalize. It won't be perfect, but better than no normalization.
When I worked with Quebec data many years ago, one ministry had something like 34 variations.
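To make the Mapping stream concrete, here is a minimal sketch of those normalizations in Python. The word lists are hypothetical placeholders (a publisher would build them from its own full list of names), and stripping punctuation is an extra assumption so that "min." can match the abbreviation table.

```python
import re

# Hypothetical lists for illustration only; extend them from the
# publisher's full list of observed buyer names.
STOPWORDS = {"the", "a", "an", "of", "for", "and", "de", "la", "el", "y"}
ABBREVIATIONS = {"min": "ministry", "dept": "department", "govt": "government"}


def normalize_name(name: str) -> str:
    """Return a normalized key for a buyer name."""
    text = name.lower()
    # Assumption: punctuation carries no meaning, so replace it with spaces.
    text = re.sub(r"[^\w\s]", " ", text)
    # split() with no arguments also collapses repeated whitespace.
    tokens = text.split()
    # Expand common abbreviations before dropping stopwords.
    tokens = [ABBREVIATIONS.get(token, token) for token in tokens]
    tokens = [token for token in tokens if token not in STOPWORDS]
    return " ".join(tokens)


variants = ["Min. of  Health", "The Ministry of Health", "MINISTRY OF HEALTH "]
keys = {normalize_name(variant) for variant in variants}
```

In this sketch all three variants collapse to the same key, so they would share one identifier. It won't catch every typo, but it narrows the list that needs manual review.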
Agree, but note that the first one will only work for buyers, not for suppliers. For example, in Guatemala, there are some cases where they don't have the identifier for foreign companies, and only their names are recorded.
Ah, yeah, since there are different options for each case, I suppose the guidance could be organized by type of organization:
And then within each:
For suppliers, normalization is trickier, because small differences in names can actually be two different companies. In some cases, the best available option might just be to assign a local ID, and not attempt to make it a global ID.
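As a rough illustration of the local ID option for suppliers, the sketch below derives an ID from the name exactly as recorded, without any aggressive normalization, so that two similar-looking names stay distinct. The `XX-LOCAL` prefix and the hash-based scheme are assumptions made for this example, not part of any standard.

```python
import hashlib

# Hypothetical prefix marking the ID as local to this publisher.
LOCAL_PREFIX = "XX-LOCAL"


def local_supplier_id(recorded_name: str) -> str:
    """Derive a stable local ID from a supplier name as recorded."""
    # Only trim surrounding whitespace; any other difference in the name
    # is treated as a potentially different company.
    key = recorded_name.strip()
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]
    return f"{LOCAL_PREFIX}-{digest}"


same = local_supplier_id("Acme Ltd") == local_supplier_id(" Acme Ltd ")
different = local_supplier_id("Acme Ltd") != local_supplier_id("Acme Ltda")
```

The same recorded name always gets the same ID, and the prefix signals that the ID is only meaningful within this dataset.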
Yep, that is what I thought, too. I marked this issue as Documentation, but I'm not sure if this should be a Worked example under the "Deal with the hard cases" section instead.
This question has been raised by at least 3 partners this year (India, Canada and Guatemala), so we need to add some guidance on the best alternatives to fill this field (and what to do with `Organization/identifier`).