Closed xrotwang closed 1 year ago
This is already intended: https://poseidon-framework.github.io/#/janno_details?id=spatial-position
The mismatch between these short definitions and the long explanations is an issue - maybe we should get rid of the short definitions here and reorganize the long ones instead :thinking:
Thanks for going through this, @xrotwang!
Ah, ok. I'd still recommend alpha-2 codes rather than "short name", because short names contain somewhat unexpected things like "Bahamas (the)", so they aren't easy to produce by hand either.
Btw. let me know if this is not a good time to look through this.
Ah - didn't know that. That's a good point! So maybe we should switch to that. So far we're not validating this, but we really should at some point.
It's a perfect time to look through this. But our team is small and it will take some time to actually pick it up and implement your suggestions.
Yes, I like the idea of switching to alpha-2 encoding. I definitely support switching to this in our central repository, but I'm a bit reluctant to enforce it at the package format level, because it would make it harder for people to set up a quick private package. The moment they put "Germany" into the Janno file, the validator would immediately bug them... perhaps we can find a way to downgrade that to a recommendation, issuing a warning, rather than a full-blown parsing error...
Country metadata may not be crucial, so convenience may be better than accuracy here. Also there's only about 200 of these - so if it gets messy, it will be a small mess :)
Letting people refer to languages by name for decades - OTOH - led to quite a mess ... So yes, it's a trade-off.
Yes, I hear you. Definitely something we need to make a decision on.
As this issue was raised again recently, I think we should move forward and switch to ISO-alpha2 or -alpha3 codes as defined here.
As @stschiff suggested, we should not validate this too strictly in trident. Not least because countries change every now and then and new valid entities can arise overnight. A warning should be enough.
Hmm, should we perhaps rather introduce a new field that is then validated strictly against ISO?
Ok - that's a good idea. Strictly is imho not possible though for aforementioned reason. Or did you just mean: Print a warning?
Yes, OK. Print a warning. What could be the field's name? How about Country_ISO? And we could allow both alpha2 and alpha3?
What's the advantage of allowing both? It makes summary statistics more difficult, because you first have to unify the 2- and 3-letter codes that refer to the same country.
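To illustrate the extra unification step: a minimal sketch, assuming a hand-written lookup table. The names `ALPHA3_TO_ALPHA2`, `to_alpha2` and `count_by_country` are made up for this example and the table holds only a few pairs; none of this is part of trident or the Janno schema.

```python
from collections import Counter

# Illustrative subset of ISO 3166-1 alpha-3 -> alpha-2 pairs (assumption:
# a real table would cover every country).
ALPHA3_TO_ALPHA2 = {"DEU": "DE", "FRA": "FR", "BHS": "BS"}


def to_alpha2(code):
    """Normalise a 2- or 3-letter code to alpha-2 where the table allows it."""
    code = code.strip().upper()
    if len(code) == 3:
        # Unknown 3-letter codes are passed through unchanged.
        return ALPHA3_TO_ALPHA2.get(code, code)
    return code


def count_by_country(codes):
    """Count samples per country after unifying both code styles."""
    return Counter(to_alpha2(c) for c in codes)
```

With a single code style in the field, `count_by_country` would reduce to a plain `Counter(codes)` and the lookup table would not be needed.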
At least they are easy to detect and distinguish :)
If choosing one standard over the other, I'd recommend alpha2. From my experience, alpha3 codes are more prone to being confused with (ISO 639-3) language codes.
No, you're right, there is no good reason to allow both. Let's go with alpha2 then.
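For reference, the "warn, don't fail" behaviour discussed above could look roughly like this. It is a sketch only: `KNOWN_ALPHA2` is a tiny illustrative subset and `check_country_iso` is a hypothetical helper, not how trident actually validates the field.

```python
import warnings

# Illustrative subset only (assumption: a real check would load the full
# ISO 3166-1 alpha-2 list from a maintained source).
KNOWN_ALPHA2 = {"DE", "FR", "BS", "GB", "US"}


def check_country_iso(value):
    """Return True for a known alpha-2 code; otherwise warn and return False.

    The function never raises, so a package with "Germany" in the field
    still loads and the user only sees a warning.
    """
    code = value.strip()
    if code in KNOWN_ALPHA2:
        return True
    warnings.warn(
        f"Country_ISO value {value!r} is not a known ISO 3166-1 alpha-2 code",
        UserWarning,
    )
    return False
```

Because the result is a warning rather than an exception, codes for newly created entities that are missing from the list would not block anyone from building a package.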
Closed now with new introductions in #57 (schema release v2.7.0)
The definition for Country still seems somewhat unspecific (there are many ways to write down a "Country"). ISO 3166-1 is probably most widely used - and can easily be translated to human-readable names using packages like pycountry.