whosonfirst-data / whosonfirst-data

Who's On First is a gazetteer of places.
http://www.whosonfirst.org/

New Zealand localities - update with official names #1940

Open dwsilk opened 3 years ago

dwsilk commented 3 years ago

https://data.linz.govt.nz/table/105353-fire-and-emergency-nz-localities-nzgb-compliant-names/

This dataset contains 688 official names where the FENZ Localities dataset (previously incorporated into whosonfirst here: https://github.com/whosonfirst-data/whosonfirst-data/issues/1878) currently contains an unofficial alternative.

FYI @stepps00 @nvkelso @justinelliotmeyers

I'd have a go at a PR for this but the previous update (https://github.com/whosonfirst-data/whosonfirst-data-admin-nz/pull/12) is quite intimidating.

stepps00 commented 3 years ago

@dwsilk this is great.. do you have an example Who's On First record that currently contains an unofficial alternative name that differs from the name in the above dataset? Are updates needed to all 688 respective records in Who's On First?

We will take a look and try to import these names; we'll ping you on a PR once started.

justinelliotmeyers commented 3 years ago

@dwsilk Thanks for sharing this resource!!!

dwsilk commented 3 years ago

@stepps00 I think it's a smaller subset of these 688 records that need to be updated in Who's On First. There are a lot of waterway names that weren't imported (can't find them via the spelunker), and then there are some that appear to have been updated since the import.

But as examples:

Bethells Beach: https://spelunker.whosonfirst.org/id/1125860935/ refers to nz_linz:id:2755 which can be matched to the locality_id in the table linked previously where the official_placename is recorded as Te Henga (Bethells Beach).

Purakaunui: https://spelunker.whosonfirst.org/id/101914433/ refers to nz_linz:id:2480 which has the official_placename of Pūrākaunui.

The vast majority just need macrons to match the official placename.
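
A minimal sketch of how that check could be automated, assuming the LINZ table has been reduced to a mapping from `locality_id` to `official_placename` and the WOF records to a hypothetical mapping from their `nz_linz` id to the current name. It separates macron-only differences (like Pūrākaunui) from larger ones (like Te Henga); the dictionaries below are illustrative inputs, not an export format that exists today.

```python
import unicodedata


def fold_macrons(text: str) -> str:
    """Strip combining marks (e.g. macrons) so 'Pūrākaunui' becomes 'Purakaunui'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def classify(current_name: str, official_name: str) -> str:
    """Compare a WOF name against the official placename."""
    if current_name == official_name:
        return "same"
    if fold_macrons(current_name) == fold_macrons(official_name):
        return "macron-only"
    return "different"


# Hypothetical inputs: official names keyed by locality_id, and WOF names
# keyed by the nz_linz id each record refers to.
official = {2755: "Te Henga (Bethells Beach)", 2480: "Pūrākaunui"}
wof = {2755: "Bethells Beach", 2480: "Purakaunui"}

report = {
    linz_id: classify(name, official[linz_id])
    for linz_id, name in wof.items()
    if linz_id in official
}
print(report)
```

Records reported as "macron-only" could probably be updated mechanically, while "different" ones would need a human decision.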

stepps00 commented 3 years ago

Got it, thanks for the examples. I agree with the Pūrākaunui example, but I'm also remembering some of the name work that was done during the last LINZ import.

During the last import, some parenthetical names, like Te Henga (Bethells Beach) were split (Te Henga and Bethells Beach). In some (many?) cases, the LINZ names included both an English name and a Maori name, so WOF preferred the English names for these places.

Is that a fair method to process these names? Or is your expectation that the name (wof:name and name:eng_x_preferred names) would be the full Te Henga (Bethells Beach)?

Either way, I agree the WOF records for these 688 places should be reviewed.

dwsilk commented 3 years ago

I think both should be Te Henga (Bethells Beach). That is the official name.

Where alternative official names exist there are two records in the Gazetteer, e.g. North Island and Te Ika-a-Māui. This is rare.

It's much more common that a single official name exists that is a combination of two names. e.g. Aoraki/Mount Cook, Colac Bay/Ōraka, Franz Josef/Waiau, etc.
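
For reference, a rough sketch of how the two dual-name forms mentioned in this thread (trailing parentheses and a slash) could be split into their component names. The function name `split_dual_name` is hypothetical, and the split does not say which part is in which language, since the order alone doesn't settle that.

```python
import re
from typing import List

# Matches "First (Second)" where the second part is in trailing parentheses.
_PAREN = re.compile(r"^(?P<first>.+?)\s*\((?P<second>.+)\)$")


def split_dual_name(official_name: str) -> List[str]:
    """Return the component names of a dual name, in their official order."""
    name = official_name.strip()
    match = _PAREN.match(name)
    if match:
        return [match.group("first"), match.group("second")]
    if "/" in name:
        return [part.strip() for part in name.split("/")]
    # Not a dual name: return it unchanged as a single component.
    return [name]


for example in ["Te Henga (Bethells Beach)", "Aoraki/Mount Cook", "Pūrākaunui"]:
    print(example, "->", split_dual_name(example))
```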

missinglink commented 3 years ago

Wait... so you're saying the Māori name is "Te Henga (Bethells Beach)" and the English name is also "Te Henga (Bethells Beach)"?

dwsilk commented 3 years ago

No, I'm saying that the name is Te Henga (Bethells Beach) and I think that wof:name and name:eng_x_preferred should both use that official name. Is name:mri_x_preferred populated in Who's On First too?

For additional context, the Standard for New Zealand place names contains these naming criteria:

Dual names: Dual names, where both names are used together as one name, recognise the equal and specific significance of both names. Generally, an original Māori name should be the first part of a dual name in recognition of the right of first discovery. The order may be reversed in special circumstances, such as where there are considerations for emergency services and maritime safety responses.

Alternative names: There should be one name for one place. Alternative names are only assigned in exceptional circumstances. If alternative names are assigned either name can be used.

missinglink commented 3 years ago

I don't think the NZ naming policy maps well to the WOF schema here.

Data consumers of the name:eng_x_preferred field are expecting the name in English and they are expecting a single name, not the combination of two names in two different languages, the same applies for name:mri_x_preferred.

As a consumer of name:eng_x_preferred I would expect only ascii characters in that field.

There's kinda two options here:

  1. we change the interpretation of language tags such as eng to allow other languages in those fields too, presumably special-cased only for NZ and figure out how to message that to data consumers
  2. we only put names from the language that is specified by the tag in the field

I think allowing non-English names in the eng fields is a mistake and we should honour the contract of the field names.

Here's an example of how it's mapped in OSM:

[Image: Screenshot 2021-05-12 at 13 16 41]

nvkelso commented 3 years ago

I'm open to considering using the compound name in wof:name, but agree with @missinglink that name:eng_x_preferred and name:mri_x_preferred have specific meanings now that shouldn't change, as it would break existing consumers of the data.

The default wof:name is there for downstream consumers who don't have much additional display logic, and generally we try to make that "English" or ASCII, but I concede the compound name might be more appropriate here.

The recommended way to solve this type of problem is to add app display logic for how to render (layout) "position 1" and "position 2" names. That way it can support more situations like me looking at a map of China wanting to see both the Chinese script default followed by a parenthetical in English in the same labels. Or if I'm a user with my phone set to name:mri_x_preferred I might only want to see the Maori name not the compound name.

Which admittedly is set up for "you can have your worldview and have blinders on for any other worldviews" versus embracing pluralism. But app display logic again would be recommended. WOF does indicate the official languages for a given place (always on the country and region level, and more and more so for individual records, too). So you could construct your logic from that data.
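
As a rough illustration of that kind of display logic, here is a sketch that puts the user's language in "position 1" and a different official-language name in "position 2". It assumes `name:{lang}_x_preferred` values are lists of strings and that `wof:lang_x_official` is a list of language codes; `render_label` and the example record are hypothetical, not an existing WOF or client API.

```python
from typing import Any, Dict, Optional


def first_name(props: Dict[str, Any], lang: str) -> Optional[str]:
    """Return the first name:{lang}_x_preferred value, if any."""
    values = props.get(f"name:{lang}_x_preferred") or []
    return values[0] if values else None


def render_label(props: Dict[str, Any], user_lang: str, show_secondary: bool = True) -> str:
    """Lay out the user's language first, then an official-language name in parentheses."""
    position_1 = first_name(props, user_lang)
    position_2 = None
    # Look for a second name only if wanted, or if position 1 is missing.
    if show_secondary or position_1 is None:
        for lang in props.get("wof:lang_x_official", []):
            candidate = first_name(props, lang)
            if lang != user_lang and candidate and candidate != position_1:
                position_2 = candidate
                break
    if position_1 and position_2:
        return f"{position_1} ({position_2})"
    # Fall back to whichever name exists, then to the default wof:name.
    return position_1 or position_2 or str(props.get("wof:name", ""))


# Hypothetical record, for illustration only.
example = {
    "wof:name": "Bethells Beach",
    "name:eng_x_preferred": ["Bethells Beach"],
    "name:mri_x_preferred": ["Te Henga"],
    "wof:lang_x_official": ["eng", "mri"],
}
print(render_label(example, "mri"))
print(render_label(example, "mri", show_secondary=False))
```

With `show_secondary=False` a user whose phone is set to Māori would only see the Māori name, which is the single-language case described above.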

The other country that comes to mind with this situation is Switzerland, where a lot of features in OSM will be labeled with 3 to 4 languages at a time. For label density it's better from a pure architecture basis to just label in a single language to maximize the number of labels that can fit, and digital displays make that hot swapping easier than in print.

thisisaaronland commented 3 years ago

The rules for wof:name are only that:

By that logic the name "Te Henga (Bethells Beach)" seems fine to me as a wof:name value and it reflects the convention in New Zealand for dual names.

Importantly it doesn't preclude the inclusion of Māori- and English-specific name:{LANG} variants that are (or may be) truncated.

For the sake of thoroughness you could also add "extended language subtag" variants for each name:{LANG}-{LANG} entry which contain the full name.

dwsilk commented 3 years ago

Ah I would never have guessed that this would be a uniquely New Zealand problem (especially the problem around introducing non-ASCII placenames in English).

I noticed that Taupō has eng_gb_x_preferred : Taupō (?) and eng_x_preferred : Taupo. If eng_x_preferred needs to remain ASCII, then maybe the solution is an eng_nz_x_preferred : Taupō?

There are a lot of words and placenames that New Zealand English borrows from te reo Māori, and it's going to become increasingly uncommon to see transliterations to New Zealand English that exclude macrons. Macrons are comprehensively used by New Zealand media and government. This is all relatively recent but adoption has been swift. See discussion on Wikipedia.

> I think allowing non-English names in the eng fields is a mistake and we should honour the contract of the field names.

I find this a little hard to reconcile because so many names in New Zealand are Māori names, so there are already lots of non-English names used in this field ..?

thisisaaronland commented 3 years ago

The name: properties are expected to be full Unicode so anything goes, so to speak. The 7-bit ASCII rule is only for wof:name properties.

missinglink commented 3 years ago

Agh yeah, actually you're right that English could contain non-ASCII chars in rare cases where loan-words enter from other languages, although I can't think of any off the top of my head. There is a tendency to ASCII-fold those words when they enter the English language, like café -> cafe. I have no doubt this is not universal, but it seems extremely common.

So while the ASCII rule may not be 100% universal, it's still a good indicator of a potential issue.
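
A small sketch of using that indicator as a review flag rather than a hard rule. It assumes the name properties are a mapping from keys such as `name:eng_x_preferred` to lists of strings; the function name `non_ascii_names` is made up for illustration.

```python
from typing import Dict, List


def non_ascii_names(props: Dict[str, List[str]], lang: str = "eng") -> List[str]:
    """Return name:{lang}_* values containing non-ASCII characters, for manual review."""
    flagged: List[str] = []
    for key, values in props.items():
        if key.startswith(f"name:{lang}_"):
            flagged.extend(value for value in values if not value.isascii())
    return flagged


# The Taupō values mentioned earlier in this thread.
taupo = {
    "name:eng_gb_x_preferred": ["Taupō"],
    "name:eng_x_preferred": ["Taupo"],
}
print(non_ascii_names(taupo))
```

A flagged value is not necessarily wrong; it only marks a name that someone may want to look at.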

In this case I would argue that "Taupō" has not entered the English language. I've seen the ASCII-folded "Taupo" (as per above), but I kinda feel like that's just people being lazy (or not knowing how to generate the macron on their keyboard, as we often see for umlauts in German) and not an English word per se (e.g. 'Schoeneberg' is not an English word IMO).

I could be wrong on that and would be happy to see an English dictionary which proves me wrong (presumably it would be an en-NZ dictionary and not an en-GB or en-US one). There is also a valid argument that the NZ places gazetteer is this dictionary: does it list "Taupō" as an English word or a "Kiwi-ish" word?

> so there are already lots of non-English names used in this field

Yeah I agree with this, as above I would say that Māori place names don't belong in the English fields because they are not part of the English language.

More generally I think we can delete a bunch of repetitive names in WOF, for example Berlin has the name "Berlin" listed many times.

Due to the age and history of the city it seems like that word may have actually entered the English language (it's listed in a bunch of English dictionaries), so I can see it being listed there but it also doesn't differ in form from the German name, so it doesn't add any real value.

There are loads of other cases where the local non-English language name has been copied to other language fields such as "Mallorca", which is from the Catalan language:

[Image: Screenshot 2021-05-17 at 12 49 16]

In this case we have it in Spanish and German too.

I'm not sure if either of these is technically correct (I'm guessing the Spanish is and the German isn't), but regardless the duplication is redundant here as it duplicates the local spelling of the place verbatim.

Note the English name is listed as "Majorca" (I've not seen that personally) but if it's correct then it's great to keep it since it's a different spelling.

I must admit that my position is very biased towards how computer systems ingest data and less about how it is displayed to users, and those two things are often very different.

eg:

-- Te Henga (Bethells Beach)
SELECT * FROM names WHERE lang = 'eng'; -- should return "Bethells Beach"
SELECT * FROM names WHERE lang = 'mri'; -- should return "Te Henga"

-- Taupō
SELECT * FROM names WHERE lang = 'eng'; -- should return NULL
SELECT * FROM names WHERE lang = 'mri'; -- should return "Taupō"
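
The queries above can be tried against a throwaway in-memory SQLite table. This is only a sketch: the `names` schema with `place`, `lang` and `name` columns is an assumption for illustration, not how WOF stores names, and a query with no match returns no rows here rather than a literal NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (place TEXT, lang TEXT, name TEXT)")
# Each name is filed only under the language it belongs to.
conn.executemany(
    "INSERT INTO names VALUES (?, ?, ?)",
    [
        ("Te Henga (Bethells Beach)", "eng", "Bethells Beach"),
        ("Te Henga (Bethells Beach)", "mri", "Te Henga"),
        ("Taupō", "mri", "Taupō"),
    ],
)


def names_for(place: str, lang: str) -> list:
    """Return every name filed under the given language for a place."""
    rows = conn.execute(
        "SELECT name FROM names WHERE place = ? AND lang = ?", (place, lang)
    )
    return [row[0] for row in rows]


print(names_for("Te Henga (Bethells Beach)", "eng"))
print(names_for("Taupō", "eng"))
```
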

missinglink commented 3 years ago

The wof:lang_x_official and wof:lang_x_spoken fields are super useful to hunt out and find the local languages when one or the other field is not set.
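
A sketch of that lookup, assuming both fields are lists of language codes and that names live under `name:{lang}_x_preferred` as lists of strings. The helper `local_name` and the sample record are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


def local_name(props: Dict[str, Any]) -> Optional[Tuple[str, str]]:
    """Return (lang, name) for the first local language that has a name set."""
    # Official languages are tried before spoken ones.
    for field in ("wof:lang_x_official", "wof:lang_x_spoken"):
        for lang in props.get(field, []):
            values = props.get(f"name:{lang}_x_preferred") or []
            if values:
                return lang, values[0]
    return None


# Hypothetical record where the eng name is not set but the mri one is.
record = {
    "wof:lang_x_official": ["eng", "mri"],
    "name:mri_x_preferred": ["Taupō"],
}
print(local_name(record))
```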

[Image: Screenshot 2021-05-17 at 13 14 20]

missinglink commented 3 years ago

As Nathaniel mentioned, Switzerland has a lot of similarities. My wife is from Bern/Berne/Bärn/Berna (whatever you want to call it 😆 ), but the difference is that CH has "linguistic regions", so we can definitively say that the default name is Bern, as that is the name in de-CH, which is the language region to which the city belongs.

New Zealand is different in that there are two officially spoken languages and AFAIK neither has a 'primary status', so neither has priority over the other, so it's fairly unique in that context.

I personally feel that the name in the parentheses is secondary. I doubt this is supported by NZ policy, but it is hinted at:

Generally, an original Māori name should be the first part of a dual name in recognition of the right of first discovery. The order may be reversed in special circumstances, such as where there are considerations for emergency services and maritime safety responses.

edit: TBC I am not proposing editing/changing the names in the NZ gazetteer in any way, just filing them correctly.