osmlab / name-suggestion-index

Canonical common brand names, operators, transit and flags for OpenStreetMap.
https://nsi.guide
BSD 3-Clause "New" or "Revised" License

Issues with auto-generated operators for power generators #5540

Closed sb12 closed 4 months ago

sb12 commented 2 years ago

I noticed a few issues with the operator's list for power generators, which seems to be automatically generated by looking at statistics(?):

- Some of the operators are not a common operator for power plants, they just happen to have some solar panels on their roofs, e.g. stuff like Stadt Karlsruhe, Aldi Süd, IKEA, dm-drogerie markt GmbH & Co. KG, Gemeinde Ubstadt-Weiher, Init SE and many more.
- Some operators are only operating one single plant, usually a windpark, e.g. Borkum Riffgrund I Offshore Windpark A/S GmbH & Co. OHG or 'EnBW Hohe See GmbH & Co. KG', so there's not really a point in suggesting it anywhere else
- Some operators are actually only the brand or parent company, e.g. Deutsche Bahn AG, where DB Energie GmbH is the power supplier for the Deutsche Bahn.
- Some operators are actually only the manufacturer for wind turbines e.g. Enercon, Nordex etc.
- Some curiosities like Aldi Süd in de being linked to the Hungarian Wikipedia article

Given these issues and the power this index has, especially through its use in iD to encourage users to auto-update the operators, I think it needs a better, higher-quality process for adding and maintaining the suggestions. Maybe quality is more important than quantity in this case.

arch0345 commented 2 years ago

Thanks @sb12 for identifying these issues, I'll review your PRs soon!

> Some of the operators are not a common operator for power plants, they just happen to have some solar panels on their roofs, e.g. stuff like Stadt Karlsruhe, Aldi Süd, IKEA, dm-drogerie markt GmbH & Co. KG, Gemeinde Ubstadt-Weiher, Init SE and many more.

These seem to be in power/generator and not power/plant, so these should be fine? The tagging scheme for solar panels includes power/generator, so this could be beneficial for brands that operate solar roofs at some of their locations.

> Some operators are only operating one single plant, usually a windpark, e.g. Borkum Riffgrund I Offshore Windpark A/S GmbH & Co. OHG or 'EnBW Hohe See GmbH & Co. KG', so there's not really a point in suggesting it anywhere else

In addition to manually adding entries, we also collect operators with 10 or more instances by filtering weekly planet files from planet.osm.org. I recommend either adding such an operator to the matchNames of the appropriate entry (e.g. EnBW Hohe See GmbH & Co. KG might be added to the matchNames for Netze BW) or adding it to the exclude parameter at the top of the file under the named category (e.g. "^Borkum Riffgrund I Offshore Windpark A/S GmbH & Co. OHG$"). This is necessary so that it is not added again from the filtered planet file, and so that the tags of existing features on OpenStreetMap using these wrong tags can be updated.
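A minimal sketch of how the two remedies described here could behave, written in Python rather than the project's own JavaScript. The function `classify`, the `ENTRIES` list and the "Example Energy" entry are hypothetical and are not NSI's real code or schema; only the anchored exclude pattern is taken from the comment.

```python
import re

# Hypothetical data layout for illustration only; not NSI's real schema.
EXCLUDE_PATTERNS = [
    r"^Borkum Riffgrund I Offshore Windpark A/S GmbH & Co. OHG$",
]
ENTRIES = [
    {"displayName": "Example Energy", "matchNames": ["example energy gmbh"]},
]


def classify(name):
    """Return 'excluded', 'matched:<entry>' or 'new' for a collected operator name."""
    # An excluded name is dropped, so the weekly collection cannot re-add it.
    if any(re.search(p, name, re.IGNORECASE) for p in EXCLUDE_PATTERNS):
        return "excluded"
    # A name listed in matchNames is folded into an existing entry.
    lowered = name.lower()
    for entry in ENTRIES:
        if lowered == entry["displayName"].lower() or lowered in entry["matchNames"]:
            return "matched:" + entry["displayName"]
    # Anything else would show up as a fresh candidate.
    return "new"
```

The `^` and `$` anchors make the pattern match the whole name only, so a longer name that merely contains the excluded string would still be collected.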

> Some operators are actually only the brand or parent company, e.g. Deutsche Bahn AG, where DB Energie GmbH is the power supplier for the Deutsche Bahn.

Thanks for catching this! I will try to fix these issues on my own but if you could create PRs for instances you catch by adding matchNames as I explained earlier, that would really help.

> Some operators are actually only the manufacturer for wind turbines e.g. Enercon, Nordex etc.

I'll research this when I have more free time, but if you could point me towards any links that could help verify this, that would be very helpful.

> Some curiosities like Aldi Süd in de being linked to the Hungarian Wikipedia article

It looks like it's linked to the English Wikipedia article? Should this be changed to the German one?

> Given these issues and the power this index has, especially through its use in iD to encourage users to auto-update the operators, I think it needs a better, higher-quality process for adding and maintaining the suggestions. Maybe quality is more important than quantity in this case.

The collected brands/operators by themselves aren't generated by iD presets until a :wikidata tag is added to their entry in NSI. Maybe when linking wikidata pages they can be reviewed more thoroughly to see if, as you mentioned, they're actually operators and not manufacturers

sb12 commented 2 years ago

Thanks for your reply.

> > Some of the operators are not a common operator for power plants, they just happen to have some solar panels on their roofs, e.g. stuff like Stadt Karlsruhe, Aldi Süd, IKEA, dm-drogerie markt GmbH & Co. KG, Gemeinde Ubstadt-Weiher, Init SE and many more.

> These seem to be in power/generator and not power/plant, so these should be fine? The tagging scheme for solar panels includes power/generator, so this could be beneficial for brands that operate solar roofs at some of their locations.

The thing is, some of these are very local operators, e.g. municipalities or smaller companies, so it is a bit strange to have them listed as power companies. On the other hand, Aldi Süd or IKEA are brands rather than operators, but the brand was probably used because mappers did not know the correct operator.

> > Some operators are only operating one single plant, usually a windpark, e.g. Borkum Riffgrund I Offshore Windpark A/S GmbH & Co. OHG or 'EnBW Hohe See GmbH & Co. KG', so there's not really a point in suggesting it anywhere else

> In addition to manually adding entries, we also collect operators with 10 or more instances by filtering weekly planet files from planet.osm.org. I recommend either adding such an operator to the matchNames of the appropriate entry (e.g. EnBW Hohe See GmbH & Co. KG might be added to the matchNames for Netze BW) or adding it to the exclude parameter at the top of the file under the named category (e.g. "^Borkum Riffgrund I Offshore Windpark A/S GmbH & Co. OHG$"). This is necessary so that it is not added again from the filtered planet file, and so that the tags of existing features on OpenStreetMap using these wrong tags can be updated.

Well, these operators are still correct, so adding them to the matchNames of some other related company would make it worse. Maybe you could increase the threshold to e.g. 100 for solar and wind generators; that would probably remove most of these project-related operators.
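A minimal sketch of what a category-specific threshold could look like, in Python. It assumes the collector could apply a different cut-off per category, which the thread does not confirm. The `THRESHOLDS` table and the `collect` function are hypothetical; the values 10 and 100 are the ones mentioned in the discussion.

```python
from collections import Counter

DEFAULT_THRESHOLD = 10  # collection threshold mentioned earlier in the thread
# Hypothetical per-category override; 100 is the value floated in this comment.
THRESHOLDS = {"power/generator": 100}


def collect(features):
    """Count (category, operator) pairs and keep operators that reach the threshold.

    features: iterable of (category, operator) tuples.
    Returns a dict mapping each category to the set of operators kept.
    """
    counts = Counter(features)
    kept = {}
    for (category, operator), n in counts.items():
        if n >= THRESHOLDS.get(category, DEFAULT_THRESHOLD):
            kept.setdefault(category, set()).add(operator)
    return kept
```

With these numbers, an operator tagged on 12 turbines of a single wind farm would no longer be collected, while the default of 10 would still apply to every other category.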

> > Some operators are actually only the manufacturer for wind turbines e.g. Enercon, Nordex etc.

> I'll research this when I have more free time, but if you could point me towards any links that could help verify this, that would be very helpful.

I checked the Wikipedia pages and I have never heard that any of them are actually operating windparks themselves. They may just provide a maintenance contract and have their logo on the turbines.

> > Some curiosities like Aldi Süd in de being linked to the Hungarian Wikipedia article

> It looks like it's linked to the English Wikipedia article? Should this be changed to the German one?

In the generator.json it looks like this:

    {
      "displayName": "Aldi Süd",
      "id": "aldisud-b8ae25",
      "locationSet": {"include": ["de"]},
      "tags": {
        "operator": "Aldi Süd",
        "operator:wikidata": "Q41171672",
        "operator:wikipedia": "hu:Aldi Süd",
        "power": "generator"
      }
    },

> > Given these issues and the power this index has, especially through its use in iD to encourage users to auto-update the operators, I think it needs a better, higher-quality process for adding and maintaining the suggestions. Maybe quality is more important than quantity in this case.

> The collected brands/operators by themselves aren't generated by iD presets until a :wikidata tag is added to their entry in NSI. Maybe when linking wikidata pages they can be reviewed more thoroughly to see if, as you mentioned, they're actually operators and not manufacturers

Thanks for clarifying that. I wasn't really aware of that as I do not use iD a lot myself, but got a bit concerned because I saw some false edits by new mappers that relied on iD's suggestions.

sb12 commented 2 years ago

> > Some operators are actually only the brand or parent company, e.g. Deutsche Bahn AG, where DB Energie GmbH is the power supplier for the Deutsche Bahn.

> Thanks for catching this! I will try to fix these issues on my own but if you could create PRs for instances you catch by adding matchNames as I explained earlier, that would really help.

This seems to come from the 10 solar panels on the Berlin Hbf building. It's unclear whether they are operated by an external company, by the station operator "DB Station&Service AG", "DB Energie GmbH" or some other DB company. Maybe increasing the threshold and removing the entry helps here as well.

sebhan2 commented 2 years ago

Hello, I just recently learned about the NSI and I am a big fan already. However, I just found the same bug concerning the Hungarian Wikipedia links that are suggested for charging stations of ALDI Süd in Germany. This clearly seems to be an error. Can you please change the JSON file to suggest "operator:wikipedia=de:Aldi Süd" instead? Thank you very much. Best regards

arch0345 commented 2 years ago

> Hello, I just recently learned about the NSI and I am a big fan already. However, I just found the same bug concerning the Hungarian Wikipedia links that are suggested for charging stations of ALDI Süd in Germany. This clearly seems to be an error. Can you please change the JSON file to suggest "operator:wikipedia=de:Aldi Süd" instead? Thank you very much. Best regards

We run a script that matches the Wikipedia article to the one linked on the Wikidata page, and unfortunately I'm not able to link the Aldi Süd article on the German Wikipedia, since it's a redirect to ALDI, which is already linked to its own Wikidata page. @bhousel has mentioned before that we might remove :wikipedia tags from the index entirely due to situations like this.
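A minimal sketch of how such a matching step could end up with a Hungarian link, in Python. The selection rule (prefer certain languages, otherwise fall back to any available article) is an assumption and not NSI's actual script. The sitelink data below is made up for illustration, not fetched from Wikidata. `pick_wikipedia_tag` is a hypothetical helper.

```python
# Site ids that end in "wiki" but are not language editions of Wikipedia.
NON_LANGUAGE_SITES = {"commons", "species", "meta"}


def pick_wikipedia_tag(sitelinks, preferred_langs):
    """Pick a 'lang:Title' value from a Wikidata item's sitelinks.

    sitelinks: dict mapping site ids such as 'dewiki' to article titles.
    preferred_langs: language codes to try first, in order.
    """
    wikis = {}
    for site, title in sitelinks.items():
        if site.endswith("wiki") and site[:-4] not in NON_LANGUAGE_SITES:
            wikis[site[:-4]] = title
    for lang in preferred_langs:
        if lang in wikis:
            return f"{lang}:{wikis[lang]}"
    # Assumed fallback: take any article that exists (sorted for determinism).
    if wikis:
        lang = sorted(wikis)[0]
        return f"{lang}:{wikis[lang]}"
    return None


# Made-up sitelinks for illustration only.
aldi_sued_item = {"huwiki": "Aldi Süd"}
aldi_item = {"dewiki": "Aldi", "enwiki": "Aldi", "commonswiki": "Category:Aldi"}
```

Under these assumptions, an item whose only sitelink is Hungarian yields a Hungarian value no matter which languages are preferred, because an article that is already linked to another item is not available to choose from.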

sebhan2 commented 2 years ago

Thanks for the fast response. I see the problem. I generally think the wikipedia link is a good idea.

Changing the Wikidata page may be another option. I think having one ALDI page Q125054 for both ALDI corporations and one additional entry for each of ALDI Süd Q41171672 and ALDI Nord Q41171373 is also not a perfect solution. However, I am not sure what thought process went into the Wikidata items or who to talk to in that regard...

bhousel commented 2 years ago

> I think having one ALDI page Q125054 for both ALDI corporations and one additional entry for each of ALDI Süd Q41171672 and ALDI Nord Q41171373 is also not a perfect solution. However, I am not sure what thought process went into the Wikidata items or who to talk to in that regard...

I think the most recent time we changed ALDI was #5109, where the Nord/Süd split exists in Germany only, and in the other countries where they operate, it will just be named "ALDI" but point to the correct Wikidata that operates there.

sebhan2 commented 2 years ago

Thanks for linking the previous discussion. Now I understand the issue. From a German perspective, splitting up the ALDI conglomerate to have just two entries on Wikidata makes a lot of sense. However, looking at the global situation, this no longer appears to be the best solution if certain countries know the supermarket chain only under the short name “ALDI”.

It is not possible to link the German Wikipedia page “de:Aldi” to either ALDI Süd Q41171672 or ALDI Nord Q41171373 on their respective Wikidata pages, since “de:Aldi” is already linked to the Wikidata page ALDI Q125054.

In the case of ALDI Süd Q41171672 and ALDI Nord Q41171373 removing the “:wikipedia” tags would make sense. I am not sure whether one should remove the “:wikipedia” tag on other brands as well. One advantage of a general removal would be consistency. However I personally like the “:wikipedia” tags. I am not sure.

peternewman commented 2 years ago

> This seems to come from the 10 solar panels on the Berlin Hbf building. It's unclear whether they are operated by an external company, by the station operator "DB Station&Service AG", "DB Energie GmbH" or some other DB company. Maybe increasing the threshold and removing the entry helps here as well.

I don't know if the threshold is global or per category, but I can see 10 being a good number for other things like shops.

Either way, although it would probably need more processing, couldn't the threshold be changed to include an area aspect? If you discount any further solar panels (or shops) that are within, say, 100 m of one already counted, then we should get a saner value for the spread of something. That would automagically exclude our single plant with lots of turbines/panels, but would still allow an operator with only a few features that are geographically dispersed.

bhousel commented 2 years ago

> Either way, although it would probably need more processing, couldn't the threshold be changed to include an area aspect?

We can change the threshold for collection to something other than 10 or 50, but we can't easily change it to count things differently for certain geographic areas.

The collector script works with the full planet, but we prefilter it to contain only tags we are interested in counting, and use -R to discard all the child nodes, so we don't currently know where in the world the "names" are. (These steps cut the size of the planet down significantly so it doesn't crash when Node tries to use it, and things should complete in a reasonable time.)

https://github.com/ideditor/nsi-collector/blob/e5e1752266a83b4a38e5de1527da63a9dc68b180/README.md?plain=1#L35-L36

https://github.com/ideditor/nsi-collector/blob/e5e1752266a83b4a38e5de1527da63a9dc68b180/collect_osm.js#L29-L37

peternewman commented 2 years ago

> > Either way, although it would probably need more processing, couldn't the threshold be changed to include an area aspect?

> We can change the threshold for collection to something other than 10 or 50, but we can't easily change it to count things differently for certain geographic areas.

Ah, that's unfortunate. Just hashing a rounded version of the coordinates ought to be enough to do the trick, I'd imagine.
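A minimal sketch of the rounding idea, in Python. It assumes each collected feature comes with a latitude and longitude, which (per the comment quoted above) the collector does not currently have. `count_distinct_cells` and the three-decimal precision are hypothetical choices.

```python
from collections import defaultdict


def count_distinct_cells(features, decimals=3):
    """Count, per operator, how many distinct rounded-coordinate cells it occupies.

    features: iterable of (operator, lat, lon) tuples.
    decimals: rounding precision; 3 decimals is roughly 111 m in latitude,
              and less than that in longitude away from the equator.
    """
    cells = defaultdict(set)
    for operator, lat, lon in features:
        # The rounded pair acts as the hash key for a grid cell.
        cells[operator].add((round(lat, decimals), round(lon, decimals)))
    return {operator: len(seen) for operator, seen in cells.items()}
```

Applying the threshold to the number of cells instead of the number of features would count a single plant with many turbines or panels roughly once. Features that straddle a cell boundary can still land in two cells, so the count is only approximate.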

> The collector script works with the full planet, but we prefilter it to contain only tags we are interested in counting, and use -R to discard all the child nodes, so we don't currently know where in the world the "names" are.

Presumably you mean that using --omit-referenced/-R means you don't have any coordinates for a way, because it no longer has any nodes?