pelias / labels

Pelias Label generation
https://pelias.io
MIT License
4 stars 9 forks source link

Refactor label generation with a "label builder" function #49

Closed orangejulius closed 2 years ago

orangejulius commented 2 years ago

Our label generation logic has gotten a bit convoluted. While there's a good amount that can be handled only through configuration in a JSON structure, the actual code tries to handle all possible label formats in one place. Even with only a few general structures ("American"/"European" style, and "Korean" style), the code is already quite complex and confusing.

This PR is a pure refactoring that simplifies the existing logic, and introduces the idea of a "label builder" function that can be specified for any country, allowing more configuration for places that have unique address formats.

How the current code works

Roughly, a label is built using 3 parts: prefix components, admin components, and postfix components.

Depending on the country, only one of the prefix components or postfix components are used.

For example, in many countries the housenumber/streetname come before admin components like region or country, so they would be handled by the prefix components.

However, some countries have address formats that put admin info first (like Korea). These countries would not set any prefix components, and would instead add postfix components.

This is a reasonable system, but the logic was already getting quite complex: both the prefix and postfix component code have to have conditionals that depend on the country, making the actual code quite a bit less clear.

Worse, the system is still not all that flexible: the code for prefix and postfix components has to handle any possible permutation of those elements that any country might use.

Fortunately, we haven't added all that many label formats for different countries yet, but its clear doing so would not be sustainable with the current layout.

How the code in this PR works

This PR allows for each country's label schema to define a "builder function" that constructs a label for that country however is best. There's a default label builder that will be used if one isn't specified, and it works for the vast majority of American/European style addresses.

With this pattern, any single label builder can probably be quite simple. Both the default and the newly added Korean label builder are shorter, with fewer conditionals, than before.

Future goals

This PR is meant to be a foundation from which we can define more address formats. In particular, I'd like to add address formats for Japan. Japan is unique in that it has address formats that vary for English and Japanese. It also tends to have lots of specific requirements for different regions, and even cities. Attempting to build all those different cases into generic code for the whole planet would be a bad idea.

For now, the only unique address builder is for a specific country, Korea. But I could easily see how we eventually find patterns for address formats that share commonalities across countries. For example, lots of countries in Eastern Europe or South America have very similar formats.

missinglink commented 2 years ago

Yeah this is great, there is also the other open PR which also aims to improve the logic in a different way.

The concept of per-country schemas is great, and since we now add the country_code property to results we can probably make that simpler.

I would go one step further and make the selection of builders automatic based on country_code with a default fallback.

The other PR from Joxit is excellent but made it clear that each installation of Pelias may want to modify these labels to suit their own business specifically.

The simplest solution there seems to be to have this module export a simple interface, maybe a single function? then other providers can simply alias their own npm module as a replacement