vmg / rinku

Autolinking. Ruby. Yes, that's pretty much it.
ISC License

Support possible German umlauts in email address strings #88

Open rstammer opened 4 years ago

rstammer commented 4 years ago

German, like many other languages, uses characters outside the ASCII range, e.g. the umlauts (ä, ö, ü).

Under some circumstances, modern SMTP servers support these characters in email addresses; see, e.g., this discussion (this is also an interesting related read).

We see such addresses in our own data. Unfortunately, Rinku's auto_link fails to generate a correct mailto link when the email address contains an umlaut: only the tail of the address ends up inside the link.

Current behavior

Rinku.auto_link("björn-jürgen.nußbaum@example.com") 
# björn-jürgen.nuß<a href="mailto:naum@example.com">naum@example.com</a>
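
A likely explanation, sketched in plain Ruby: in UTF-8 each of these characters is encoded as two bytes, and neither byte is an ASCII letter or digit. The helper `ascii_alnum_byte?` below is a hypothetical stand-in for a byte-wise alphanumeric check; it is an assumption for illustration, not Rinku's actual code.

```ruby
# Hypothetical stand-in for a byte-wise ASCII alphanumeric check
# (an assumption for illustration, not taken from Rinku's source).
def ascii_alnum_byte?(byte)
  (0x30..0x39).cover?(byte) || # 0-9
    (0x41..0x5A).cover?(byte) || # A-Z
    (0x61..0x7A).cover?(byte)    # a-z
end

# Print the UTF-8 bytes of each character and whether all of them
# would pass the ASCII-only check.
%w[ö ü ß b].each do |char|
  bytes = char.bytes
  hex = bytes.map { |b| format("0x%02X", b) }.join(" ")
  puts "#{char}: #{hex} alnum=#{bytes.all? { |b| ascii_alnum_byte?(b) }}"
end
```

If the matcher inspects the input one byte at a time, every umlaut would look like a word boundary, which is consistent with the truncated link above.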

Expected behavior

Rinku.auto_link("björn-jürgen.nußbaum@example.com") 
# <a href="mailto:björn-jürgen.nußbaum@example.com">björn-jürgen.nußbaum@example.com</a>

Discussion

Our PR only covers German umlauts, but we are aware that a much wider range of characters is affected in other languages, e.g. ñ or é. We were not entirely sure how to proceed and would be happy to get some advice. Is the approach we've taken in the code, namely extending the lookup for special characters, the right one?

Maybe we're on the wrong track, and support for those characters should instead be added to Rinku's rinku_isalnum() function. What do you think?
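
To make the two options easier to compare, here is a minimal pure-Ruby sketch of the idea. It assumes the matcher walks backwards from the "@" and stops at the first character that is not allowed in the local part. The method `local_part` and both constants are hypothetical names for this illustration only; Rinku's real matcher is written in C and may differ in detail.

```ruby
# Hypothetical pure-Ruby illustration, not Rinku's implementation.
# ASCII-only: letters, digits and a few punctuation characters.
ASCII_LOCAL_CHAR   = /\A[A-Za-z0-9._+\-]\z/
# Unicode-aware: any letter (\p{L}) or number (\p{N}) plus the same punctuation.
UNICODE_LOCAL_CHAR = /\A[\p{L}\p{N}._+\-]\z/

# Walk left from the first "@" while the preceding character is allowed,
# and return the local part that would end up inside the mailto link.
def local_part(text, allowed)
  at = text.index("@")
  return nil if at.nil?

  start = at
  start -= 1 while start > 0 && text[start - 1].match?(allowed)
  text[start...at]
end

address = "björn-jürgen.nußbaum@example.com"
ascii_result   = local_part(address, ASCII_LOCAL_CHAR)   # stops at the ß
unicode_result = local_part(address, UNICODE_LOCAL_CHAR) # reaches the start

puts "ASCII-only:    #{ascii_result}"
puts "Unicode-aware: #{unicode_result}"
```

A general character-class check like this would also cover ñ and é without listing each character individually, which is why changing the alphanumeric check might be preferable to extending a lookup table. Note that this sketch works on decoded characters, whereas a byte-oriented C implementation would need to handle multi-byte UTF-8 sequences explicitly.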