unicode link label normalization (fix test 539)

Input:

[ẞ]

[SS]: /url

This is the result in master:

File "tests/spec-539.html", line 1, characters 0-0:
diff --git a/_build/default/tests/spec-539.html b/_build/default/tests/spec-539.html.new
index c31102f..4e28a6f 100644
--- a/_build/default/tests/spec-539.html
+++ b/_build/default/tests/spec-539.html.new
@@ -1 +1 @@
-<p><a href="/url">ẞ</a></p>
+<p>[ẞ]</p>
make: *** [test] Error 1

The issue is that the two labels are not matched, so the reference is not recognized as a link. To match labels, we need to normalize them (strip leading and trailing whitespace, collapse internal whitespace) and compare them case-insensitively. The Unicode version of that comparison is a bit more complex, because it requires Unicode case folding rather than simple lowercasing. From the spec:

One label matches another just in case their normalized forms are equal. To normalize a label, strip off the opening and closing brackets, perform the Unicode case fold, strip leading and trailing spaces, tabs, and line endings, and collapse consecutive internal spaces, tabs, and line endings to a single space.
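To make the steps in that paragraph concrete, here is a minimal sketch of the whitespace part of the normalization, using only the OCaml standard library. It is not the code from this PR: the names `collapse_ws`, `normalize`, and `labels_match` are made up for illustration, the labels are assumed to have had their brackets stripped already, and the case fold is passed in as a parameter so that the sketch stays independent of any Unicode library.

```ocaml
(* Hypothetical sketch, not omd's actual implementation. *)

(* Whitespace as listed in the spec excerpt: spaces, tabs, line endings. *)
let is_ws = function ' ' | '\t' | '\n' | '\r' -> true | _ -> false

(* Strip leading/trailing whitespace and collapse internal runs to a single
   space. Working byte by byte is safe on UTF-8 input because these
   whitespace characters are ASCII and never occur inside a multi-byte
   sequence. *)
let collapse_ws (s : string) : string =
  let b = Buffer.create (String.length s) in
  let pending = ref false in
  String.iter
    (fun c ->
      if is_ws c then
        (* Remember the run only if something was already emitted,
           which drops leading whitespace. *)
        pending := Buffer.length b > 0
      else begin
        if !pending then Buffer.add_char b ' ';
        pending := false;
        Buffer.add_char b c
      end)
    s;
  (* A trailing run is never flushed, which drops trailing whitespace. *)
  Buffer.contents b

(* [fold] is whatever case fold the caller supplies. *)
let normalize ~(fold : string -> string) (label : string) : string =
  collapse_ws (fold label)

let labels_match ~fold a b =
  String.equal (normalize ~fold a) (normalize ~fold b)
```

With an ASCII-only stand-in such as `String.lowercase_ascii` as the fold, `labels_match` accepts `"FOO  bar"` and `" foo BAR "` but rejects `"ẞ"` and `"SS"`, which is why a real Unicode case fold is needed for test 539.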

This PR adapts the normalize function to handle Unicode labels as well. Fortunately, I could rely on existing libraries (uutf, uucp, and uunf), and I even found a piece of code in the docs that does almost exactly what is needed.
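For readers unfamiliar with those libraries, the following is a rough sketch of what a Unicode case fold over a UTF-8 string can look like with uutf and uucp. It is an illustration under assumptions, not the code merged in this PR: the function name `case_fold` is hypothetical, and the sketch leaves out the normalization that uunf would provide.

```ocaml
(* Hypothetical sketch; requires the uutf and uucp packages. *)
let case_fold (s : string) : string =
  let b = Buffer.create (String.length s) in
  let add_folded () _pos = function
    | `Malformed _ ->
      (* Replace invalid byte sequences with U+FFFD. *)
      Uutf.Buffer.add_utf_8 b Uutf.u_rep
    | `Uchar u ->
      (match Uucp.Case.Fold.fold u with
       | `Self -> Uutf.Buffer.add_utf_8 b u
       | `Uchars us -> List.iter (Uutf.Buffer.add_utf_8 b) us)
  in
  (* Decode the string as UTF-8 and fold each scalar value in turn. *)
  Uutf.String.fold_utf_8 add_folded () s;
  Buffer.contents b
```

Because case folding can map one character to several, ẞ (U+1E9E) folds to the two characters `ss`, the same result as folding `SS`.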

With that adapted normalize function, ẞ and SS are matched. The result is now a link as expected.

ocaml / omd

unicode link label normalization (fix test 539) #277