Ciantic closed this issue 6 years ago
Thanks for your good suggestion! I will add this feature to the Python, Java and JavaScript APIs within the next week. (It would be good to have in the .NET API as well, but that one is already out of date in other significant ways.)
Actually this will not be enough for proper hyphenation on web pages. It will lead to the word "vaa'an" being replaced with "vaa&shy;an", which in turn will be shown as "vaaan" if there is no line break. This is not what most people would expect. We will need an additional boolean parameter to specify whether hyphenation points that lead to context changes should be included or excluded. At this point it is probably best to move the logic to the libvoikko core. In fact, this complication was one of the reasons why it was not there in the first place. Client code (such as LibreOffice) has a hyphenation API that was better served by what is now getHyphenationPattern in our JavaScript API.
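To make the role of that boolean parameter concrete, here is a minimal sketch of how a separator could be applied to a hyphenation pattern. It assumes the pattern uses " " for no hyphenation point, "-" for a point where the character is kept, and "=" for a point where the character is replaced by the hyphen (the "vaa'an" case). The function name `applyPattern` and the example patterns are hypothetical and not part of libvoikko; the real implementation in the core may differ.

```javascript
// Hypothetical helper, not part of libvoikko: applies a separator to a word
// according to a hyphenation pattern of the same length as the word.
// Assumed pattern characters:
//   " " no hyphenation point before this character
//   "-" hyphenation point before this character, character is kept
//   "=" hyphenation point that replaces this character (context change)
function applyPattern(word, pattern, separator, allowContextChanges) {
  let result = "";
  for (let i = 0; i < word.length; i++) {
    const mark = pattern[i];
    if (mark === "-") {
      result += separator + word[i];
    } else if (mark === "=" && allowContextChanges) {
      // The original character (e.g. the apostrophe) is dropped.
      result += separator;
    } else {
      // Either no hyphenation point, or a context-changing point we skip.
      result += word[i];
    }
  }
  return result;
}

// Assumed patterns for illustration only.
console.log(applyPattern("testi", "   - ", "-", true));
console.log(applyPattern("vaa'an", "   =  ", "-", true));
console.log(applyPattern("vaa'an", "   =  ", "-", false));
```

With context changes excluded, "vaa'an" is left untouched, so a soft hyphen can never make the apostrophe disappear on a web page.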
I actually ended up using getHyphenationPattern like this:

    let w = "testi";
    let pattern = v.getHyphenationPattern(w);
    let j = 0;
    let newWord = "";
    for (const char of pattern) {
        // "-" in the pattern marks a hyphenation point before w[j]
        if (char === "-") {
            newWord += "\u00AD"; // soft hyphen
        }
        newWord += w[j];
        j++;
    }
    console.log(newWord);

It works for now.
Note that the string in newWord += "\u00AD" is the soft hyphen (shy) character. It is written here as an escape because the literal character is not visible on GitHub.
I'm not sure whether my approach will work with the word "vaa'an", though. I think such words are rather rare. If I understood correctly, it is hyphenated as "vaa-an", but when it is not hyphenated it keeps the extra character: "vaa'an". That logic is not doable with "&shy;" and HTML, where the hyphenated form would be "vaa'-an", but it probably does not matter.
@Ciantic your implementation that uses getHyphenationPattern seems correct to me. With the latest changes you can do the same with the following, shorter piece of code:

    let w = "testi";
    let newWord = v.hyphenate(w, "\u00AD", false);
    console.log(newWord);
Works with C, C++, JavaScript, Java and Python.
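For web pages the call above usually has to run over a whole text rather than a single word. The following is a rough sketch of one way to do that; `softHyphenateText` and the stand-in hyphenator are hypothetical names used for illustration, and the word-matching regular expression is an assumption that treats hyphenated compounds as separate words.

```javascript
// Hypothetical helper, not part of libvoikko: runs a word-level hyphenator
// over every word in a text and leaves punctuation and spacing untouched.
function softHyphenateText(text, hyphenateWord) {
  // \p{L}+ matches a run of Unicode letters; an apostrophe inside a word
  // (as in "vaa'an") is kept as part of that word.
  return text.replace(/\p{L}+(?:'\p{L}+)*/gu, (word) => hyphenateWord(word));
}

// Stand-in hyphenator so the sketch runs without libvoikko.
const standIn = (word) => (word === "testi" ? "tes\u00ADti" : word);

console.log(softHyphenateText("testi on testi.", standIn));
```

With libvoikko loaded, the stand-in could be replaced by a callback such as `(w) => v.hyphenate(w, "\u00AD", false)`, where `v` is the Voikko instance used in the snippets above.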
I would like to set the string "&shy;" as the hyphenation character. Right now the hyphenate function inserts a regular "-", but this is not the wanted character, e.g. when rendering HTML. "&shy;" is supported by most browsers, so this is probably mostly needed on the JS side of the library.