heiglandreas / Org_Heigl_Hyphenator

Provide TeX-Hyphenation to PHP
http://orgheiglhyphenator.readthedocs.org
MIT License

Cannot hyphenate a single word #31

Open daberlin opened 7 years ago

daberlin commented 7 years ago

Expected: "Ot^to" Actual: "Otto"

use \Org\Heigl\Hyphenator as Hy;

$hy = new Hy\Hyphenator();
$op = new Hy\Options();
$op->setHyphen('^')
    ->setDefaultLocale('de_DE')
    ->setRightMin(2)
    ->setLeftMin(2)
    ->setWordMin(4)
    ->setFilters('Simple')
    ->setTokenizers('Punctuation','Whitespace');
$hy->setOptions($op);
var_dump($hy->hyphenate('Otto'));

heiglandreas commented 7 years ago

OK. I know why it happens and it looks like I need to change that…

To work around the issue, add a space after the name and hyphenate "Otto " instead of "Otto".

The background is that the lib was originally intended either to hyphenate a complete text at every possible position using the "SimpleFilter" (like "Sau-er-stoff"), or to hyphenate a single word using the "NonStandardFilter" and return all possible options containing one hyphen, like ["Sau-erstoff", "Sauer-stoff"]. You want to combine those two, and that fails. By adding a space, the hyphenator sees the input as a complete text and returns the appropriately hyphenated string.
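The trailing-space workaround can be wrapped in a small helper so that callers do not have to remember it. This is only a sketch: `hyphenateSingleWord()` is a hypothetical name, and `$hyphenate` stands in for a closure around the configured `$hy->hyphenate()` call from the report above. The stub at the bottom merely imitates the described behaviour so the snippet runs without the library.

```php
<?php
declare(strict_types=1);

/**
 * Hyphenate one word by applying the trailing-space workaround.
 *
 * $hyphenate is an assumption: any callable that takes a text and
 * returns the hyphenated text, for example
 *     function (string $text) use ($hy) { return $hy->hyphenate($text); }
 */
function hyphenateSingleWord(callable $hyphenate, string $word): string
{
    // The appended space makes the input look like a complete text.
    $result = $hyphenate($word . ' ');

    if (!is_string($result)) {
        // Some filter setups are reported to return an array instead.
        throw new UnexpectedValueException('Expected a hyphenated string.');
    }

    // Strip the space that was added only for the workaround.
    return rtrim($result, ' ');
}

// Stand-in for the real hyphenator, used for illustration only.
$fakeHyphenate = function (string $text): string {
    return str_replace('Otto', 'Ot^to', $text);
};

echo hyphenateSingleWord($fakeHyphenate, 'Otto'), PHP_EOL;
```

With the real library, the closure shown in the docblock would replace `$fakeHyphenate`; whether the result then matches the expected "Ot^to" depends on the configured options and dictionary.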

jdreesen commented 5 years ago

I agree that this should be changed. It's rather confusing that the Hyphenator::hyphenate() method sometimes returns an array with hyphenation options instead of a hyphenated string.

Note that this only occurs when you don't use the default filter config, because the default config adds two filters. One of them, CustomMarkupFilter, is basically a no-op that (I think) only serves to prevent the array return case.

If you just use the SimpleFilter, like @daberlin, and try to hyphenate a single word, you'll always get an array with a single item that matches your input. This is because that array is initialized in the constructor of Token and never changed: the SimpleFilter only calls Token::setFilteredContent() with the hyphenated string, which Hyphenator::hyphenate() ignores in this case.

So to get the hyphenation options for a single word, you must use only the NonStandardFilter, which calls Token::setHyphenatedContent() and sets the array with the options.
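Until the return type is unified, calling code can normalise whatever hyphenate() hands back. The sketch below assumes only what this thread describes, namely that the result is either a string or an array of strings; `toHyphenationList()` is a hypothetical helper and not part of the library.

```php
<?php
declare(strict_types=1);

/**
 * Normalise a hyphenate() result to a list of strings.
 *
 * @param string|array $result Value returned by Hyphenator::hyphenate()
 * @return string[]
 */
function toHyphenationList($result): array
{
    if (is_array($result)) {
        // Array case: one entry per hyphenation option.
        return array_values(array_map('strval', $result));
    }

    // String case: the whole hyphenated text as a single entry.
    return [(string) $result];
}

print_r(toHyphenationList('Sau-er-stoff'));
print_r(toHyphenationList(['Sau-erstoff', 'Sauer-stoff']));
```

Callers can then iterate over the list in both cases instead of branching on the type at every call site.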