vanderlee / phpSyllable

PHP Syllable splitter/counter and Hyphenator for text and HTML. Multi-language, customisable, cached and fast!
http://vanderlee.github.io/phpSyllable/

lowercase vs uppercase hyphenation word list #70

Closed mtox closed 1 year ago

mtox commented 1 year ago

Words listed in \hyphenation{...} in a language file are matched case-sensitively: an entry only applies to the exact capitalization in which it was written (e.g. gegenstand does not match Gegenstand). Maybe the script could simply match the \hyphenation{...} word list in lowercase?

Example: add a custom word to a language file in a section \hyphenation{ German } to define that "German" should not be split.

Only "German" is then matched and left unsplit, while "german" is still split by the regular rules of the language file. As a result, redundant entries for both "german" and "German" have to be added to the language file.
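The mismatch described above can be reproduced with a plain associative array. This is only a sketch: the `$hyphenation` and `$normalized` arrays are hypothetical stand-ins and not necessarily how phpSyllable stores its exceptions. It assumes the mbstring extension is available.

```php
<?php
// Hypothetical lookup table (an assumption, not the library's real structure):
// key = word without hyphens, value = entry as written in \hyphenation{...}.
$hyphenation = ['German' => 'German'];

// PHP array keys are compared exactly, so only the exact spelling is found.
var_dump(isset($hyphenation['German'])); // bool(true)
var_dump(isset($hyphenation['german'])); // bool(false)

// Lowercasing both the stored key and the lookup key removes the mismatch.
$normalized = [];
foreach ($hyphenation as $key => $entry) {
    $normalized[mb_strtolower($key)] = $entry;
}
var_dump(isset($normalized[mb_strtolower('German')])); // bool(true)
var_dump(isset($normalized[mb_strtolower('GERMAN')])); // bool(true)
```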

mtox commented 1 year ago

A simple solution: in Syllable.php, in the function parseWord (starting at line 689), the pre-hyphenation check can be replaced with:

// Is it a pre-hyphenated word? Look it up by its lowercase form so that
// "german", "German" and "GERMAN" all match the same \hyphenation entry.
$key = mb_strtolower($word);
if (isset($this->hyphenation[$key])) {
    // Split the rule string (e.g. "deutsch-ame-ri-ka-ner") into characters
    $rule_chars = mb_str_split($this->hyphenation[$key]);

    $output = '';
    $pos = 0;
    // Walk the rule: copy each hyphen, but take every letter from the
    // original word so that its capitalization is preserved
    foreach ($rule_chars as $char) {
        if ($char === '-') {
            $output .= '-';
        } else {
            $output .= mb_substr($word, $pos, 1);
            $pos++;
        }
    }
    return mb_split('-', $output);
}
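To try the idea outside of the library, the same logic can be wrapped in a standalone function. This is a minimal sketch under stated assumptions: `splitByException()` and the `$exceptions` array are hypothetical names, not part of phpSyllable's API, and the parts are collected directly in an array instead of joining and re-splitting a string. It needs the mbstring extension and PHP 7.4 or later for `mb_str_split()`.

```php
<?php
/**
 * Hypothetical helper, not part of phpSyllable.
 * $hyphenation maps a lowercase word without hyphens to its hyphenated form.
 * Returns the parts of $word in its original casing, or null if no entry exists.
 */
function splitByException(string $word, array $hyphenation): ?array
{
    $key = mb_strtolower($word);
    if (!isset($hyphenation[$key])) {
        return null;
    }

    $parts = [''];
    $pos = 0;
    foreach (mb_str_split($hyphenation[$key]) as $char) {
        if ($char === '-') {
            $parts[] = ''; // a hyphen in the rule starts a new part
        } else {
            // take the letter from the original word to keep its casing
            $parts[count($parts) - 1] .= mb_substr($word, $pos, 1);
            $pos++;
        }
    }
    return $parts;
}

// Assumed exception table, built from lowercase entries only.
$exceptions = [
    'german' => 'german',
    'deutschamerikaner' => 'deutsch-ame-ri-ka-ner',
];

print_r(splitByException('Deutschamerikaner', $exceptions)); // Deutsch, ame, ri, ka, ner
print_r(splitByException('GERMAN', $exceptions));            // GERMAN (one part, not split)
var_dump(splitByException('Gegenstand', $exceptions));       // NULL (no entry)
```

One caveat of this approach: it relies on the lowercase form having the same number of characters as the original word, which holds for typical German and English words but may not hold for every Unicode character.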

In the language file (e.g. hyph-en.tex), only lowercase entries then have to be defined; every variant of a word, whatever its capitalization, is split according to the same entry:

\hyphenation{
german
deutsch-ame-ri-ka-ner
}
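For illustration, a block like the one above could be turned into a lowercase-keyed lookup table roughly as follows. This is a sketch only: `buildExceptionTable()` is a hypothetical function, phpSyllable's own language-file loader may work differently, and TeX comments (`%`) are not handled here.

```php
<?php
/**
 * Hypothetical parser, assumed for illustration only.
 * Returns a map from the lowercase word without hyphens to its hyphenated form.
 */
function buildExceptionTable(string $tex): array
{
    $table = [];
    // Capture everything between the braces of the first \hyphenation{...} block.
    if (preg_match('/\\\\hyphenation\s*\{([^}]*)\}/u', $tex, $match) !== 1) {
        return $table;
    }
    // Entries are separated by whitespace (spaces or newlines).
    $entries = preg_split('/\s+/u', trim($match[1]), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($entries as $entry) {
        $entry = mb_strtolower($entry); // normalize so lookups can be lowercase
        $table[str_replace('-', '', $entry)] = $entry;
    }
    return $table;
}

$tex = <<<'TEX'
\hyphenation{
german
deutsch-ame-ri-ka-ner
}
TEX;

$table = buildExceptionTable($tex);
print_r($table);
// german => german, deutschamerikaner => deutsch-ame-ri-ka-ner
```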

alexander-nitsche commented 1 year ago

I think this is a valid request because the patterns (given by the \patterns command) are applied case-insensitively too.