ousia / from-pandoc-to-context

Environment to parse XHTML from pandoc with ConTeXt
http://www.from-pandoc-to-context.tk
GNU General Public License v2.0

hyphenation issues #14

Open juh2 opened 8 years ago

juh2 commented 8 years ago

If I simply put a \hyphenation{...} somewhere in the style file, it does not work.

Is there a specific place where hyphenation exceptions have to go?

ousia commented 8 years ago

Not that I know of. But I would recommend putting it right after \sethyphenationfeatures.

The commands are new:

\registerhyphenationexception[MacOS]
\registerhyphenationexception[de][MacOS]

juh2 commented 8 years ago

Thanks! These commands seem to be undocumented. Why "MacOS"?

ousia commented 8 years ago

The hyphenation engine is a (somewhat) new one.

The hyphenator was announced on the mailing list.

And I guess I found the command reading the source.

Please, feel free to improve the ConTeXt wiki :smirk:.

juh2 commented 8 years ago

Interesting link. In the thread you complain that the new hyphenator does not work with \hyphenation.

https://mailman.ntg.nl/pipermail/ntg-context/2014/080082.html

I tried \registerhyphenationexception with your MWE from the mailing list, but Nietz-sche is still hyphenated incorrectly.

ousia commented 8 years ago

It works with \registerhyphenationexception.
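
For readers following along, a minimal sketch of such a test file could look like the one below. It is not the MWE from the mailing list. The \setuphyphenation line is an assumption about how the new hyphenator gets switched on, the two-argument exception form is the one shown earlier in this thread, and the whole file is untested here.

```tex
% Assumption: the Lua-based hyphenator must be active for the
% exception to be taken into account.
\setuphyphenation[method=traditional]

% Syntax as given in this thread: [language][word with break points].
\registerhyphenationexception[de][Nietz-sche]

\starttext
% \hyphenatedword prints the word with its break points made visible.
\de\hyphenatedword{Nietzsche}
\stoptext
```

Compiling it once with and once without the exception line makes it easy to see whether the exception is picked up.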

juh2 commented 8 years ago

I finally understood. The MacOS example confused me, because I thought it referred to ConTeXt on Mac OS X. I added a very incomplete new page to the wiki.

juh2 commented 8 years ago

I am afraid that it does not work in fptc:

Maybe you can try to compile a small text with the word "Wirtschaftsoligarchie".

It is hyphenated after "Wirtschaft".

The exception \registerhyphenationexception[Wirt-schafts-o-li-gar-chie]

does not work here.

ousia commented 8 years ago

I tried the following (after updating pandoc-xhtml):

<div lang="de">
<span class="hyphenatedword">Wirtschaftsoligarchie</span>
</div>

And it is hyphenated Wirt-schaft-so-lig-ar-chie.

From what you are describing, the word isn’t tagged as German. See the difference after compiling this sample with ConTeXt:

\starttext
\hyphenatedword{Wirtschaftsoligarchie}
\de\hyphenatedword{Wirtschaftsoligarchie}
\stoptext

Now I can only guess: did you set the proper lang value in the metadata?

If I’m missing something, please explain with a minimal sample.

BTW, didn’t you have other issues?

juh2 commented 8 years ago

I have

---
locale: de-DE
lang: de-DE
---

in my YAML metadata block. I hope this is enough to set the main language in the process.

ousia commented 8 years ago

I guess lang: de-DE should be enough.

But anyway, what do you get from this as a single document?

---
title: Deutsche Silbentrennung
lang: de-DE
...

<span class="hyphenatedword">Wirtschaftsoligarchie</span>

Did you get the right hyphenation when adding the Markdown snippet to your document?

Please, update pandoc-xhtml.tex. Otherwise, the class would be useless.

juh2 commented 8 years ago

No, it does not work.

I put

\registerhyphenationexception[Wirt-schafts-o-li-gar-chie]

after the setuphyphenation-commands in my style files.

The result is: Wirt-schaft-so-lig-ar-chie

It does not work with lang: de-DE and/or locale: de-DE

ousia commented 8 years ago

It works fine here. I’m afraid the word is being hyphenated with English hyphenation rules on your system.

BTW, do you have the following line in your copy of pandoc-xhtml.tex?

\installlanguage [de-DE] [de-de]

Add it, if not. And replace in the same file:

\startxmlsetups xml:hyphenatedword
    \hyphenatedword{\xmlflush{#1}}
\stopxmlsetups

with

\startxmlsetups xml:hyphenatedword
    \hyphenatedword{\xmlflush{#1}} \currentdate
\stopxmlsetups

and tell me what happens. You should get a German date, not an English one.

juh2 commented 8 years ago

Wirt-schaft-so-lig-ar-chie 11. Februar 2016

ousia commented 8 years ago

What happens when you add in the book you’re composing (I mean, one of your production documents) the following?

<span class="hyphenatedword">Wirtschaftsoligarchie</span>

But comment out the hyphenation exception first.

juh2 commented 8 years ago

I added some more samples:

Wirt-schaft-so-lig-ar-chie 12. Februar 2016 Haus-macht 12. Februar 2016 Gesell-schafts-wis-sen-schaf-ten 12. Februar 2016

With \registerhyphenationexception I could not put a hyphen at Ge-sell-schafts...

Somehow the whole mechanism is not working here:


context --version

resolvers       | trees | analyzing 'home:texmf'
mtx-context     | ConTeXt Process Management 0.63
mtx-context     |
mtx-context     | main context file: /home/juh/context/tex/texmf-context/tex/context/base/mkiv/context.mkiv
mtx-context     | current version: 2016.02.06 14:06
luajittex --version
This is LuajitTeX, Version beta-0.89.0 (TeX Live 2016/dev)

ousia commented 8 years ago

Leaving the issue with Ge-sell-schaft aside (it isn’t a matter of hyphenation exceptions), does hyphenation work fine now in German?

TeX defines a minimum number of letters on the left and on the right to hyphenate words (lefthyphenmin and righthyphenmin). In Spanish, the default values are 2 and 2. In German, default values are 3 and 3.

Because of that, you can get co-mo in Spanish (which is wrong), while Methode stays unhyphenated in German.

You could define a minimum word length for hyphenation to apply (to prevent co-mo in Spanish or ei-ne in German). The problem is more complex in German (see this thread).

The way to set the minimum number of characters on each side is:

\setuplanguage[de-de][lefthyphenmin=2, righthyphenmin=2]
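
A small test file along these lines could be used to check the effect. This is only a sketch: it assumes that the de-de tag used above is known to your ConTeXt setup and that \language[de-de] selects it, and the sample words are the ones discussed in this thread.

```tex
% Allow a break after the first two letters and before the last two.
\setuplanguage[de-de][lefthyphenmin=2, righthyphenmin=2]

\starttext
% Switch to German (assumption: the de-de tag resolves on your system).
\language[de-de]
% Print the words with their break points made visible.
\hyphenatedword{Methode} \hyphenatedword{Gesellschaftswissenschaften}
\stoptext
```

Commenting out the \setuplanguage line and compiling again shows which break points depend on the two minimum values.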

And I’m now in a hurry. I will give the other parameter later this evening (I don’t have time now to search it).

ousia commented 8 years ago

The way to define a minimum word length to apply hyphenation in ConTeXt is:

\definehyphenationfeatures
       [givemefive]
       [hyphenmin=4]

\sethyphenationfeatures
       [givemefive]
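
Put into a complete file, the definition above might be tried out as sketched below. The \setuphyphenation line is an assumption about what is needed to activate the new hyphenator, and the sample words are taken from this thread; the file has not been tested here.

```tex
% Assumption: switches ConTeXt to the Lua-based hyphenator.
\setuphyphenation[method=traditional]

% Feature set from the comment above: minimum word length for hyphenation.
\definehyphenationfeatures[givemefive][hyphenmin=4]
\sethyphenationfeatures[givemefive]

\starttext
\de
% Short and long words side by side, with break points made visible.
\hyphenatedword{eine} \hyphenatedword{Methode} \hyphenatedword{Wirtschaftsoligarchie}
\stoptext
```

Changing the hyphenmin value and recompiling is a quick way to see which words are affected.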

Some questions that you should take into consideration:

I hope this is clear now. Let me know, if it isn’t or if hyphenation doesn’t work as expected in your documents.

ousia commented 8 years ago

@juh2, I don’t know whether the issue was solved. (I guess it is, but I really don’t know.)

I suspect that the whole problem with wrong hyphenation is that the \installlanguage command was missing from your copy of pandoc-xhtml.tex.

If all the hyphenation problems discussed in this issue are solved, please close it.

I’m not closing it myself, so you may reopen it when needed.