latex3 / fontspec

Font selection in LaTeX for XeTeX and LuaTeX
http://latex3.github.io/fontspec/
LaTeX Project Public License v1.3c

Hyphenation bug, maybe #214

Closed Lenchik closed 8 years ago

Lenchik commented 9 years ago

I submitted an issue to the polyglossia repository with an example. Later, in the discussion, the idea came up that fontspec might be causing it, since polyglossia loads fontspec. Unfortunately, my knowledge of LaTeX is not enough to narrow it down to fontspec alone (for me it happens only when the text is typed in Russian, with the language set through polyglossia). Can anyone help determine whether the bug is in polyglossia only and fontspec has nothing to do with it?

Lenchik commented 8 years ago

I've done some examples.

\documentclass[a4paper]{report}
\usepackage{fontspec}

% I don't know how to turn on Russian hyphenation, so I just added the next line
\hyphenation{Пред-ло-же-ние пе-ре-но-сом сло-ва тес-та е-ле}

\begin{document}
    \def\text{\fbox{\parbox{1.55cm}{%
    EXAMPLE HYPHENATION%
    }}\qquad\qquad\null\par\bigskip}

    \fontspec{Linux Libertine O}
    \text
    \addfontfeature{HyphenChar=None}
    \text
    \addfontfeature{HyphenChar={+}}
    \text
    \addfontfeature{HyphenChar={"002D}}
    \text
    \addfontfeature{HyphenChar={"00AD}}
    \text

    \def\textrus{\fbox{\parbox{1.55cm}{%
    Предложение с~переносом слова-теста еле-еле.%
    }}\qquad\qquad\null\par\bigskip}

    \fontspec{Linux Libertine O}
    \textrus
    \addfontfeature{HyphenChar=None}
    \textrus
    \addfontfeature{HyphenChar={+}}
    \textrus
    \addfontfeature{HyphenChar={"002D}}
    \textrus
    \addfontfeature{HyphenChar={"00AD}}
    \textrus

\end{document}

The outputs, logs, and text copied from the PDFs are attached. Notice:

  1. None of the Russian text changed its hyphenation character. The English text did. :(
  2. In the copied text, almost all of the English text has the inserted hyphens suppressed; the Russian text does not. :(

I would expect the copied Russian text to look like

Предложение
с переносом
слова-теста
еле-еле.

or, even better, all on one line.

lualatex-example3.log.txt

lualatex-example3.pdf

lualatex-example3-copypaste.txt

xelatex-example3.log.txt

xelatex-example3.pdf

xelatex-example3-copypaste.txt

wspr commented 8 years ago

Ah, you've discovered an edge case :). fontspec is hiding the fact that it's making a global change to the TeX font when you use the HyphenChar feature. This means that when you use it in \addfontfeatures, it actually changes it for the rest of the document. I'll add some code to prevent it being used in this way.
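The global behaviour described above comes from TeX itself rather than from fontspec: assignments to `\hyphenchar` ignore grouping. The following is only a minimal sketch of that underlying primitive, not fontspec's own code; the use of `\typeout` to print the values is just a convenience chosen for illustration.

```latex
\documentclass{article}
\begin{document}
% Report the hyphen character of the current font (typically 45, the ASCII hyphen).
\typeout{Before: \the\hyphenchar\font}
\begingroup
  % Assignment made inside a group; TeX treats \hyphenchar assignments as global.
  \hyphenchar\font=`\+ %
\endgroup
% The group has ended, but the changed value is still reported here.
\typeout{After group: \the\hyphenchar\font}
\end{document}
```

Because the setting belongs to the font rather than to the current group, a mid-document change through `\addfontfeature` cannot be confined to the text that follows it inside a group.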

Lenchik commented 8 years ago

"You'll need to define multiple font families to achieve what you want."

Can you please provide an example of how to control or adjust the hyphenation character in the preamble?

Lenchik commented 8 years ago
\documentclass[a4paper]{report}
\usepackage{fontspec}

% I don't know how to turn on Russian hyphenation, so I just added the next line
\hyphenation{Пред-ло-же-ние пе-ре-но-сом сло-ва тес-та е-ле}

\defaultfontfeatures{HyphenChar={"002D}}
\setmainfont{Linux Libertine O}

\begin{document}
    \def\text{\fbox{\parbox{1.55cm}{%
    EXAMPLE HYPHENATION%
    }}\qquad\qquad\null\par\bigskip}

    \text

    \def\textrus{\fbox{\parbox{1.55cm}{%
    Предложение с~переносом слова-теста еле-еле.%
    }}\qquad\qquad\null\par\bigskip}

    \textrus

\end{document}

Why does the copied English text have the inserted hyphens suppressed, while the Russian text does not?

% !TeX program = xelatex
\documentclass[a4paper]{report}
\usepackage{fontspec}

% I don't know how to turn on Russian hyphenation, so I just added the next line
\hyphenation{Пред-ло-же-ние пе-ре-но-сом сло-ва тес-та е-ле}

\defaultfontfeatures{HyphenChar={"00AD}}
\setmainfont{Linux Libertine O}

\begin{document}
    \def\text{\fbox{\parbox{1.55cm}{%
    EXAMPLE HYPHENATION%
    }}\qquad\qquad\null\par\bigskip}

    \text

    \def\textrus{\fbox{\parbox{1.55cm}{%
    Предложение с~переносом слова-теста еле-еле.%
    }}\qquad\qquad\null\par\bigskip}

    \textrus

\end{document}

Why does this example, when compiled with xelatex, show no hyphen symbol in the PDF (or in the text copied from the PDF)? With lualatex this example works fine.

wspr commented 8 years ago

TeX and XeTeX have a bug/feature that the first word of a paragraph is not hyphenated. LuaTeX fixes this bug, which is why you get the different results. I’m not sure what you mean about the “soft hyphen” example ("00AD). In both cases I get hyphenation without any hyphen (as expected, because the soft hyphen is invisible).
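A common workaround for the first-word limitation, shown here only as a hedged sketch and not as something fontspec does for you, is to put zero-width glue in front of the word with `\hspace{0pt}`. The box width of 1.2cm and the sample word are arbitrary choices for illustration.

```latex
\documentclass{article}
\begin{document}
% The word is the first thing in the box, so classic TeX and XeTeX
% do not consider it for hyphenation and it overflows the box.
\fbox{\parbox{1.2cm}{Hyphenation}}\par\bigskip

% \hspace{0pt} inserts zero-width glue before the word, which makes
% the word a candidate for hyphenation.
\fbox{\parbox{1.2cm}{\hspace{0pt}Hyphenation}}
\end{document}
```

Under LuaTeX the two boxes would be expected to behave the same way, since that engine hyphenates the first word without help.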

Lenchik commented 8 years ago

For the “soft hyphen” example ("00AD): PDF made with xelatex, opened in STDU Viewer, text selected and copied to the clipboard, then pasted here:

EXAMPLE
HYPHEN 
ATION
Предложение
с пе 
реносом
слова-теста
еле-еле

PDF made with xelatex, opened in Adobe Reader, text selected and copied to the clipboard, then pasted here:

EXAMPLE
HYPHEN
ATION
Предложение
с пе
реносом
слова-теста
еле-еле.

Maybe both examples are working as expected, but here comes lualatex :smile: By the way, I do not see a hyphenation symbol in the PDF in either viewer.

PDF made with lualatex, opened in STDU Viewer, text selected and copied to the clipboard, then pasted here:

EXAMPLE
HYPHEN­
ATION
Предло­
жение
с пере­
носом
слова-
теста
еле-еле.

Depending on the font or the clipboard viewer, I see an actual soft hyphen (character 00AD) at the end of the hyphenated lines. It can be seen in Windows Notepad too. You can probably copy this from the web page and paste it into your favourite text editor to see the characters or their codes. Maybe this is a PDF viewer bug, given the next result.

PDF made with lualatex, opened in Adobe Reader, text selected and copied to the clipboard, then pasted here:

EXAMPLE
HYPHENATION
Предложение
с переносом
слова-
теста
еле-еле.

And this is what I was expecting in all four examples here. This time I do see the hyphenation symbol in the PDF in both viewers.

The whole idea behind my hyphenchar manipulations is to make the PDF easier to copy from and search. How a soft hyphen is copied may depend on the viewer, but it should at least be displayed.

wspr commented 8 years ago

Apologies that I didn’t understand the intention originally. The behaviour of what the engine does with the soft hyphen is beyond fontspec’s control, I’m afraid! As far as I can tell, something is happening correctly, but as you have seen there’s very different behaviour depending on the engine and other programmes used. You might want to query on the LuaTeX mailing list.

Lenchik commented 8 years ago

So is it something like this: 00AD is sent to xelatex and then just goes missing?

I think that lualatex works fine enough for this case.

Lenchik commented 8 years ago

@davidcarlisle Can you please create a plain TeX version of the “soft hyphen” example from the post https://github.com/wspr/fontspec/issues/214#issuecomment-174184943, without the fontspec package if possible, that can be compiled with xetex or luatex? It is for testing how the engines behave differently with hyphenchars. Here is the LaTeX + fontspec example from https://github.com/wspr/fontspec/issues/214#issuecomment-174184943:

\documentclass[a4paper]{report}
\usepackage{fontspec}

% I don't know how to turn on Russian hyphenation, so I just added the next line
\hyphenation{Пред-ло-же-ние пе-ре-но-сом сло-ва тес-та е-ле}

\defaultfontfeatures{HyphenChar={"00AD}}
\setmainfont{Linux Libertine O}

\begin{document}
    \def\text{\fbox{\parbox{1.55cm}{%
    EXAMPLE HYPHENATION%
    }}\qquad\qquad\null\par\bigskip}

    \text

    \def\textrus{\fbox{\parbox{1.55cm}{%
    Предложение с~переносом слова-теста еле-еле.%
    }}\qquad\qquad\null\par\bigskip}

    \textrus

\end{document}

You made a great shrinkability example before (in https://github.com/wspr/fontspec/issues/139#issuecomment-174298868), which is why I am asking.

davidcarlisle commented 8 years ago

On 29 January 2016 at 18:42, Lenchik notifications@github.com wrote:

@davidcarlisle https://github.com/davidcarlisle Can you please create plain tex version of “soft hyphen” example

possibly, but probably not tonight:-)