latex3 / hyperref

Hypertext support for LaTeX

UTF8 in TextField's 'default' #49

Open madchemiker opened 6 years ago

madchemiker commented 6 years ago

As in https://github.com/ho-tex/hyperref/issues/5, LuaLaTeX produces an incorrect result if you use non-ASCII characters for 'default' in TextField. pdfLaTeX seems to produce the correct result as long as you use only Latin-1 characters, but the problem also occurs with pdfLaTeX if you use Japanese characters for 'default'. The following hack fixes this problem (thanks to Ms. Fischer's comment in https://github.com/ho-tex/hyperref/issues/5).

\documentclass{article}

\usepackage{fontspec}       % ---- for LuaLaTeX
% \usepackage[utf8]{inputenc}   % ---- for pdfLaTeX

\usepackage[unicode=true]{hyperref}
\usepackage[T1]{fontenc}

% FIX for 'default'
% See also the fix for 'value' by Ms. Fischer
% ( https://github.com/ho-tex/hyperref/issues/5 ).
% I just replaced 'value' with 'default'.
\makeatletter
\define@key{Field}{default}{%
  \Hy@pdfstringdef\Fld@default{#1}}
\makeatother

% NOTE:
%   This fix IS needed for pdfLaTeX too, when you use Japanese
%   characters.
%
% \documentclass{article}
% \usepackage[whole]{bxcjkjatype}
% \usepackage[utf8]{inputenc}
% \begin{document}
% \begin{Form}
%   % without fix, fails to compile
%   \TextField[name=addr,default=東京]{Address} %Tokyo
% \end{Form}
% \end{document}

\begin{document}
\begin{Form}
  % OK, of course
  \TextField[name=textfield1]{Address} \\

  % OK
  \TextField[name=textfield2,value=Köln]{Address} \\

  % correct on pdfLaTeX (but incorrect if you use Japanese characters)
  % incorrect without FIX on LuaLaTeX
  \TextField[name=textfield3,default=München]{Address} \\

  % (though this is probably meaningless)
  % incorrect (internally) on LuaLaTeX
  % 
  % $ pdftk pr-textfield-default-encoding.pdf dump_data_fields
  % ...
  % FieldValue: Köln           % This is OK, but
  % FieldValueDefault: München   % nonsense!
  % ...
  \TextField[name=textfield4,value=Köln,default=München]{Address}
\end{Form}
\end{document}

madchemiker commented 6 years ago

Sorry, I have noticed that this fix must be applied ONLY to TextField. It causes incorrect behavior for ChoiceMenu (on both pdfLaTeX and LuaLaTeX).

\begin{Form}
  \ChoiceMenu[radio,name=choice,default=Yes]{TeX User}{Yes,No}
\end{Form}

with "FIX" produces:

$ pdftk pr2.pdf dump_data_fields
---
FieldType: Button
FieldName: choice
FieldFlags: 49152
FieldValue: \376\377\000Y\000e\000s     # should be "Yes"
FieldJustification: Left
FieldStateOption: Yes
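
One way to limit the fix to TextField is to make the redefinition inside a group. This is only a sketch: it assumes that `\define@key` assigns locally (as in the keyval package) and that fields created inside a group are still registered with the form. Neither assumption was verified here. The field names are the ones used in the examples above.

```latex
\documentclass{article}
\usepackage{fontspec}       % ---- for LuaLaTeX
\usepackage[unicode=true]{hyperref}

\begin{document}
\begin{Form}
  \begingroup
    \makeatletter
    % the redefinition is meant to be local to this group (assumption)
    \define@key{Field}{default}{%
      \Hy@pdfstringdef\Fld@default{#1}}
    \makeatother
    \TextField[name=textfield3,default=München]{Address}
  \endgroup

  % outside the group the original 'default' key should be active again
  \ChoiceMenu[radio,name=choice,default=Yes]{TeX User}{Yes,No}
\end{Form}
\end{document}
```

Checking the result with `pdftk ... dump_data_fields`, as above, would show whether the FieldValue of `choice` is a plain `Yes` again.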

u-fischer commented 6 years ago

The fix for the default field is certainly needed. But I don't see a problem with the choice menu. With the option unicode you are forcing everything into UTF-16BE, so Yes is encoded as \376\377\000Y\000e\000s. If you don't like this, try \usepackage[pdfencoding=auto]{hyperref} instead.
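
The suggestion above can be sketched as follows. This is a minimal, untested sketch that assumes the 'default' fix from the first comment is in place. With `pdfencoding=auto`, hyperref is expected to keep strings that fit into PDFDocEncoding (such as `Yes`) in that encoding and to switch to UTF-16BE only when needed.

```latex
\documentclass{article}
\usepackage{fontspec}       % ---- for LuaLaTeX
\usepackage[pdfencoding=auto]{hyperref}

% FIX for 'default' from the first comment
\makeatletter
\define@key{Field}{default}{%
  \Hy@pdfstringdef\Fld@default{#1}}
\makeatother

\begin{document}
\begin{Form}
  % 'Yes' is pure ASCII, so with pdfencoding=auto it should not be
  % converted to UTF-16BE (assumption, not verified with pdftk)
  \ChoiceMenu[radio,name=choice,default=Yes]{TeX User?}{Yes,No}
\end{Form}
\end{document}
```

Running `pdftk ... dump_data_fields` on the result, as in the earlier comments, shows whether FieldValue is now a plain `Yes`.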

madchemiker commented 6 years ago

Thank you for your reply. I thought the FIX should not be applied to ChoiceMenu, because the following code does not work as expected with the FIX.

\documentclass{article}

\usepackage{fontspec}       % ---- for LuaLaTeX
% \usepackage[utf8]{inputenc}   % ---- for pdfLaTeX

\usepackage[unicode=true]{hyperref}
\usepackage[T1]{fontenc}

\begin{document}
\begin{Form}
  % 'Yes' is checked (as expected)
  \ChoiceMenu[radio,name=nofix,default=Yes]{TeX User?}{Yes,No}

  \makeatletter
  \define@key{Field}{default}{%
    \Hy@pdfstringdef\Fld@default{#1}}
  \makeatother

  % 'Yes' is NOT checked
  \ChoiceMenu[radio,name=withfix,default=Yes]{TeX User?}{Yes,No}
\end{Form}
\end{document}

But this is probably caused by the inconsistent encoding (charset) of FieldValue and FieldStateOption.

So I think I should say now: not only FieldValue but also FieldStateOption should be encoded as UTF16 for ChoiceMenu.
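
The mismatch can also be made visible without pdftk. The following is a small sketch, assuming that hyperref's documented `\pdfstringdef` performs essentially the same conversion as the patched 'default' key. The macro name `\EncodedYes` is made up for this example. It writes the converted string to the log, where it can be compared with the plain `Yes` reported for FieldStateOption.

```latex
\documentclass{article}
\usepackage[unicode=true]{hyperref}

\begin{document}
% \pdfstringdef converts its second argument to a PDF string and
% stores the result in the macro given as its first argument
\pdfstringdef\EncodedYes{Yes}
% write the stored string to the terminal and the log file
\typeout{default as PDF string: \meaning\EncodedYes}
Test
\end{document}
```

If the logged string carries a UTF-16 byte order mark and NUL-padded characters while the state names of the radio buttons do not, the two cannot compare equal, which would match the unchecked `withfix` field.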

FYI: The results of pdftk.

1) the PDF file which is generated by LuaLaTeX

$ pdftk choice.pdf dump_data_fields

FieldType: Button
FieldName: nofix
FieldFlags: 49152
FieldValue: Yes
FieldJustification: Left
FieldStateOption: Yes

FieldType: Button
FieldName: withfix
FieldFlags: 49152
FieldValue: \376\377\000Y\000e\000s
FieldJustification: Left
FieldStateOption: Yes

2) After the PDF file is edited with Acrobat Reader DC (the "No" option checked in both fields)

$ pdftk choice.pdf dump_data_fields

FieldType: Button
FieldName: nofix
FieldFlags: 49152
FieldValue: No  
FieldValue: Yes
FieldJustification: Left
FieldStateOption: No
FieldStateOption: Off
FieldStateOption: Yes

FieldType: Button
FieldName: withfix
FieldFlags: 49152
FieldValue: No
FieldValue: \376\377\000Y\000e\000s
FieldJustification: Left
FieldStateOption: No
FieldStateOption: Off
FieldStateOption: Yes

I don't know why the FieldValue is duplicated...

u-fischer commented 6 years ago

I see what you mean. I will look at it but not today.

u-fischer commented 4 years ago

I think that in the next version it will work for umlauts and other characters in the T1 encoding, but not for Japanese. That would imho need extensive changes in the font resources.