texjporg / platex

pLaTeX community edition
BSD 3-Clause "New" or "Revised" License

Inconsistent error message #84

Closed: JulienPalard closed this issue 2 years ago

JulienPalard commented 5 years ago

Using:

platex --version
e-pTeX 3.14159265-p3.8.1-180901-2.6 (utf8.euc) (TeX Live 2019/dev/Debian)
kpathsea version 6.3.1/dev
ptexenc version 1.3.7/dev

(from Debian Buster)

with the following test.tex file:

\documentclass[a4,10pt,dvipdfmx]{article}
\title{Pouette}
\begin{document}
ſ
\end{document}

(sha1 starting with b2fa881)

Running platex -kanji=utf8 -recorder test.tex gives me:

! Package inputenc Error: Unicode character 顛 (U+C4CF)
(inputenc)                not set up for use with LaTeX.

I don't understand this, for two reasons:

顛 is NOT U+C4CF:

U+985B CJK UNIFIED IDEOGRAPH-985B
UTF-8: e9 a1 9b UTF-16BE: 985b Decimal: &#39003; Octal: \0114133
顛

and

U+C4CF HANGUL SYLLABLE SSWIC
UTF-8: ec 93 8f UTF-16BE: c4cf Decimal: &#50383; Octal: \0142317
쓏

Also, neither character is in my file: it contains only ASCII and ſ (U+017F LATIN SMALL LETTER LONG S).
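For reference, the code point details quoted above can be recomputed with Python's standard library. This is only a sketch: `describe` is a hypothetical helper, and its output merely mimics the layout of the listings above.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return code point, Unicode name, UTF-8 bytes, decimal and octal forms."""
    cp = ord(ch)
    utf8 = " ".join(f"{b:02x}" for b in ch.encode("utf-8"))
    return (f"U+{cp:04X} {unicodedata.name(ch)} | UTF-8: {utf8} | "
            f"Decimal: &#{cp}; | Octal: \\0{cp:o}")


for ch in ("顛", "쓏", "ſ"):
    print(describe(ch))
# U+985B CJK UNIFIED IDEOGRAPH-985B | UTF-8: e9 a1 9b | Decimal: &#39003; | Octal: \0114133
# U+C4CF HANGUL SYLLABLE SSWIC | UTF-8: ec 93 8f | Decimal: &#50383; | Octal: \0142317
# U+017F LATIN SMALL LETTER LONG S | UTF-8: c5 bf | Decimal: &#383; | Octal: \0577
```

The last line is the relevant one: in UTF-8, the only non-ASCII character in the file is the two bytes c5 bf.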

So, what am I getting wrong? For context, I'm trying to build the Japanese translation of the CPython documentation for the re module, which contains both Japanese characters AND ſ in an example.

It works with:

e-pTeX 3.14159265-p3.7.1-161114-2.6 (utf8.euc) (TeX Live 2017/Debian)
kpathsea version 6.2.3
ptexenc version 1.3.5

from Ubuntu bionic though.

aminophen commented 5 years ago

Thanks for your report. Confirmed on my own build of TeX Live 2020/dev from r51250 on darwin. I guess this is a problem in pTeX, not in pLaTeX.

Consider the following plain pTeX source (test.tex):

\message{ſ}\x
ſ\bye

Compiling this source prints the following on the terminal:

$ ptex test
This is pTeX, Version 3.14159265-p3.8.2 (utf8.euc) (TeX Live 2020/dev) (preloaded format=ptex)
 restricted \write18 enabled.
(./test.tex 顛
! Undefined control sequence.
l.1 \message{^^c5^^bf}\x

? x

On the terminal, the first "ſ" (in the \message output) is converted to "顛", and the second (in the l.1 error context) is shown as "^^c5^^bf". Internally, pTeX (more precisely, its built-in library "ptexenc") converts UTF-8 input to EUC-JP or Shift_JIS, so there might be some problem in that conversion.
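The two renderings are consistent with the raw UTF-8 bytes of "ſ" being passed through unconverted and then read as a single EUC-JP character. A minimal check, assuming only Python's standard `euc_jp` codec (this does not use or model ptexenc itself):

```python
# UTF-8 bytes of U+017F LATIN SMALL LETTER LONG S
long_s = "\u017f"
raw = long_s.encode("utf-8")
print(raw.hex(" "))  # c5 bf

# The same two bytes in TeX's ^^ notation, as in "l.1 \message{^^c5^^bf}"
caret = "".join(f"^^{b:02x}" for b in raw)
print(caret)  # ^^c5^^bf

# The same byte pair decoded as one EUC-JP (JIS X 0208) character
misread = raw.decode("euc_jp")
print(misread, f"U+{ord(misread):04X}")  # 顛 U+985B
```

This only shows that the bytes line up with the terminal output. It does not locate where in ptexenc the conversion goes wrong.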

Anyway, we will discuss the problem in texjporg/tex-jp-build#80, which provides the upstream source of pTeX and ptexenc.

aminophen commented 2 years ago

See https://github.com/texjporg/tex-jp-build/issues/81