In the generated .man file, all the non-Latin characters are represented as escape sequences. This is not a showstopper, since the rendered man page looks good, but every non-Latin character takes 5 bytes (in the case of Greek) or 8 bytes (in the case of Cyrillic and Armenian). If the characters were not escaped, they would occupy only 2 bytes each in UTF-8. It is just a waste of space.
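The byte counts can be checked with a short Python sketch. The escape strings in it (`\[*a]` for a Greek letter, `\[u0441]` and `\[u0570]` for Cyrillic and Armenian ones) are illustrative assumptions chosen to match the 5-byte and 8-byte sizes mentioned above; they are not copied from actual pandoc output.

```python
# Compare the size of groff escape sequences with the raw UTF-8 characters.
# The escape strings are illustrative assumptions, not actual pandoc output.
samples = {
    "Greek alpha": (r"\[*a]", "α"),
    "Cyrillic es": (r"\[u0441]", "с"),
    "Armenian ho": (r"\[u0570]", "հ"),
}


def sizes(escaped, raw):
    """Return (bytes used by the escape, bytes used by the raw UTF-8 character)."""
    return len(escaped.encode("ascii")), len(raw.encode("utf-8"))


for name, (escaped, raw) in samples.items():
    esc_len, raw_len = sizes(escaped, raw)
    print(f"{name}: {esc_len} bytes escaped, {raw_len} bytes as UTF-8")
```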
Modern groff allows using UTF-8 encoding in source files:
$ cat test.man
.\" Automatically generated by Pandoc 2.14.0.3
.\"
.TH "" "" "" "" ""
.hy
.SH Ελληνικά
.PP
српски հայերեն
$ groff -D utf8 -m man -T utf8 < test.man
() ()
Ελληνικά
српски հայերեն
()
Thus, I request that the man writer output non-Latin characters as-is, without converting them to escape sequences.
Pandoc version:
$ pandoc --version
pandoc 2.14.0.3
Compiled with pandoc-types 1.22.1, texmath 0.12.3.3, skylighting 0.10.5.2,
citeproc 0.4.0.1, ipynb 0.1.0.1
User data directory: /home/vdb/.local/share/pandoc
Copyright (C) 2006-2021 John MacFarlane. Web: https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.
It is not the latest available version. However, I scanned the pandoc release notes for releases after 2.14.0.3, and it seems there were no changes to the man writer.
BTW, in Fedora 37, man pages in languages with non-Latin writing systems do not use escape sequences. For example, Serbian:
$ cat /usr/share/man/sr/man1/cat.1.gz | gunzip | head -n20
.\" -*- coding: UTF-8 -*-
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.48.5.
.\"*******************************************************************
.\"
.\" This file was generated with po4a. Translate the source file.
.\"
.\"*******************************************************************
.TH CAT 1 "Августа 2022" "ГНУ coreutils 9.1" "Корисничке наредбе"
.SH НАЗИВ
cat \- concatenate files and print on the standard output
.SH УВОД
\fBcat\fP [\fI\,ОПЦИЈА\/\fP]... [\fI\,ДАТОТЕКА\/\fP]...
.SH ОПИС
.\" Add any additional description here
.PP
Надовежите ДАТОТЕКУ(Е) на стандардни излаз.
.PP
Без ДАТОТЕКЕ, или када је ДАТОТЕКА \-, чита стандардни улаз.
.TP
\fB\-A\fP, \fB\-\-show\-all\fP
Or Japanese:
$ cat /usr/share/man/ja/man1/cat.1.gz | gunzip | head -n20
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.13.
.TH CAT "1" "2021年5月" "GNU coreutils" "ユーザーコマンド"
.SH 名前
cat \- ファイルの内容を連結して標準出力に出力する
.SH 書式
.B cat
[\fI\,オプション\/\fR]... [\fI\,ファイル\/\fR]...
.SH 説明
.\" Add any additional description here
.PP
ファイル (複数可) の内容を結合して標準出力に出力します。
.PP
ファイルの指定がない場合や FILE が \- の場合, 標準入力から読み込みを行います。
.HP
\fB\-A\fR, \fB\-\-show\-all\fR \fB\-vET\fR と同じ
.TP
\fB\-b\fR, \fB\-\-number\-nonblank\fR
空行以外に行番号を付ける。\-n より優先される
.HP
\fB\-e\fR \fB\-vE\fR と同じ
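To check other man pages in the same way, a rough Python sketch can count raw non-ASCII characters and `\[uXXXX]` escapes in a page's source. The helper names `count_non_latin` and `read_man_page` are hypothetical, and named glyph escapes (such as those used for Greek letters) are not counted, so this is only an approximation.

```python
import gzip
import re

# Matches groff Unicode escapes such as \[u0441] (4 to 6 uppercase hex digits).
UNICODE_ESCAPE = re.compile(r"\\\[u[0-9A-F]{4,6}\]")


def count_non_latin(source):
    """Return (raw non-ASCII characters, Unicode escapes) found in roff source."""
    raw = sum(1 for ch in source if ord(ch) > 127)
    escaped = len(UNICODE_ESCAPE.findall(source))
    return raw, escaped


def read_man_page(path):
    """Read a man page as UTF-8 text, decompressing .gz files on the fly."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        return f.read()
```

For instance, `count_non_latin(read_man_page("/usr/share/man/sr/man1/cat.1.gz"))` would inspect the Serbian page quoted above, assuming that path exists on the system.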
It used to be that UTF-8 in man pages was not reliably supported.
Perhaps that situation has changed and we can revisit this. In any case, we could keep the present behavior when the --ascii option is used.
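A minimal sketch of the behavior proposed here, written in Python rather than taken from pandoc's own code: non-ASCII characters pass through unchanged by default and are converted to groff `\[uXXXX]` escapes only when an ASCII-only mode is requested. The function name `render_text` and its `ascii_only` parameter are hypothetical stand-ins for the `--ascii` option. The sketch also ignores named glyph escapes and the escaping of roff-special characters such as the backslash, which a real writer has to handle.

```python
def render_text(text, ascii_only=False):
    """Hypothetical helper: emit man-page text, escaping non-ASCII only on request."""
    if not ascii_only:
        return text  # default: keep UTF-8 characters as-is
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            # groff Unicode escape: \[uXXXX] with at least four uppercase hex digits
            out.append("\\[u%04X]" % ord(ch))
    return "".join(out)


print(render_text("српски"))
print(render_text("српски", ascii_only=True))
```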
The example behind this request: the source Markdown file includes Greek, Cyrillic, and Armenian letters. Pandoc converts the Markdown to a man page correctly, but the .man file contains the escape sequences described above.
Besides Fedora 37, I am not aware of the situation in other distros, though.