mawww / kakoune

mawww's experiment for a better code editor
http://kakoune.org
The Unlicense

How to remove the format characters in the man buffer [QUESTION] #5212

Open QiBaobin opened 3 months ago

QiBaobin commented 3 months ago

Question

The content below is shown when I run man man in Kakoune. It works fine in the shell and in vim; did I miss anything?

4mMAN24m(1)                                                                                                  Manual pager utils                                                                                                  4m

1mNAME0m
       man - an interface to the system reference manuals

1mSYNOPSIS0m
       1mman 22m[4mman24m 4moptions24m] [[4msection24m] 4mpage24m ...] ...
       1mman -k 22m[4mapropos24m 4moptions24m] 4mregexp24m ...
       1mman -K 22m[4mman24m 4moptions24m] [4msection24m] 4mterm24m ...
       1mman -f 22m[4mwhatis24m 4moptions24m] 4mpage24m ...
       1mman -l 22m[4mman24m 4moptions24m] 4mfile24m ...
       1mman -w22m|1m-W 22m[4mman24m 4moptions24m] 4mpage24m ...

1mDESCRIPTION0m
       1mman  22mis  the  system's manual pager.  Each 4mpage24m argument given to 1mman 22mis normally the name of a program, utility or function.  The 4mmanual24m 4mpage24m associated with each of these arguments is then foun
       4mtion24m, if provided, will direct 1mman 22mto look only in that 4msection24m of the manual.  The default action is to search in all of the available 4msections24m following a pre-defined order (see 1mDEFAULTS22m), and
       4mpage24m found, even if 4mpage24m exists in several 4msections24m.

       The table below shows the 4msection24m numbers of the manual followed by the types of pages they contain.

       1   Executable programs or shell commands
       2   System calls (functions provided by the kernel)
       3   Library calls (functions within program libraries)
       4   Special files (usually found in 4m/dev24m)
       5   File formats and conventions, e.g. 4m/etc/passwd0m
       6   Games
       7   Miscellaneous (including macro packages and conventions), e.g. 1mman22m(7), 1mgroff22m(7), 1mman-pages22m(7)
       8   System administration commands (usually only for root)
       9   Kernel routines [Non standard]

       A manual 4mpage24m consists of several sections.

       Conventional section names include 1mNAME22m, 1mSYNOPSIS22m, 1mCONFIGURATION22m, 1mDESCRIPTION22m, 1mOPTIONS22m, 1mEXIT STATUS22m, 1mRETURN VALUE22m, 1mERRORS22m, 1mENVIRONMENT22m, 1mFILES22m, 1mVERSIONS22m, 1mSTANDARDS2

       The following conventions apply to the 1mSYNOPSIS 22msection and can be used as a guide in other sections.

       1mbold text          22mtype exactly as shown.
       4mitalic24m 4mtext24m        replace with appropriate argument.
       [1m-abc22m]             any or all arguments within [ ] are optional.
       1m-a22m|1m-b              22moptions delimited by | cannot be used together.
       4margument24m ...       4margument24m is repeatable.
       [4mexpression24m] ...   entire 4mexpression24m within [ ] is repeatable.

       Exact rendering may vary depending on the output device.  For instance, man will usually not be able to render italics when running in a terminal, and will typically use underlined or coloured text instead.

       The command or function illustration is a pattern that should match all possible invocations.  In some cases it is advisable to illustrate several exclusive invocations as is shown in the 1mSYNOPSIS 22msection of this ma
       page.

1mEXAMPLES0m
       1mman 4m22mls0m
           Display the manual page for the 4mitem24m (program) 4mls24m.
Screwtapello commented 3 months ago

What version of man are you using (man --version might give a hint)?

Do you have any custom man configuration, like a $MANOPT environment variable?

Traditionally man formats its output for an actual form-feed printer, and it's up to a tool like less (or Kakoune, or classics like ul and colcrt) to convert that into the formatting codes used by your terminal. It looks like your man is emitting terminal formatting codes directly, and causing Kakoune to get confused.

QiBaobin commented 3 months ago

2.12.1; the package definition is here: https://github.com/NixOS/nixpkgs/blob/d3f42bd62aa840084563e3b93e4eab73cb0a0448/pkgs/tools/misc/man-db/default.nix#L18

No related environment variables are set.

I also tried the man built into macOS, /usr/bin/man, and got the same behavior.

Screwtapello commented 3 months ago

On my machine (using man-db 2.12.1 on Debian Testing), if I pipe the output of man to a file (the same way Kakoune does), it's printed without formatting:

$ man man > /tmp/man-plain.txt
$ hexdump -C /tmp/man-plain.txt | head
00000000  4d 41 4e 28 31 29 20 20  20 20 20 20 20 20 20 20  |MAN(1)          |
00000010  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
00000060  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 4d  |               M|
00000070  61 6e 75 61 6c 20 70 61  67 65 72 20 75 74 69 6c  |anual pager util|
00000080  73 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |s               |
00000090  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
000000e0  20 20 20 20 20 20 20 20  20 4d 41 4e 28 31 29 0a  |         MAN(1).|
000000f0  0a 4e 41 4d 45 0a 20 20  20 20 20 20 20 6d 61 6e  |.NAME.       man|

...but if I force it to produce "ascii" output, it generates escape sequences like you're seeing:

$ man -Tascii man > /tmp/man2.txt
$ hexdump -C /tmp/man2.txt  | head
00000000  1b 5b 34 6d 4d 41 4e 1b  5b 32 34 6d 28 31 29 20  |.[4mMAN.[24m(1) |
00000010  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
00000020  20 20 20 20 20 20 20 4d  61 6e 75 61 6c 20 70 61  |       Manual pa|
00000030  67 65 72 20 75 74 69 6c  73 20 20 20 20 20 20 20  |ger utils       |
00000040  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
00000050  20 1b 5b 34 6d 4d 41 4e  1b 5b 32 34 6d 28 31 29  | .[4mMAN.[24m(1)|
00000060  0a 0a 1b 5b 31 6d 4e 41  4d 45 1b 5b 30 6d 0a 20  |...[1mNAME.[0m. |
00000070  20 20 20 20 20 20 6d 61  6e 20 2d 20 61 6e 20 69  |      man - an i|
00000080  6e 74 65 72 66 61 63 65  20 74 6f 20 74 68 65 20  |nterface to the |
00000090  73 79 73 74 65 6d 20 72  65 66 65 72 65 6e 63 65  |system reference|

I suspect something about your configuration is forcing it or tricking it into producing formatted output, but I have no idea what.
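
The comparison above can also be made without reading a hexdump, by counting the lines of piped output that contain an ESC (0x1b) byte. This is only a minimal sketch: the helper name count_esc_lines is made up for illustration and is not part of man or Kakoune.

```shell
#!/bin/sh
# Hypothetical helper: count the lines on stdin that contain an ESC (0x1b) byte.
# Prints 0 when the input is free of escape sequences.
count_esc_lines() {
    esc=$(printf '\033')
    # grep -c exits non-zero when the count is 0; "|| true" keeps the
    # helper's own exit status at zero in that case.
    grep -c "$esc" || true
}

# Usage (assumes man is installed; not run here):
#   man man | count_esc_lines
#   man -Tascii man | count_esc_lines
```

A non-zero count for the plain man man pipeline would suggest that man is emitting terminal formatting codes even when its output is not a terminal.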

PJungkamp commented 1 week ago

It seems like Debian (at least on stable releases) disables the SGR escape sequences for TTYs in /etc/groff/man.local.

This feature seems to be enabled by default with groff 1.23 and causes man itself to emit escape sequences even without a pager like less. See https://lists.gnu.org/archive/html/groff/2023-07/msg00051.html.

Setting GROFF_NO_SGR=1 seems to fix the issue. I think we should consider updating the man invocation.
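
As a rough sketch of that workaround, GROFF_NO_SGR can be set for a single man invocation rather than globally. Where changing the environment is not an option, the SGR sequences could instead be stripped after the fact. The strip_sgr helper below is hypothetical and is not what Kakoune's man script does. It only removes sequences of the form ESC [ ... m, which is the kind shown in the hexdump earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: strip SGR sequences (ESC [ <digits and semicolons> m)
# from stdin, leaving all other text untouched.
strip_sgr() {
    esc=$(printf '\033')
    sed "s/${esc}\[[0-9;]*m//g"
}

# Usage (assumes man is installed; not run here):
#   GROFF_NO_SGR=1 man man > /tmp/man-plain.txt
#   man man | strip_sgr > /tmp/man-stripped.txt
```

The environment variable is the less invasive option, since it asks groff not to produce the sequences in the first place. The sed filter is only a fallback and does not handle other kinds of escape sequences.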

PJungkamp commented 1 week ago

It's a problem in nixpkgs....

https://github.com/NixOS/nixpkgs/issues/351845