rburns / ansi-to-html

Convert ansi escaped text streams to html.
MIT License

ansi-to-html incorrectly interpret tput output to *(B characters #25

Open ghost opened 8 years ago

ghost commented 8 years ago

Hi, thanks a lot for the great job on ansi-to-html!

I'm a Drone CI user, and Drone CI uses ansi-to-html to render console output. It works almost perfectly, but tput output is not handled correctly. I reported a bug downstream and then realized that ansi-to-html is the right upstream: https://github.com/drone/drone/issues/1491

To reproduce:

$ tput sgr0 | ansi2html

Expected result: No text in <body></body>

Actual result: There is a (B which is unexpected:

<body>
<pre>(B</pre>
</body>

I rebuilt the latest ansi-to-html and confirmed that the problem can be reproduced at 7d34444cb45eb53253b2e119a36c95ccf4410684

Could you have a look? Thanks a lot!

rburns commented 8 years ago

Thanks for the bug report. What ansi escape sequence does sgr0 output? It seems we should probably just omit it and other cursor related sequences from the output stream.

ghost commented 8 years ago

I ran $ tput sgr0 > out.txt, and out.txt shows the following content in vim:

^[(B^[[m

Which should be read as escape, (, B, escape, [, m

Here is a hexdump output:

$ hexdump out.txt 
0000000 281b 1b42 6d5b                         
0000006
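
A note on reading this dump: the default hexdump format prints 16-bit words in host byte order, so on a little-endian machine each pair of bytes appears swapped. The following is a minimal sketch, assuming a little-endian host, that restores the byte order; the helper name `wordsToBytes` is made up for illustration.

```javascript
// Hypothetical helper: turn the 16-bit words printed by plain `hexdump`
// back into individual bytes, assuming a little-endian host.
function wordsToBytes(words) {
  const bytes = [];
  for (const word of words) {
    const value = parseInt(word, 16);
    bytes.push(value & 0xff);        // low byte is stored first in the file
    bytes.push((value >> 8) & 0xff); // high byte is stored second
  }
  return bytes;
}

const bytes = wordsToBytes(["281b", "1b42", "6d5b"]);
const hex = bytes.map((b) => b.toString(16).padStart(2, "0")).join(" ");
console.log(hex); // 1b 28 42 1b 5b 6d
```

Read in that order, the bytes are escape, (, B, escape, [, m, which matches what vim displays.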

Thank you!

ghost commented 8 years ago

BTW, it would be great to also test other tput outputs:

Here is the script we use in pacman:

colorize() {
    # prefer terminal safe colored and bold text when tput is supported
    if tput setaf 0 &>/dev/null; then
        ALL_OFF="$(tput sgr0)"
        BOLD="$(tput bold)"
        BLUE="${BOLD}$(tput setaf 4)"
        GREEN="${BOLD}$(tput setaf 2)"
        RED="${BOLD}$(tput setaf 1)"
        YELLOW="${BOLD}$(tput setaf 3)"
    else
        ALL_OFF="\e[0m"
        BOLD="\e[1m"
        BLUE="${BOLD}\e[34m"
        GREEN="${BOLD}\e[32m"
        RED="${BOLD}\e[31m"
        YELLOW="${BOLD}\e[33m"
    fi
    readonly ALL_OFF BOLD BLUE GREEN RED YELLOW
}

From my previous tests, only tput sgr0 shows unexpected characters. I can test the others again once this issue is solved.

ghost commented 8 years ago

Hi @rburns , any updates? Thanks!

rburns commented 8 years ago

After looking a bit closer, I think this is reported on the wrong repo. It's possibly this one: https://github.com/ralphbean/ansi2html based on the command line and the output being wrapped in a body tag.

The other thing I found was that tput sgr0 output is different on my terminal, compared to what you described.

0000000 5b1b 0f6d                              
0000004

But, I think this package may have this bug as well. If you run the following code over here https://tonicdev.com/npm/ansi-to-html you get similar results.

var Convert = require("ansi-to-html")
var convert = new Convert();

console.log(convert.toHtml('<body>\x1b(B\x1b[m</body>'));
 "<body>(B</body>"

rburns commented 8 years ago

Or maybe the ^[(B is erroneous. I can't find reference to it anywhere.

ghost commented 8 years ago

Hi @rburns, I also tested the same way as in https://github.com/rburns/ansi-to-html/issues/25#issuecomment-198994826 and got the same result as you.

tput is part of ncurses. I'll have a look to see if there is anything interesting in the ncurses source code and then report back.

ghost commented 8 years ago

Hi, in ncurses/progs/infocmp.c I found these:

/* this group is specified by ISO 2022 */
    {"\033(0", "ISO DEC G0"},   /* enable DEC graphics for G0 */
    {"\033(A", "ISO UK G0"},    /* enable UK chars for G0 */
    {"\033(B", "ISO US G0"},    /* enable US chars for G0 */
    {"\033)0", "ISO DEC G1"},   /* enable DEC graphics for G1 */
    {"\033)A", "ISO UK G1"},    /* enable UK chars for G1 */
    {"\033)B", "ISO US G1"},    /* enable US chars for G1 */
    /* these are specified by X.364 and iBCS2 */
    {"\033c", "RIS"},           /* full reset */
    {"\0337", "SC"},            /* save cursor */
    {"\0338", "RC"},            /* restore cursor */
    {"\033[r", "RSR"},          /* not an X.364 mnemonic */
    {"\033[m", "SGR0"},         /* not an X.364 mnemonic */
    {"\033[2J", "ED2"},         /* clear page */

Is that useful?

Thanks!
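
One way to act on the table above is to drop the ISO 2022 character set designations before the text reaches a converter. The sketch below is not ansi-to-html's implementation: `stripCharsetDesignations` is a hypothetical helper, and it only covers the six designations listed in the infocmp.c excerpt (escape, then ( or ), then 0, A, or B).

```javascript
// Hypothetical pre-filter, not part of ansi-to-html: remove the six
// ISO 2022 designations from the infocmp.c table
// (ESC ( 0/A/B for G0 and ESC ) 0/A/B for G1).
function stripCharsetDesignations(text) {
  return text.replace(/\x1b[()][0AB]/g, "");
}

// The xterm sgr0 output reported earlier in this thread.
const sgr0 = "\x1b(B\x1b[m";
console.log(JSON.stringify(stripCharsetDesignations(sgr0))); // "\u001b[m"
```

Only the escape, [, m reset is left after filtering, and literal text such as "(B)" without a preceding escape byte is not touched.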

cv711 commented 8 years ago

@rburns it took me a while to find it, while trying to resolve that issue myself. Here is a table of ANSI Escape codes that describes those characters:

http://ascii-table.com/ansi-escape-sequences-vt-100.php

Thanks!

earthman1 commented 5 years ago

I found this issue while searching for an explanation for this behavior, which I have also observed in a different context. I believe that I may have discovered the cause of my own issue, and so I post my discoveries here for your benefit as well, and for the benefit of anyone else who may stumble across it as I did.

sgr0 has different codes depending on which terminal you are using. If you are configured to output codes for a vt100 terminal, the output is as you expected. If your terminal is set to xterm, then the output is ^[(B^[[m. You may demonstrate this behavior in tput using the command

tput -T xterm sgr0 | hexdump

which will output 0000000 281b 1b42 6d5b, as @fracting observed above. Contrast that with the output of

tput -T vt100 sgr0 | hexdump

which outputs 0000000 5b1b 0f6d, which exactly matches the output @rburns observed. This doesn't appear to be a bug in any particular software, but, at least in my case, resulted from a misconfigured TERM environment variable.
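
To make the difference between the two outputs concrete, here is a small illustrative sketch that labels each piece of the two byte sequences. The `KNOWN` table and the `describe` function are made up for this example and cover only the sequences discussed in this thread.

```javascript
// Illustrative only: label the byte sequences that `tput sgr0` emits for
// the two TERM values compared above. The table covers just these cases.
const KNOWN = [
  ["\x1b(B", "ESC ( B: select US characters for G0"],
  ["\x1b[m", "ESC [ m: reset attributes (SGR0)"],
  ["\x0f", "SI (0x0f): shift in"],
];

function describe(sequence) {
  const labels = [];
  let rest = sequence;
  while (rest.length > 0) {
    const match = KNOWN.find(([bytes]) => rest.startsWith(bytes));
    if (!match) {
      // Anything outside the table is reported byte by byte.
      labels.push("unknown byte 0x" + rest.charCodeAt(0).toString(16));
      rest = rest.slice(1);
      continue;
    }
    labels.push(match[1]);
    rest = rest.slice(match[0].length);
  }
  return labels;
}

console.log(describe("\x1b(B\x1b[m")); // TERM=xterm
console.log(describe("\x1b[m\x0f"));   // TERM=vt100
```

Both outputs contain the same escape, [, m reset. Only the xterm variant carries the escape, (, B designation whose last two characters show up as the stray (B.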

This solved my problem, and I hope it clears it up for you as well!

earthman1 commented 5 years ago

Actually, I'm realizing this wasn't a different context after all... The extraneous (B was showing up in my CI/CD output on gitlab, which appears to be processed through ansi2html, whenever there was a tput sgr0 in the CI/CD script. My .gitlab-ci.yml file included the following code:

variables:
  TERM: xterm

Once I changed this to

variables:
  TERM: ansi

all the (B's disappeared.

rburns commented 3 years ago

I see this issue is quite old by now. Is it fair to say that the reported mishandled escape sequences are a terminal specific extension to ansi, and that this can safely be closed?

CobaltCause commented 2 years ago

So after having the exact same problem as @earthman1 (thank you for your detailed explanation and solution!), I think the best thing to do might be to explicitly document which values for TERM are supported by this library, and then closing this issue should be safe. I think this is just a documentation bug and not an implementation bug.

(Also, it's frustrating that GitLab doesn't set TERM to an appropriate value by default or at least document acceptable values somewhere, because it took me an incredibly long time to find this thread and the solution herein, but that's GitLab's problem and not yours.)

GuillaumeHM commented 1 year ago

Hello everyone,

Like you, I searched for this solution for a long time (almost 2 hours). Do you have a list of TERM values that work for GitLab CI? If so, which one should we use? This one works for me:

These do not work:

So to answer myself, ansi seems the only value that works fine.

More generally, I'm trying to build a portable script that runs both in regular bash and on GitLab CI. I know this isn't the right place, but what code do you use? This is what I have so far:

# Manage colored output depending on the terminal the script runs in
normal=""
bold=""
red=""
green=""
yellow=""

# If GITLAB_CI is unset, define it to avoid an unset-variable error (set -u)
if test -z "${GITLAB_CI+x}"
then
    GITLAB_CI="false"
else
    # Override GitLab's TERM value of 'xterm', which leads to extra '(B' chars after sgr0
    export TERM=ansi
fi

# Check if running on a regular terminal or on GitLab CI
if test -t 1 || test "${GITLAB_CI}" = "true"
then
    # Check whether the terminal supports at least 8 colors
    ncolors=$(tput colors)
    if test -n "$ncolors" && test "$ncolors" -ge 8
    then
        normal="$(tput sgr0)"
        bold="$(tput bold)"
        red="$(tput setaf 1)"
        green="$(tput setaf 2)"
        yellow="$(tput setaf 3)"
    fi
fi

echo "${red}COLORS SUPPORTED${normal}"

I hope this helps GitLab users.

Peneheals commented 1 year ago

I think this issue can be closed, @rburns. It is clearly not a bug in ansi-to-html but a misconfiguration (or missing configuration) in OP's Drone CI or any other GitLab/etc. CI.

I also double-checked: with the TERM environment variable set to xterm-256color, the current GitLab CI outputs the extra (B characters; with TERM set to ansi, it doesn't, and everything works well.