pycontribs / ansi2html

Convert text with ansi color codes to HTML
GNU Lesser General Public License v3.0

Stray characters in html output #102

Open mikepurvis opened 4 years ago

mikepurvis commented 4 years ago

Not totally sure what's going on here, but this is diagnostic output from GCC 9:

^[[01m^[[K/path/to/my/file.cpp:127:30:^[[m^[[K ^[[01;35m^[[Kwarning: ^[[m^[[Kcomparison of integer expressions of different signedness: ~@~X^[[01m^[[Kunsigned int^[[m^[[K~@~Y and ~@~X^[[01m^[[Kint^[[m^[[K~@~Y [^[[01;35m^[[K-Wsign-compare^[[m^[[K]
  127 |   for (unsigned int y = 0; ^[[01;35m^[[Ky < height^[[m^[[K; y++)
      |                            ^[[01;35m^[[K~~^~~~~~~~^[[m^[[K

Which is rendered by ansi2html into:

<span class="ansi1">/path/to/my/file.cpp:127:30:</span> <span class="ansi1 ansi35">warning: </span>comparison of integer expressions of different signedness: ‘<span class="ansi1">unsigned int</span>’ and ‘<span class="ansi1">int</span>’ [<span class="ansi1 ansi35">-Wsign-compare</span>]
  127 |   for (unsigned int y = 0; <span class="ansi1 ansi35">y &lt; height</span>; y++)
      |                            <span class="ansi1 ansi35">~~^~~~~~~~</span>
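To make the mapping above concrete, here is a deliberately simplified sketch (not ansi2html's actual code) of how SGR sequences such as ESC[01;35m can turn into class names like ansi1 ansi35, while non-SGR sequences such as GCC's ESC[K ("erase in line") are simply dropped. The function name sgr_to_html is hypothetical, and the sketch only understands "add this code" and "reset":

```python
import html
import re

# A CSI escape sequence: ESC '[', optional numeric parameters, one final letter.
CSI_RE = re.compile(r"\x1b\[([0-9;]*)([A-Za-z])")


def sgr_to_html(text):
    """Hypothetical, simplified ANSI-to-HTML converter (not ansi2html's code)."""
    parts = []
    classes = []  # currently active SGR codes, as class names like "ansi1"

    def emit(chunk):
        # Escape the plain text and wrap it in a span if any style is active.
        if not chunk:
            return
        escaped = html.escape(chunk, quote=False)
        if classes:
            parts.append('<span class="%s">%s</span>' % (" ".join(classes), escaped))
        else:
            parts.append(escaped)

    pos = 0
    for m in CSI_RE.finditer(text):
        emit(text[pos:m.start()])
        pos = m.end()
        params, final = m.group(1), m.group(2)
        if final != "m":
            # Not SGR (e.g. GCC's ESC[K, "erase in line"): drop it silently.
            continue
        # An empty parameter list (ESC[m) means the same as ESC[0m: reset.
        codes = [int(p) for p in params.split(";") if p] or [0]
        for code in codes:
            if code == 0:
                classes.clear()
            elif "ansi%d" % code not in classes:
                classes.append("ansi%d" % code)
    emit(text[pos:])
    return "".join(parts)
```

Fed GCC's `^[[01;35m^[[Ky < height^[[m^[[K` (with real ESC bytes in place of `^[`), this yields `<span class="ansi1 ansi35">y &lt; height</span>`, the same shape as the output quoted above. A real converter also has to handle codes that switch single attributes off, 256-color sequences, and so on.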

Does anyone know what those extra ‘ sequences are, and whether they can be filtered out somehow?
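A plausible explanation, offered as an assumption rather than anything confirmed in this thread: GCC quotes type names with the typographic quotes ‘ and ’ (U+2018 and U+2019). In UTF-8 each is three bytes, E2 80 98 and E2 80 99, and the ~@~X / ~@~Y in the pasted terminal output look like a caret-style rendering of the trailing 0x80 0x98 / 0x80 0x99 bytes. If the resulting HTML is then opened as Windows-1252 instead of UTF-8, the same bytes show up as stray â€˜ and â€™. The helper names below are hypothetical:

```python
LEFT_QUOTE = "\u2018"   # ‘  LEFT SINGLE QUOTATION MARK
RIGHT_QUOTE = "\u2019"  # ’  RIGHT SINGLE QUOTATION MARK


def utf8_bytes(text):
    """Hex dump of the UTF-8 encoding, e.g. 'e2 80 98' for LEFT_QUOTE."""
    return " ".join("%02x" % b for b in text.encode("utf-8"))


def as_cp1252(text):
    """What UTF-8 text looks like if it is wrongly decoded as Windows-1252."""
    return text.encode("utf-8").decode("cp1252", errors="replace")


def ascii_quotes(text):
    """Hypothetical filter: swap the typographic quotes for plain apostrophes."""
    return text.translate({ord(LEFT_QUOTE): "'", ord(RIGHT_QUOTE): "'"})
```

For example, `as_cp1252("‘int’")` gives `â€˜intâ€™`. If the quotes themselves are unwanted, something like `ascii_quotes` could be applied to the compiler output before it is piped into ansi2html; if only the mojibake is the problem, making sure the HTML is read as UTF-8 should be enough.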

hartwork commented 2 years ago

When trying to reproduce it…

# echo 'int main(int argc, char ** argv) { return argc < (unsigned)argc; }' > main.c
# gcc -Wextra -fdiagnostics-color=always main.c |& ansi2html > gcc.htm
# gcc -dumpversion
11.2.1
# ansi2html --version
ansi2html 1.7.1.dev1  # i.e. Git master

…what I see in the browser is this: [screenshot: gcc]

That looks sane, so I'll need help reproducing the problem.

hartwork commented 2 years ago

PS: Here's what I get for your exact example pasted into input.txt. Note the sed call, which repairs the ANSI escapes on the fly by turning each literal ^[ back into a real ESC byte:

# sed $'s,\^\[,\x1b,g' input.txt | ansi2html > gcc.htm
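For anyone who would rather do the same repair in Python before handing the text to ansi2html, a minimal equivalent of that sed call might look like this (the function name is hypothetical):

```python
CARET_ESC = "^["  # caret notation for ESC (0x1B), as it appears in pasted terminal output


def repair_caret_escapes(text):
    """Turn every literal '^[' pair back into a real ESC byte, like the sed call."""
    return text.replace(CARET_ESC, "\x1b")
```

Like the sed expression, this rewrites every literal `^[` pair, so it would also change a genuine `^[` that happens to appear in the text itself.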

Then in Chromium: [screenshot: gcc2]