theZiz / aha

Ansi HTML Adapter

Feature request: Process CR characters to match final terminal output #71

Closed iskunk closed 4 years ago

iskunk commented 4 years ago

I am using the script(1) command to record terminal sessions. aha handles the output files well, but for one issue: programs which use \r repeatedly to update the current output line. This results in numerous repeated lines in the aha output, along with the raw CR characters.

Here is a sample file produced by script(1), in which I invoke apt-get(8). If you view this file in less(1), the CR characters and their associated text are clearly visible.

Here is a straight copy-and-paste from the terminal after the above command was run. The desired processing would produce text of this form (aside from the color, of course).

Being able to process script(1)-generated files cleanly would be a useful feature, as these often capture ANSI escape sequences, and removing the CR characters without munging up the colors is nontrivial with standard Unix tools.

theZiz commented 4 years ago

Thanks for the report and the easy to reproduce test files. I just added an option to ignore carriage-returns. It should not interfere with the color detection. :)

iskunk commented 4 years ago

Thanks for adding this :-)

I gave it a try with a larger input I have, and unfortunately the behavior is not quite right yet. Here are files demonstrating the issue:

If it helps, viewing the input file with less -r should render it correctly to the terminal, for comparison.

theZiz commented 4 years ago

Unfortunately this is a different problem, one that aha can't solve because of the way it works. aha can't change the past: it reads a character and converts it to HTML. It doesn't even work line by line, but really character by character. Your apt-get example uses carriage return to jump back to the first character of the already printed line and redraw it. aha can't handle this, so you would need some kind of buffer application in between, or something like sed to remove every "line" that is replaced by \r anyway:

cat sample-script.txt | sed 's/\r$//g' | sed 's/^.*\r\(.*\).*$/\1/' | aha -b > sample-script.htm

So this command takes your output (cat sample-script.txt), then removes every \r at the end of a line (sed 's/\r$//g'), then replaces every occurrence of something\relse with else, since \r resets the cursor to the first character of the line (sed 's/^.*\r\(.*\).*$/\1/'), and finally runs aha -b on the result, which has all \r removed already.

The only problem that may occur is if \r appears inside an ANSI control code, as described here: https://github.com/theZiz/aha/issues/55 Let's just hope that this will not be the case for you. :)
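
As a rough illustration of the two sed stages described above, the sketch below wraps each one in a shell function and feeds them a made-up progress line. The names cr, stage1, stage2 and sample are invented for this example, and the sample text is not taken from the attached file. A literal CR is spliced into the sed scripts so that the sketch does not depend on sed understanding the \r escape (GNU sed does, other implementations may not).

```shell
#!/bin/sh
# Sketch only: all names here are hypothetical, chosen for illustration.

# Hold a literal carriage return so the sed scripts work without \r support.
cr=$(printf '\r')

# Stage 1: drop a CR that sits directly before the end of the line.
stage1() {
    sed 's/'"$cr"'$//g'
}

# Stage 2: replace the whole line with the text after the last remaining CR.
stage2() {
    sed 's/^.*'"$cr"'\(.*\).*$/\1/'
}

# A line as script(1) might record it: two redraws, then CR LF.
sample() {
    printf 'Reading... 10%%\rReading... 100%%\r\n'
}
```

With these definitions, sample | stage1 | stage2 prints Reading... 100%, which is the last redraw of the line.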

iskunk commented 4 years ago

I would tweak those sed commands into

sed 's/\r\r*$//g; s/^.*\r\(.*\).*$/\1/'

to handle multiple CRs at the end of a line, and combine them into one sed invocation. I tested this with my larger input file, and it does appear to give the correct result.

But yes, any ANSI sequences that are "erased" by the CRs will not have an effect on the HTML output, even though they do affect the terminal. My input does not appear to have this situation, but it could certainly occur.

I have two suggestions:

  1. For now, create a wrapper shell script (maybe call it aha-term?) that has the same interface as aha, but filters the input through the above sed command, as well as the one in #55 (i.e. any necessary external preprocessing is added to this script). This wrapper script would be the recommended solution for users who want to process script(1) output files with aha.

  2. For future development of aha, consider adding a line buffer to the way it processes input. This is ultimately what would be needed to handle terminal-control characters 100% correctly.
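
Suggestion 1 could look roughly like the sketch below. It assumes that aha is on the PATH and reads standard input, as in the commands earlier in this thread. The names strip_cr and aha_term are hypothetical, and the extra filter from #55 is left out because its command is not quoted here.

```shell
#!/bin/sh
# aha-term sketch: hypothetical wrapper, not shipped with aha.

# Literal carriage return, so the filter does not rely on sed's \r escape.
cr=$(printf '\r')

# The combined sed filter from this thread:
#   1. remove one or more CRs at the end of the line
#   2. keep only the text after the last CR still on the line
strip_cr() {
    sed 's/'"$cr$cr"'*$//; s/^.*'"$cr"'\(.*\).*$/\1/'
}

# Filter standard input, then pass every argument through to aha unchanged.
aha_term() {
    strip_cr | aha "$@"
}
```

A call such as aha_term -b < sample-script.txt > sample-script.htm would then mirror the pipeline above. Saved as a standalone script, a final line of aha_term "$@" would make it behave like a command. One limitation of this sketch: only standard input is filtered, so input handed to aha in any other way would bypass the sed step.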

theZiz commented 4 years ago

I guess I will go for variant 2 in the future. It will not handle everything 100% correctly, though: as far as I can remember, redrawing the whole screen is also possible with vt100 (e.g. ncurses, top, htop). However, most stuff should work then, especially carriage returns as well as backspaces. :)

But I will close this issue for now ;)
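
To make the line-buffer idea concrete, here is a small awk-based sketch of how one buffered line could be replayed. It is not how aha is implemented. The function name render_lines is invented, and the sketch assumes plain single-byte text with no ANSI escape sequences, which a real implementation would have to skip over when counting columns. It also shows one difference from the sed filter: on a terminal, CR only moves the cursor, so a shorter redraw leaves the tail of the old text visible.

```shell
#!/bin/sh
# Hypothetical line-buffer sketch; assumes no escape sequences in the input.
render_lines() {
    awk '
    {
        used = 0        # number of cells written on this line
        cur = 0         # cursor column, counted from 0
        len = length($0)
        for (i = 1; i <= len; i++) {
            ch = substr($0, i, 1)
            if (ch == "\r") {
                cur = 0                      # CR: back to the first column
            } else if (ch == "\b") {
                if (cur > 0) cur--           # backspace: one column left
            } else {
                cell[++cur] = ch             # overwrite the cell under the cursor
                if (cur > used) used = cur
            }
        }
        out = ""
        for (i = 1; i <= used; i++) out = out cell[i]
        print out
    }'
}
```

For example, printf 'abcdef\rxy\n' | render_lines prints xycdef, whereas the sed filter would reduce the same line to xy.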

iskunk commented 4 years ago

Okay. I look forward to this future version of aha, then! Please feel free to mention me once you have an implementation, and I will be happy to test it.

Incidentally, on the full-screen issue... I think the most reasonable thing that you can do there is to simply ignore the terminal's "alternate screen" altogether. I don't see how else you can handle it, unless you convert the input to an animated GIF, or some kind of wacky JavaScript-animated HTML.

For example, in Debian, dpkg-reconfigure is a command that normally brings up a full-screen ncurses UI. But if I run that command, and it exits, then this is what I see in the terminal:

root@test-debian64:~# dpkg-reconfigure debconf
root@test-debian64:~# 

All the full-screen stuff is present only on the alternate screen; the terminal's normal buffer shows none of it. If I captured that whole session with script(1), then all the ncurses I/O would be captured in the file. But what I would hope, and expect aha to do, is to ignore it completely, and give output that looks like the above.

I believe there are specific terminal commands to switch to the alternate screen, and back to the regular one, so perhaps it might not be too hard to make aha recognize them and drop everything in between?
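
As a sketch of that idea, the filter below drops everything between the xterm-style sequences ESC[?1049h (switch to the alternate screen) and ESC[?1049l (switch back). It is a pre-filter written in shell and awk, not a change to aha. The name drop_alt_screen is invented, and handling only the 1049 pair is an assumption: which sequence a program emits depends on the terminal description, and some terminals use ?47 or ?1047 instead.

```shell
#!/bin/sh
# Hypothetical pre-filter: remove alternate-screen output before running aha.
# Assumes ESC[?1049h enters and ESC[?1049l leaves the alternate screen.
drop_alt_screen() {
    awk -v enter="$(printf '\033[?1049h')" -v leave="$(printf '\033[?1049l')" '
    {
        rest = $0
        while (rest != "") {
            if (in_alt) {
                i = index(rest, leave)
                if (i == 0) break                    # whole remainder is alternate screen
                rest = substr(rest, i + length(leave))
                in_alt = 0
            } else {
                i = index(rest, enter)
                if (i == 0) { printf "%s", rest; break }
                printf "%s", substr(rest, 1, i - 1)  # keep text before the switch
                rest = substr(rest, i + length(enter))
                in_alt = 1
            }
        }
        if (!in_alt) printf "\n"                     # keep newlines of the normal screen only
    }'
}
```

It would sit in front of aha in a pipeline, for example drop_alt_screen < typescript | aha, where typescript stands for a file recorded by script(1).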

iskunk commented 2 years ago

I've noticed wrong colors in the output using the above sed command, and would like to provide a better alternative via Perl:

# Preprocess the input with this command before feeding it to aha
#
perl -pe 's/\r+$//g; if(/^(.*)\r([^\r]+)$/){ $a=$1;$b=$2;$c=""; while($a=~/(\x1B\[\d+(;\d+)*m)/g){$c.=$1} $_=$c.$b }'

This will let through color escapes that precede a \r in the middle of a line, so that they still affect the output.