jugglerchris / rust-html2text

Rust library to render HTML as text.
MIT License

Enhanced RichDecorator/--colour features: References and Border Colors #131

Open tkapias opened 7 months ago

tkapias commented 7 months ago

Intro

I tried many solutions to display HTML emails in mutt/neomutt's internal pager (elinks, readability tools, pandoc, html2text tools), but there were always issues with the encoding, colors, references or parsing.

After a few tests today, rust-html2text seems to be the way to go: it's fast, and the parsing, formatting, and encoding are spot on.

Request

I changed colors and styles options in the html2text example to fit my tastes, but I would like to add 3 features to the RichDecorator, used by --colour in the example:

Would you have time to implement those, or could you give me some leads? (I'm just starting out with Rust.)

Extra

My changes to the example (using only Reset for colors and styles was not working correctly for me):

diff --git a/examples/html2text.rs b/examples/html2text.rs
index 2c14ddf..4ee56b6 100644
--- a/examples/html2text.rs
+++ b/examples/html2text.rs
@@ -22,41 +22,41 @@ fn default_colour_map(annotations: &[RichAnnotation], s: &str) -> String {
         match annotation {
             Default => {}
             Link(_) => {
-                start.push(format!("{}", termion::style::Underline));
-                finish.push(format!("{}", termion::style::Reset));
+                start.push(format!("{}{}", Fg(AnsiValue(153)), termion::style::Underline));
+                finish.push(format!("{}{}", Fg(White), termion::style::NoUnderline));
             }
             Image(_) => {
                 if !have_explicit_colour {
-                    start.push(format!("{}", Fg(Blue)));
-                    finish.push(format!("{}", Fg(Reset)));
+                    start.push(format!("{}{}", Fg(AnsiValue(225)), termion::style::Italic));
+                    finish.push(format!("{}{}", Fg(White), termion::style::NoItalic));
                 }
             }
             Emphasis => {
-                start.push(format!("{}", termion::style::Bold));
-                finish.push(format!("{}", termion::style::Reset));
+                start.push(format!("{}", termion::style::Italic));
+                finish.push(format!("{}", termion::style::NoItalic));
             }
             Strong => {
                 if !have_explicit_colour {
-                    start.push(format!("{}", Fg(LightYellow)));
-                    finish.push(format!("{}", Fg(Reset)));
+                    start.push(format!("{}", termion::style::Bold));
+                    finish.push(format!("{}", termion::style::NoBold));
                 }
             }
             Strikeout => {
                 if !have_explicit_colour {
-                    start.push(format!("{}", Fg(LightBlack)));
-                    finish.push(format!("{}", Fg(Reset)));
+                    start.push(format!("{}{}", Fg(AnsiValue(7)), termion::style::CrossedOut));
+                    finish.push(format!("{}{}", Fg(White), termion::style::NoCrossedOut));
                 }
             }
             Code => {
                 if !have_explicit_colour {
-                    start.push(format!("{}", Fg(Blue)));
-                    finish.push(format!("{}", Fg(Reset)));
+                    start.push(format!("{}{}", Bg(AnsiValue(25)), Fg(AnsiValue(222))));
+                    finish.push(format!("{}{}", Bg(Reset), Fg(White)));
                 }
             }
             Preformat(_) => {
                 if !have_explicit_colour {
-                    start.push(format!("{}", Fg(Blue)));
-                    finish.push(format!("{}", Fg(Reset)));
+                    start.push(format!("{}{}", Bg(AnsiValue(25)), Fg(AnsiValue(229))));
+                    finish.push(format!("{}{}", Bg(Reset), Fg(White)));
                 }
             }
             Colour(c) => {

tkapias commented 7 months ago

For the moment I've been able to get the equivalent result by concatenating outputs of both "Rich" and "Plain" with sed.

It's a bit sloppy, but it's the best result my neomutt has ever seen.

If html2text could do it directly that would be great.

Neomutt's mailcap:

text/html; auto-view_html %s %{charset} ${COLUMNS:-80}; nametemplate=%s.html; copiousoutput; x-neomutt-nowrap;

auto-view_html script:

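#!/usr/bin/env bash
# Arguments (from the mailcap entry): $1 = HTML file, $2 = charset (unused here), $3 = columns.
shopt -s extglob
export LC_ALL="C.UTF-8"
export TZ=:/etc/localtime

# Cap the rendering width at 80 columns.
[[ $3 -lt 80 ]] && _columns=$3 || _columns=80
# Rich (coloured) rendering of the message body.
html2text --width "$_columns" --wrap-width "$_columns" --colour "$1"
echo
# Plain rendering: keep only the reference list (from "[1]: " to the end),
# join it into a single line, then break before each "[N]: " label so that
# every reference ends up on one unwrapped line.
html2text "$1" | sed -E '/^\[1\]: /,$!d' | tr -d '\n' | sed -E 's/(.)(\[[0-9]*\]: )/\1\n\2/g'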
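
The reference-extraction step of the pipeline above could also be done in Rust. The following is only a rough, line-based sketch of the same idea, not part of html2text: the function names `starts_reference` and `unwrap_references` are made up for illustration, and it assumes the plain output ends with a reference list whose first entry starts with "[1]: ".

```rust
/// Returns true if `line` starts with a reference label such as "[12]: ".
fn starts_reference(line: &str) -> bool {
    match line.strip_prefix('[') {
        Some(rest) => match rest.split_once("]: ") {
            Some((digits, _)) => {
                !digits.is_empty() && digits.chars().all(|c| c.is_ascii_digit())
            }
            None => false,
        },
        None => false,
    }
}

/// Keeps everything from the first "[1]: " line onward and joins wrapped
/// continuation lines, so that each reference sits on a single line.
/// (Hypothetical helper; it mirrors the sed/tr pipeline only approximately.)
pub fn unwrap_references(plain: &str) -> Vec<String> {
    let mut refs: Vec<String> = Vec::new();
    for line in plain.lines().skip_while(|l| !l.starts_with("[1]: ")) {
        if starts_reference(line) {
            // A new "[N]: " label starts a new reference.
            refs.push(line.to_string());
        } else if let Some(last) = refs.last_mut() {
            // Anything else is treated as a wrapped continuation.
            last.push_str(line);
        }
    }
    refs
}
```

Unlike the sed version, this sketch only recognises a label at the start of a line, so it would not split two references that somehow ended up on the same line.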


jugglerchris commented 7 months ago

  • Links listed as references, like in the PlainDecorator.

That seems like a reasonable thing to want! My first thought would be to add an option to RichDecorator (a new constructor that sets a flag) to do the references.
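
To make the idea concrete, here is a minimal, self-contained sketch of the bookkeeping such a flag would need. It does not use html2text's real types: `Segment` is a hypothetical stand-in for annotated output, and the "[N]" marker and "[N]: url" list format (as well as reusing a number for a repeated URL) are assumptions chosen to match the "[1]: " pattern used in the script above.

```rust
/// Hypothetical, simplified stand-in for annotated output:
/// a piece of text plus the URL it links to, if any.
pub struct Segment {
    pub text: String,
    pub link: Option<String>,
}

/// Renders the segments as text, appending a "[N]" marker after each link,
/// and returns the numbered reference list separately.
/// A repeated URL reuses the number it was first given (a choice of this sketch).
pub fn render_with_references(segments: &[Segment]) -> (String, Vec<String>) {
    let mut body = String::new();
    let mut urls: Vec<String> = Vec::new();
    for seg in segments {
        body.push_str(&seg.text);
        if let Some(url) = &seg.link {
            // Look the URL up, or register it under the next free number.
            let index = match urls.iter().position(|u| u == url) {
                Some(i) => i + 1,
                None => {
                    urls.push(url.clone());
                    urls.len()
                }
            };
            body.push_str(&format!("[{}]", index));
        }
    }
    let references = urls
        .iter()
        .enumerate()
        .map(|(i, url)| format!("[{}]: {}", i + 1, url))
        .collect();
    (body, references)
}
```

In the real decorator the link text would keep its colour annotations; the sketch only shows how the numbering and the trailing list could be collected.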

  • References wrap at --wrap-width,

I think it should be at the --width. But some reports have said that URLs work better if they're not manually wrapped - so maybe that should be an option (internally anyway).

to help with long links osc 8.

There is #119, opened recently. It seems reasonable as an option to the html2text binary, and I think it can be done in the default_colour_map() function.
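
As a rough sketch of what that could look like: an OSC 8 hyperlink is opened with ESC ] 8 ; params ; URI followed by the string terminator ESC \, and closed with the same sequence carrying an empty URI. The helper names below (`osc8_open`, `osc8_close`, `osc8_link`) are hypothetical and not part of the example; the open/close split is meant to mirror the start/finish vectors in default_colour_map().

```rust
/// Opens an OSC 8 hyperlink to `url` (no optional parameters).
/// Hypothetical helper; would correspond to a `start.push(...)`.
pub fn osc8_open(url: &str) -> String {
    format!("\x1b]8;;{}\x1b\\", url)
}

/// Closes the current OSC 8 hyperlink (empty URI).
/// Hypothetical helper; would correspond to a `finish.push(...)`.
pub fn osc8_close() -> String {
    "\x1b]8;;\x1b\\".to_string()
}

/// Convenience wrapper: `text` rendered as a hyperlink to `url`.
pub fn osc8_link(url: &str, text: &str) -> String {
    format!("{}{}{}", osc8_open(url), text, osc8_close())
}
```

Terminals and pagers without OSC 8 support may ignore the sequence or print it literally, which is one reason to keep it behind an option.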

  • Assign a color to borders and horizontal lines, to dim them.

That sounds like the hardest thing! There may be a quick-and-dirty way to colour all borders the same (which sounds like what you want), but if that's possible it seems a shame not to support the border-color CSS styles, and that needs some thought.

In case it helps, there was a change just merged (#129) which adds an option to not draw the borders, in case that solves your problem in the short term.
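
One possible shape for the quick-and-dirty variant (all borders in the same style) is a post-processing pass over each rendered line. This is only a sketch under two assumptions: that borders and horizontal rules are drawn with characters from the Unicode Box Drawing block (U+2500 to U+257F), and that the faint attribute (SGR 2, reset with SGR 22) is an acceptable way to dim them. `dim_borders` is a made-up name, not an html2text function.

```rust
/// True for characters in the Unicode Box Drawing block (U+2500..=U+257F).
fn is_border(c: char) -> bool {
    ('\u{2500}'..='\u{257F}').contains(&c)
}

/// Wraps every run of box-drawing characters in SGR 2 (faint) and
/// SGR 22 (normal intensity). Hypothetical post-processing helper.
pub fn dim_borders(line: &str) -> String {
    let mut out = String::with_capacity(line.len());
    let mut in_border = false;
    for c in line.chars() {
        let border = is_border(c);
        if border && !in_border {
            out.push_str("\x1b[2m"); // entering a border run
        } else if !border && in_border {
            out.push_str("\x1b[22m"); // leaving a border run
        }
        out.push(c);
        in_border = border;
    }
    if in_border {
        out.push_str("\x1b[22m"); // line ended inside a border run
    }
    out
}
```

Two caveats: SGR 22 also switches off bold, so bold text that continues after a border would need to be re-enabled, and box-drawing characters that appear in the message content itself would be dimmed as well. Neither would be a problem if the border colour were handled inside the renderer instead.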

tkapias commented 7 months ago

Thank you,

I will try to write something for RichDecorator another day.

About the reference width: the issue is that some pagers (like neomutt's internal one) do not support OSC 8, and they wrap URLs, which makes it impossible for most terminals to detect them (urxvt has a marker extension to select URLs, but it parses line by line). So to copy a URL without an external tool, you need to select multiple lines manually, and each wrapped line of the URL must reach the screen's border so that the selection picks up no spaces.
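
The wrapping behaviour described here amounts to a hard wrap at exactly the display width, with no break at spaces or hyphens. A minimal sketch follows; `hard_wrap` is a hypothetical name, and it counts characters rather than terminal display width, which is only adequate for URLs made of single-width characters.

```rust
/// Splits `s` into chunks of exactly `width` characters (the last chunk may
/// be shorter), never looking for spaces or hyphens to break at.
/// Hypothetical helper; counts `char`s, not terminal display width.
pub fn hard_wrap(s: &str, width: usize) -> Vec<String> {
    assert!(width > 0, "width must be positive");
    let chars: Vec<char> = s.chars().collect();
    chars
        .chunks(width)
        .map(|chunk| chunk.iter().collect())
        .collect()
}
```

With the width set to the full screen width, every line but the last fills the row, so a multi-line selection of the URL contains no padding spaces.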

I will go write a comment about OSC 8 in #119.

About the new options in #129, thanks, it works. I put config = config.raw_mode(true); in the example. It is better in most cases but I think that I will miss the borders sometimes, or at least some horizontal separators.