tats / w3m

Debian's w3m: WWW browsable pager
https://tracker.debian.org/pkg/w3m

w3mman2html.cgi does not convert https to links #292

Open cpaelzer opened 7 months ago

cpaelzer commented 7 months ago

Hi, while looking into a bug report about man page rendering, I realized that w3mman2html.cgi does not convert https URLs into proper href links.

Reproducing the issue

$ cat > test << EOF
> .TH TEST "1"
> .SH "Test"
> Test http URL: <http://www.gnu.org>
> .br
> Test https URL: <https://www.gnu.org>
> EOF

$ /usr/lib/w3m/cgi-bin/w3mman2html.cgi "local=/root/test"
Content-Type: text/html

<html>
<head><title>man </title></head>
<body>
<pre>
<u>TEST</u>(1)                                                                                     General Commands Manual                                                                                     <u>TEST</u>(1)

<b>Test</b>
       Test http URL: &lt;<a href="http://www.gnu.org">http://www.gnu.org</a>&gt;
       Test https URL: &lt;https://www.gnu.org&gt;

                                                                                                                                                                                                        <u><a href="file:///usr/lib/w3m/cgi-bin/w3mman2html.cgi?TEST(1)">TEST</a></u>(1)

You can see that the http URL was converted to a proper link, while the https URL was left unchanged. I do not know if there is more to it, as it seems too trivial and I feel I am overlooking something, but isn't it just this line:

s@(http|ftp)://[\w.\-/~]+[\w/]@<a href="$&">$&</a>@g;

In my test, this change worked well:

diff -Naur /usr/lib/w3m/cgi-bin/w3mman2html.cgi.orig /usr/lib/w3m/cgi-bin/w3mman2html.cgi.new 
--- /usr/lib/w3m/cgi-bin/w3mman2html.cgi.orig   2024-01-30 08:08:50.278360949 +0000
+++ /usr/lib/w3m/cgi-bin/w3mman2html.cgi.new    2024-01-30 08:15:19.521156596 +0000
@@ -162,7 +162,7 @@
     next;
   }

-  s@(http|ftp)://[\w.\-/~]+[\w/]@<a href="$&">$&</a>@g;
+  s@(https|http|ftp)://[\w.\-/~]+[\w/]@<a href="$&">$&</a>@g;
   s@\b(mailto:|)(\w[\w.\-]*\@\w[\w.\-]*\.[\w.\-]*\w)@<a href="mailto:$2">$1$2</a>@g;
   s@(\W)(\~?/[\w.][\w.\-/~]*)@$1 . &file_ref($2)@ge;
   s@(include(<\/?[bu]\>|\s)*\&lt;)([\w.\-/]+)@$1 . &include_ref($3)@ge;

I'll file this trivial change as a PR, but I can't get rid of the feeling that I'll be told why we can't make that change :-)

bptato commented 7 months ago

> I'll file this trivial change as a PR, but I can't get rid of the feeling that I'll be told why we can't make that change :-)

I've been using a very similar fix locally, and don't see why your patch wouldn't work :)

FWIW, another way I tried to fix this was to run MARK_URL through a W3m-Control header, but it seems there is no way to run the command after the page has been loaded :( So I guess the best solution is to adjust the regex as you did.

rkta commented 7 months ago

On Tue, Jan 30, 2024 at 12:31:08AM -0800, Christian Ehrhardt wrote:

diff -Naur /usr/lib/w3m/cgi-bin/w3mman2html.cgi.orig /usr/lib/w3m/cgi-bin/w3mman2html.cgi.new 
--- /usr/lib/w3m/cgi-bin/w3mman2html.cgi.orig 2024-01-30 08:08:50.278360949 +0000
+++ /usr/lib/w3m/cgi-bin/w3mman2html.cgi.new  2024-01-30 08:15:19.521156596 +0000
@@ -162,7 +162,7 @@
     next;
   }

-  s@(http|ftp)://[\w.\-/~]+[\w/]@<a href="$&">$&</a>@g;
+  s@(https|http|ftp)://[\w.\-/~]+[\w/]@<a href="$&">$&</a>@g;
   s@\b(mailto:|)(\w[\w.\-]*\@\w[\w.\-]*\.[\w.\-]*\w)@<a href="mailto:$2">$1$2</a>@g;
   s@(\W)(\~?/[\w.][\w.\-/~]*)@$1 . &file_ref($2)@ge;
   s@(include(<\/?[bu]\>|\s)*\&lt;)([\w.\-/]+)@$1 . &include_ref($3)@ge;

As this is Perl, using (https?|ftp) should work.