less anyfile.pdf gives error

workflowsguy commented 2 years ago

When running less anyfile.pdf, the following error is displayed:

==> append : to filename to view the PDF source
usage: html2text [-h] [--default-image-alt DEFAULT_IMAGE_ALT] [--pad-tables]
                 [--no-wrap-links] [--wrap-list-items] [--ignore-emphasis]
                 [--reference-links] [--ignore-links] [--protect-links]
                 [--ignore-images] [--images-as-html] [--images-to-alt]
                 [--images-with-size] [-g] [-d] [-e] [-b BODY_WIDTH]
                 [-i LIST_INDENT] [-s] [--escape-all] [--bypass-tables]
                 [--ignore-tables] [--single-line-break] [--unicode-snob]
                 [--no-automatic-links] [--no-skip-internal-links]
                 [--links-after-para] [--mark-code]
                 [--decode-errors DECODE_ERRORS] [--open-quote OPEN_QUOTE]
                 [--close-quote CLOSE_QUOTE] [--version]
                 [filename] [encoding]
html2text: error: unrecognized arguments: -from_encoding
anyfile.pdf (END)

(looks similar to #60)

I have pdftotext installed on my system and hoped that the command above would produce output comparable to pdftotext --layout anyfile.pdf -. However, I cannot figure out how to configure lesspipe.sh to use pdftotext instead of html2text for PDF files.

I am using lesspipe.sh 1.91 installed via MacPorts in zsh.

wofr06 commented 2 years ago

A workaround is to modify lesspipe.sh and replace the first occurrence of html2text in parsehtml with broken_html2text.

workflowsguy commented 2 years ago

Unfortunately, this does not work.

The first line with html2text in the parsehtml function is

elif cmd_exist html2text; then

If I change this to

elif cmd_exist broken_html2text; then

less anyfile.pdf

shows no file contents:

==> append : to filename to view the PDF source
anyfile.pdf (END)

wofr06 commented 2 years ago

The latest lesspipe.sh first tries pdftotext to convert PDF files. If pdftotext is not found, the file is converted to HTML with pdftohtml instead. That HTML is then converted to text by trying w3m first, then lynx, then elinks. Only if none of these is installed is html2text used, and it is called without any encoding flags, because several incompatible html2text implementations exist. The PDF conversion tests in the test suite pass.
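
For readers who want to see that fallback order spelled out, here is a rough sketch in POSIX shell. It is not the code from lesspipe.sh: the function names first_available and pdf_to_text are made up for illustration, and the command-line flags (-layout for pdftotext, -stdout -noframes -i for pdftohtml, -dump and friends for the browsers) are my own assumptions about a reasonable invocation, not what lesspipe.sh necessarily passes.

```shell
#!/bin/sh
# Hypothetical helper (not part of lesspipe.sh): print the name of the
# first command from the argument list that is found in PATH.
first_available() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

# Hypothetical sketch of the PDF fallback order described above.
pdf_to_text() {
    pdf=$1

    # Preferred path: convert directly with pdftotext.
    # (-layout is an assumption, chosen to match the output asked for.)
    if command -v pdftotext >/dev/null 2>&1; then
        pdftotext -layout "$pdf" -
        return
    fi

    # Fallback: PDF -> HTML with pdftohtml, then HTML -> text.
    command -v pdftohtml >/dev/null 2>&1 || return 1
    renderer=$(first_available w3m lynx elinks html2text) || return 1

    case $renderer in
        w3m)    pdftohtml -stdout -noframes -i "$pdf" | w3m -dump -T text/html ;;
        lynx)   pdftohtml -stdout -noframes -i "$pdf" | lynx -dump -stdin ;;
        elinks) pdftohtml -stdout -noframes -i "$pdf" | elinks -dump -force-html ;;
        # Last resort: html2text with no flags at all, since the
        # incompatible implementations disagree on encoding options.
        html2text) pdftohtml -stdout -noframes -i "$pdf" | html2text ;;
    esac
}
```

After sourcing this, pdf_to_text anyfile.pdf | less would show the text through whichever converter is found first. Calling html2text with no flags is what avoids the "unrecognized arguments: -from_encoding" error from the original report.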

wofr06 / lesspipe

less anyfile.pdf gives error #68