michaelrsweet / htmldoc

HTML Conversion Software
https://www.msweet.org/htmldoc
GNU General Public License v2.0

Not found - HTML error 404 downloading image #507

Closed: step- closed this issue 11 months ago

step- commented 1 year ago

Hello,

While trying to convert a webpage, I ran into a download problem involving <img> tags. Am I doing something wrong? The debug output suggests that htmldoc builds the two image URLs one directory level above the document's base URL, so both requests fail with a 404. However, I can download the images with wget from the base URL, e.g., wget http://distro.ibiblio.org/fatdog/web/fatdog.png.

Perhaps related to #499 and #486.

Thank you for htmldoc!

mkdir /tmp/test

HTMLDOC_DEBUG=all htmldoc -t pdf13 -f "/tmp/test/fatdog.pdf" --webpage --no-title --linkstyle underline --size Universal --left 1.00in --right 0.50in --top 0.50in --bottom 0.50in --header .t. --header1 ... --footer h.i --nup 1 --tocheader .t. --tocfooter ..I --portrait --color --no-pscommands --no-xrxcomments --compression=1 --jpeg=0 --fontsize 11.0 --fontspacing 1.2 --headingfont Helvetica --bodyfont Sans --headfootsize 11.0 --headfootfont Helvetica --charset utf-8 --links --embedfonts --pagemode document --pagelayout single --firstpage p1 --pageeffect none --pageduration 10 --effectduration 1.0 --no-encryption --permissions all  --owner-password ""  --user-password "" --browserwidth 680 --path "/tmp/test" --no-strict --overflow http://distro.ibiblio.org/fatdog/web/index.html

ERR404: Not Found (http://distro.ibiblio.org:80/fatdog/fatdog.png)
ERR404: Not Found (http://distro.ibiblio.org:80/fatdog/screen.jpg)
DEBUG: Updating links in document.
DEBUG: Mapping "releases.xml" to "http://distro.ibiblio.org:80/fatdog/web/releases.xml"...
DEBUG: Mapping "latest.html" to "http://distro.ibiblio.org:80/fatdog/web/latest.html"...
DEBUG: Mapping "../iso/" to "http://distro.ibiblio.org:80/fatdog/web/../iso/"...
DEBUG: Mapping "../iso/pre-release" to "http://distro.ibiblio.org:80/fatdog/web/../iso/pre-release"...
DEBUG: Mapping "arm-index.html" to "http://distro.ibiblio.org:80/fatdog/web/arm-index.html"...
DEBUG: Mapping "../packages/" to "http://distro.ibiblio.org:80/fatdog/web/../packages/"...
DEBUG: Mapping "../sfs/" to "http://distro.ibiblio.org:80/fatdog/web/../sfs/"...
DEBUG: Mapping "../source/" to "http://distro.ibiblio.org:80/fatdog/web/../source/"...
DEBUG: Mapping "faqs/faq.html" to "http://distro.ibiblio.org:80/fatdog/web/faqs/faq.html"...
DEBUG: Mapping "file:///usr/share/doc/faqs/faq.html" to "http://distro.ibiblio.org:80/fatdog/web/file:///usr/share/doc/faqs/faq.html"...
DEBUG: Mapping "history.html" to "http://distro.ibiblio.org:80/fatdog/web/history.html"...
DEBUG: Document Tree = 142 kbytes
DEBUG: Table of Contents Tree = 0 kbytes
DEBUG: Render Data = 44 kbytes
PAGES: 4
BYTES: 388855
TIMING: 3.055 0.059 3.115
REMOTEBYTES: 11235
DEBUG: Temporary File Summary
DEBUG:
DEBUG: URL                             Filename
DEBUG: ------------------------------- ---------------------
DEBUG: http://distro.ibiblio.org/fatdo /tmp/006638.000001.tmp
DEBUG:
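
As a cross-check on the "Mapping" lines in the log above, the following Python sketch resolves a few of the same references with the standard library's `urllib.parse.urljoin`. It is only a comparison against generic RFC 3986 resolution, not htmldoc's implementation, and the `BASE` value (the document URL with the `:80` port seen in the log) is an assumption made for illustration. A standards-based resolver collapses `../` segments and leaves a reference that already carries its own scheme, such as `file:///...`, unchanged.

```python
from urllib.parse import urljoin

# Assumed base for illustration: the document URL, with the port shown in the log.
BASE = "http://distro.ibiblio.org:80/fatdog/web/index.html"

# A few references copied from the "Mapping" lines in the debug output.
refs = [
    "releases.xml",
    "../iso/",
    "faqs/faq.html",
    "file:///usr/share/doc/faqs/faq.html",
]

# Resolve each reference against the base using RFC 3986 rules.
resolved = {ref: urljoin(BASE, ref) for ref in refs}

for ref, url in resolved.items():
    print(f"{ref!r} -> {url!r}")
```

With this resolver, `../iso/` ends up under `/fatdog/iso/` and the `file:` link is returned as-is, which differs from the corresponding mappings in the log.
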
michaelrsweet commented 1 year ago

Will see what is going wrong here - base URL should be "http://distro.ibiblio.org/fatdog/web" but it isn't using that for the images.
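
For illustration only: the one-level-up image URLs in the report are what standard relative reference resolution (RFC 3986) produces when the base is a directory path without a trailing slash, because the last path segment is then treated as a file name and dropped. The Python sketch below demonstrates this with `urllib.parse.urljoin`. It shows one way such a result can arise and is not a description of htmldoc's code or of the fix; the `resolve` helper and variable names are hypothetical.

```python
from urllib.parse import urljoin

DOC_URL = "http://distro.ibiblio.org/fatdog/web/index.html"


def resolve(base: str, ref: str) -> str:
    """Resolve a relative reference against a base URL (RFC 3986 rules)."""
    return urljoin(base, ref)


# Base is the document URL: the last segment ("index.html") is dropped.
good = resolve(DOC_URL, "fatdog.png")

# Base is the directory with a trailing slash: same result.
also_good = resolve("http://distro.ibiblio.org/fatdog/web/", "fatdog.png")

# Base is the directory WITHOUT a trailing slash: "web" is treated as a
# file name and dropped, so the image is looked up one level too high.
one_up = resolve("http://distro.ibiblio.org/fatdog/web", "fatdog.png")

print(good)
print(also_good)
print(one_up)
```

The third result, `http://distro.ibiblio.org/fatdog/fatdog.png`, has the same shape as the URLs in the ERR404 lines (apart from the `:80` port).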

michaelrsweet commented 11 months ago

[master 296281a] Fix relative URL rewriting bug (Issue #507)