michaelrsweet / htmldoc

HTML Conversion Software
https://www.msweet.org/htmldoc
GNU General Public License v2.0

Broken links after version 1.9.3 #514

Closed prichterich closed 9 months ago

prichterich commented 1 year ago

I have a project with about 40 HTML files, many of them with internal anchors and links to those anchors. An example link:

<a href="Project3.htm#Create">New Project</a>

In the source file, the anchor is:

<h2><a name="Create"></a>Creating New Projects</h2>

With version 1.9.3, these links are resolved correctly in the PDF file. With newer versions, all links to an internal anchor are broken. When hovering over the link, it shows as:

file:///Users/Shared/.../Project3.htm%23Create

I'd prefer to use the current version since it contains relevant bug fixes, but the broken links to internal anchors prevent that. This issue was introduced shortly after 1.9.3, possibly in 1.9.4. I have tried a number of variations of defining the elements, but nothing worked.

michaelrsweet commented 1 year ago

Can you attach an example that demonstrates this issue? I checked, and the links in the test suite and the HTMLDOC Users Manual both work as expected...

prichterich commented 1 year ago

A ZIP archive with a simple example is attached. It includes the PDF file generated by 1.9.3, where the link works, and the PDF file from 1.9.17, where it is broken.

On my M2 Mac running macOS 13, the GUI versions of HTMLDOC are broken: 1.9.3 pops up an empty window, and 1.9.17 does nothing. I don't care since I use scripts anyway, but I thought I'd mention it.

michaelrsweet commented 1 year ago

@prichterich The ZIP file got stripped...

prichterich commented 1 year ago

Thought it would get through the email notification. Attaching it here instead: htmldoc_link_bug.zip

jmdoudoux commented 10 months ago

I've been using HTMLDOC for decades to generate a PDF file, currently over 4200 pages, from 135 HTML files. I'd like to take this opportunity to thank you very much for this wonderful tool.

For a long time I stayed on version 1.8.24 because of issue #505. Since that was fixed, I've been using version 1.9.17.

I also have the broken-links issue described above.

I was able to apply a workaround because I have a tool that pre-processes the HTML files before they are processed by HTMLDOC.

For this particular problem, I had to replace href="file.htm#anchor" (which works very well in 1.8.24) with simply href="#anchor" (so that it now works in 1.9.17).

This workaround works because all of my anchors are unique across all of the HTML source files.
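A minimal sketch of such a pre-processing step, for readers who want to try the same workaround. The function name `strip_file_from_links` and the regular expression are illustrative assumptions, not part of HTMLDOC or of the tool mentioned above. It only rewrites links to local `.htm`/`.html` files and leaves URLs with a scheme (such as `http:`) untouched.

```python
import re

# Hypothetical helper: matches href="file.htm#anchor" or href='file.html#anchor'.
# The negative lookahead skips values that start with a URL scheme (http:, file:, ...).
_FILE_ANCHOR = re.compile(
    r'''(href\s*=\s*)(["'])(?![a-zA-Z][a-zA-Z0-9+.-]*:)[^"'#]+\.html?#([^"']+)\2''',
    re.IGNORECASE,
)


def strip_file_from_links(html: str) -> str:
    """Rewrite href="file.htm#anchor" to href="#anchor".

    Only safe when every anchor name is unique across all source files.
    """
    return _FILE_ANCHOR.sub(r'\1\2#\3\2', html)


if __name__ == "__main__":
    print(strip_file_from_links('<a href="Project3.htm#Create">New Project</a>'))
```

A regular expression is enough for simple, consistently written links; irregular markup would call for a real HTML parser instead.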

I hope this helps with the investigation.

prichterich commented 10 months ago

Thanks for describing the workaround. Unfortunately, our anchor names are not unique across files, so this would require a substantial amount of editing. It would also complicate the build process by adding a pre-processing step, since we also need the help to keep working with links that include the file names.
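To judge how much editing that would involve, a rough check like the following can list anchor names that occur in more than one file. This is only a sketch: `find_duplicate_anchors` and `AnchorCollector` are hypothetical names, the input is assumed to be a mapping of file name to HTML text, and counting `id` attributes as anchors alongside `<a name="...">` is an assumption.

```python
from __future__ import annotations

from collections import defaultdict
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects anchor names from <a name="..."> and from id attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.anchors: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        for key, value in attrs:
            if value and ((tag == "a" and key == "name") or key == "id"):
                self.anchors.append(value)


def find_duplicate_anchors(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each anchor name used in more than one file to those file names."""
    seen: dict[str, list[str]] = defaultdict(list)
    for filename, html in files.items():
        collector = AnchorCollector()
        collector.feed(html)
        for anchor in set(collector.anchors):
            seen[anchor].append(filename)
    return {a: sorted(names) for a, names in seen.items() if len(names) > 1}
```

Any anchor reported here would resolve ambiguously if its links were shortened to the "#anchor" form.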

michaelrsweet commented 9 months ago

Fixed here:

[master 4620dee] Fix file-based links in PDF output (Issue #514)

The fix adds a link of the form "filename#anchor" in addition to adding or updating the shortened "#anchor" link. The code will still have issues with "#anchor" links when the anchor string is not unique in the collection of HTML files/URLs, but I'm not sure I have enough context in HTMLDOC to handle that at the point the link is rendered...
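The behavior described above can be illustrated with a toy model. This is not HTMLDOC's actual code (HTMLDOC is not written in Python); the `LinkTable` class and the page numbers are invented purely to show why the file-qualified form stays unambiguous while the short form does not.

```python
class LinkTable:
    """Toy link table: each anchor is stored under two keys."""

    def __init__(self):
        self._targets = {}

    def add_anchor(self, filename, anchor, page):
        # File-qualified key: unique as long as file names are unique.
        self._targets[f"{filename}#{anchor}"] = page
        # Short key: added or updated, so the last file to define it wins.
        self._targets[f"#{anchor}"] = page

    def resolve(self, href):
        # Returns None for an unknown link.
        return self._targets.get(href)


table = LinkTable()
table.add_anchor("Project3.htm", "Create", 12)  # hypothetical page numbers
table.add_anchor("Other.htm", "Create", 40)

print(table.resolve("Project3.htm#Create"))  # file-qualified link
print(table.resolve("#Create"))              # short link, ambiguous
```

In this model the file-qualified link resolves to the page registered for Project3.htm, while the short "#Create" link points at whichever file registered the anchor last.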