zhaoterryy / mkdocs-pdf-export-plugin

An MkDocs plugin to export content pages as PDF files
MIT License
313 stars 44 forks

Support relative PDF links #32

Closed zhaoterryy closed 5 years ago

zhaoterryy commented 5 years ago

Currently, links such as [Topic](topic.md) become absolute links to the corresponding HTML files, such as file:///a/b/site/topic.html.

This implements a preprocessor to overwrite all the hrefs accordingly.
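As a rough illustration of the kind of rewrite described above (not the plugin's actual code), here is a minimal sketch using only the standard library. The helper name `to_relative_pdf_href` is hypothetical, and it assumes each page's PDF sits next to its HTML file with a `.pdf` extension.

```python
import posixpath
from urllib.parse import urlsplit


def to_relative_pdf_href(href: str, page_url: str) -> str:
    """Turn an absolute file:// link to a site HTML page into a relative .pdf link.

    Hypothetical helper: assumes every page's PDF is written next to its HTML file.
    """
    target = urlsplit(href)
    if target.scheme != "file" or not target.path.endswith(".html"):
        return href  # external links and non-page assets are left untouched

    # Express the target relative to the directory of the page holding the link.
    page_dir = posixpath.dirname(urlsplit(page_url).path)
    rel = posixpath.relpath(target.path, start=page_dir)

    # Swap the extension and keep any #fragment.
    rel = rel[: -len(".html")] + ".pdf"
    return rel + ("#" + target.fragment if target.fragment else "")


print(to_relative_pdf_href("file:///a/b/site/topic.html",
                           "file:///a/b/site/index.html"))
```

Links that are not `file://` URLs ending in `.html` are returned unchanged, so external links keep working.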

No support for combined PDF yet. That would probably require generated IDs to replace all the anchor and href IDs in the single PDF document, because anchor IDs from different .md files can collide.

Fixes #28.

estan commented 5 years ago

Great, but I don't quite understand: isn't it precisely in combined output that you would want a link like [Topic](topic.md) to work properly (that is, as a link to another page)? For the default one-PDF-per-page mode, isn't it links like [Topic](topic.md#foo) where it's important that PDF links stay within the PDF?

estan commented 5 years ago

I've confirmed this works for [this type](#of-link) within a document.

To get my pictures to work I had to add

    for img in soup.find_all('img'):
        try:
            img['src'] = get_abs_asset_href(base_url, img['src'])
        except KeyError:
            pass

to replace_hrefs.

It would be great if combined mode worked as well, as it's our main use case.

Could something like this work?

  1. Deducing from a link whether it points to a page within the project (it points to #foo or foo.html or foo.html#bar where foo.md is a page in the project).
  2. If so, rewrite it following these rules:
    • #foo (link is on page page.html) -> #page-html-foo
    • foo.html -> #foo-html
    • foo.html#bar -> #foo-html-bar
  3. Patch up the ids of h1, h2, ... to match.

There's still a risk of conflict, but I think the risk would be rather low?
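The rules above could be sketched roughly as follows. This is only an illustration: the function names and the `project_pages` set are hypothetical, relative paths such as `../foo.html` are not resolved, and flattening `/` to `-` in page paths is an extra assumption not stated in the rules.

```python
from urllib.parse import urlsplit


def page_prefix(page: str) -> str:
    """'foo.html' -> 'foo-html' (flattening '/' as well is an assumption)."""
    return page.replace("/", "-").replace(".", "-")


def combined_href(href: str, current_page: str, project_pages: set) -> str:
    """Rewrite a link so it points at an anchor inside the combined document."""
    parts = urlsplit(href)
    if parts.scheme or parts.netloc:
        return href  # external link, leave alone

    # A bare '#foo' refers to the page the link is on.
    page = parts.path or current_page
    if page not in project_pages:
        return href  # not a page of this project

    anchor = page_prefix(page)
    if parts.fragment:
        anchor += "-" + parts.fragment
    return "#" + anchor


def combined_id(element_id: str, page: str) -> str:
    """Step 3: patch a heading id so it matches the rewritten links."""
    return page_prefix(page) + "-" + element_id


pages = {"page.html", "foo.html"}
for link in ("#foo", "foo.html", "foo.html#bar"):
    print(link, "->", combined_href(link, "page.html", pages))
```

The rewritten `foo.html#bar` link and the patched id of the `bar` heading on `foo.html` both come out as `foo-html-bar`, which is what makes the link resolve inside the single document.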

zhaoterryy commented 5 years ago

> It would be great if combined mode worked as well, as it's our main use case.

Yes, my goal is to have that fixed for 0.5.1.

> To get my pictures to work I had to add
>
>     for img in soup.find_all('img'):
>         try:
>             img['src'] = get_abs_asset_href(base_url, img['src'])
>         except KeyError:
>             pass
>
> to replace_hrefs.

There are more tags that I missed. Rather than querying by tag name, we should query by attribute, with something like soup.find_all(src=True).
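To show why selecting by attribute catches more than selecting by tag name, here is a small standard-library sketch. It uses `html.parser` as a stand-in for `soup.find_all(src=True)` and `urljoin` as a stand-in for `get_abs_asset_href`; the `SrcCollector` class and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SrcCollector(HTMLParser):
    """Record (tag, absolute src) for every start tag that carries a src attribute."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src")
        if src:  # attribute present and non-empty, whatever the tag is
            self.found.append((tag, urljoin(self.base_url, src)))


collector = SrcCollector("file:///a/b/site/")
collector.feed(
    '<img src="img/a.png">'
    '<script src="js/app.js"></script>'
    '<img alt="no src here">'
)
for tag, src in collector.found:
    print(tag, src)
```

The `script` tag is picked up even though it is not an `img`, and the `img` without a `src` is skipped without needing a `try`/`except KeyError`.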

> Could something like this work? ... There's still a risk of conflict, but I think the risk would be rather low?

I was thinking something along the same line, something like:

#<path>:<id>

foo/bar/devvy/devdev.html#tester would become #foo/bar/devvy/devdev:tester

We would have to "ban" (or throw an error on) the / and : characters in the input IDs, though.
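A minimal sketch of that #<path>:<id> scheme, including the check on the input ID, might look like this. The function name `to_path_id_anchor` is hypothetical, and mapping a link with no fragment to an empty id is an assumption, since the thread does not cover that case.

```python
def to_path_id_anchor(href: str) -> str:
    """Map 'dir/page.html#id' to '#dir/page:id' for use in the combined PDF."""
    path, _, element_id = href.partition("#")

    # '/' and ':' act as separators in the generated anchor, so an ID that
    # contains either of them would make the result ambiguous.
    if "/" in element_id or ":" in element_id:
        raise ValueError(f"ID {element_id!r} must not contain '/' or ':'")

    if path.endswith(".html"):
        path = path[: -len(".html")]

    # Assumption: a link without a fragment yields an empty id after the colon.
    return "#" + path + ":" + element_id


print(to_path_id_anchor("foo/bar/devvy/devdev.html#tester"))
```

Rejecting the two separator characters up front keeps the mapping from (path, id) to anchor unambiguous, which is the point of the "ban".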

zhaoterryy commented 5 years ago

v0.5.1b2 is up - combined output should be supported now.

estan commented 5 years ago

Great, will have a go when I'm at work. The commit message sounds delicious :)

estan commented 5 years ago

@zhaoterryy Hm, I installed 0.5.1b2:

(mkdocsenv) [estan@newton manual (user-manual $%)]$ pip list --local | grep pdf-export
mkdocs-pdf-export-plugin 0.5.1b2
(mkdocsenv) [estan@newton manual (user-manual $%)]$

But an internal link still leads to the HTML:

[Screenshot: broken_link]

In this case, I tried clicking the "Working with Projects" link, which is [Working with Projects](projects.md) in the Markdown.

zhaoterryy commented 5 years ago

Please try v0.5.1b3 👍

estan commented 5 years ago

@zhaoterryy Bingo. Works great. Much appreciated!

estan commented 5 years ago

Tested: [links](like_this.md), [links](like.md#this) and external links [like](https://google.com).