snosov1 / toc-org

toc-org is an Emacs utility that keeps an up-to-date table of contents in Org files without exporting (useful primarily for README files on GitHub).
GNU General Public License v3.0

Meaningful & stable anchors in HTML export #29

Closed VladimirAlexiev closed 7 years ago

VladimirAlexiev commented 8 years ago

Consider http://vladimiralexiev.github.io/Multisensor/validation.html. The TOC section is generated by toc-org, but its links don't work (they are shown in italics, which is how the HTML exporter marks broken links).

The reason is that the HTML exporter (org-plus-contrib-20150803 ox-html.el) uses numbered anchors, e.g. http://vladimiralexiev.github.io/Multisensor/validation.html#sec-3-1. The following are tried in succession to obtain a preferred-id:

    (list (org-element-property :CUSTOM_ID headline)
          (concat "sec-" section-number)
          (org-element-property :ID headline))))

The first is a manually set CUSTOM_ID property, the second is a numbered reference (e.g. "sec-3-1"), and the third is an ID that's automatically inserted by org-store-link.
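Because CUSTOM_ID is tried first, one way to get title-based anchors is to set that property from the heading text. The following is only a minimal sketch, assuming Org 9.1 or later (for the four-argument form of `org-get-heading`); both `my/heading-slug` and `my/org-set-custom-ids-from-titles` are hypothetical names, not part of Org or toc-org:

```elisp
(require 'org)

(defun my/heading-slug (title)
  "Turn TITLE into a lowercase, hyphen-separated slug (hypothetical helper)."
  (let* ((s (downcase title))
         ;; Collapse every run of non-alphanumeric characters into one hyphen.
         (s (replace-regexp-in-string "[^[:alnum:]]+" "-" s)))
    ;; Trim hyphens left over at either end.
    (replace-regexp-in-string "\\`-+\\|-+\\'" "" s)))

(defun my/org-set-custom-ids-from-titles ()
  "Give every headline without a CUSTOM_ID one derived from its title.
Hypothetical command; duplicate titles would yield duplicate IDs."
  (interactive)
  (org-map-entries
   (lambda ()
     (unless (org-entry-get (point) "CUSTOM_ID")
       ;; The four t arguments drop tags, TODO keyword, priority and COMMENT.
       (org-entry-put (point) "CUSTOM_ID"
                      (my/heading-slug (org-get-heading t t t t)))))))
```

Headlines that already carry a CUSTOM_ID are left alone, so existing links keep working. The sketch does not strip statistics cookies or de-duplicate repeated titles.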

Numbered anchors are bad because they are not stable: if I move the section, the anchor will change. GitHub makes anchors from the heading text, which in my experience is a lot more stable. toc-org generates such links when toc-org-hrefify-default is "gh".

The newest ox-html.el uses this instead:

    (list (org-element-property :CUSTOM_ID headline)
          (org-export-get-reference headline info)
          (org-element-property :ID headline))))

You see the second line is changed. It uses org-export-get-reference from ox.el, which uses org-export-new-reference, which "Generates random 7 digits hexadecimal numbers". I don't know if that is stored (which would make it stable), but it's certainly not meaningful to any reader of the HTML file.

I'll try to raise an issue with the ox-export developers. They have a mailing list but no tracker, which is why I'm posting this here.

snosov1 commented 8 years ago

As I said, I don't really have the motivation to work on this myself, but it looks like you can do it yourself with a little effort. The only thing you need to do is implement a function, such as toc-org-hrefify-orghtml, similar to toc-org-hrefify-gh, that converts heading text to "href ids". For example, for GitHub it downcases everything, replaces spaces with -, etc.

VladimirAlexiev commented 8 years ago

But it'll work only in GitHub, not in Org or HTML? I posted a message to 'emacs-orgmode@gnu.org' that largely repeats the above, and adds:

So my request is: the HTML export should have an option org-export-anchors-use-title to generate section anchors from the section title.

GitHub keeps the section number and doesn't strip the TODO keyword or the statistics cookies. So a section headline like "3.2.2 TODO Normalization Problems [3/4]" will get the anchor #322-todo-normalization-problems-34.
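The mapping in that example can be reproduced with a few lines of Emacs Lisp. This is a minimal sketch, assuming GitHub only downcases the text, drops punctuation, and turns each space into a hyphen; its actual rules (for instance for duplicate headings) are not covered. `my/github-style-anchor` is a hypothetical helper, not a function from toc-org or Org:

```elisp
(require 'subr-x) ; for `string-trim' on older Emacs versions

(defun my/github-style-anchor (heading)
  "Return a GitHub-style anchor for HEADING (hypothetical helper)."
  (let* ((s (downcase (string-trim heading)))
         ;; Keep only letters, digits, spaces, underscores and hyphens.
         (s (replace-regexp-in-string "[^[:alnum:] _-]" "" s))
         ;; Each remaining space becomes a hyphen.
         (s (replace-regexp-in-string " " "-" s)))
    (concat "#" s)))

(my/github-style-anchor "3.2.2 TODO Normalization Problems [3/4]")
;; => "#322-todo-normalization-problems-34"
```

Nothing here knows about Org syntax, which is why the section number, the TODO keyword, and the digits of the [3/4] cookie all end up in the anchor.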

So it's best to include further options about which parts of a headline to use:

snosov1 commented 8 years ago

But it'll work only in GitHub, not in Org or HTML?

You can provide your own function that converts headings to links. In the case of GitHub, it downcases everything, replaces spaces, etc. For HTML you can generate hrefs like #sec-3-1. There's more information at https://github.com/snosov1/toc-org#different-href-styles

On second thought, you can only make two work at once: either GitHub and Org, or HTML and Org. However, you can probably work around this by setting toc-org-hrefify-default in a pre-export hook. This is where it starts to get hairy, but it seems like it can work =) To repeat myself: personally, I would rather use native Org utilities for HTML export.
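A rough sketch of what such a pre-export hook could look like. It assumes a user-defined `toc-org-hrefify-orghtml` already exists (the function suggested earlier in this thread, not something toc-org ships), and `my/toc-org-html-hrefs` is a hypothetical name. If the TOC heading's tag itself names an href style, that may take precedence over the default set here:

```elisp
(require 'ox)

(defun my/toc-org-html-hrefs (backend)
  "Regenerate the toc-org TOC with HTML-style links when exporting to HTML.
BACKEND is the export back-end symbol that Org passes to the hook."
  (when (and (org-export-derived-backend-p backend 'html)
             (fboundp 'toc-org-insert-toc))
    ;; "orghtml" assumes a user-defined `toc-org-hrefify-orghtml' exists.
    (setq-local toc-org-hrefify-default "orghtml")
    (toc-org-insert-toc)))

(add-hook 'org-export-before-processing-hook #'my/toc-org-html-hrefs)
```

Org runs this hook on a copy of the buffer being exported, so the file on disk should keep whatever links toc-org normally writes for GitHub.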

snosov1 commented 7 years ago

I'm doing a little cleanup and I'm inclined to close this issue, since it's really out of scope for the package (but it's something you can do on your side, if you wish). Please let me know if you need any support from my side.