michal-h21 / make4ht

Build system for tex4ht
Remove anchor tag in xml #144

Closed. hcf-n closed 2 months ago

hcf-n commented 5 months ago

I use make4ht to convert to DocBook. The resulting XML contains several anchor tags. Is there a way to remove or prevent these tags? (Some anchor tags are self-closing, and some are open/close pairs.)

Best regards, hcf

michal-h21 commented 5 months ago

These anchors are usually created by \label, or at places where labels could be used, such as sections or tables. I think the easiest way to remove them is to create a make4ht DOM filter that removes anchors that no link points to.

Can you create an MWE that shows these extra anchors?

hcf-n commented 5 months ago

Of course :)

The following minimal example produces several kinds of anchor tags when converted to DocBook XML.

\documentclass[11pt, a4paper]{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc,url}
\usepackage{textcomp}
\usepackage[style=authoryear-comp]{biblatex}
\addbibresource{\jobname.bib}
\usepackage{filecontents}

\begin{filecontents}{\jobname.bib}
@book{key,
author = {Author, A.},
year = {2001},
title = {Title},
publisher = {Publisher},
}
\end{filecontents}

\title{Placeholder for title}
\author{Firstname Lastname}
\date{\today}

\begin{document}
\maketitle
\section{Test}
\begin{enumerate}
  \item First item
  \item Second item
\end{enumerate}

Sentence.\footnote{Example footnote.}

Sentence.\footcite{key}

\end{document}

michal-h21 commented 5 months ago

This make4ht build file should do it:

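local domfilter = require "make4ht-domfilter"

local process = domfilter {
  function(dom)
    -- first pass: collect the ids that links point to
    local links = {}
    for _, el in ipairs(dom:query_selector("link")) do
      local href = el:get_attribute("xlink:href")
      -- skip links without an xlink:href attribute
      if href then
        -- strip the leading "#" to get the target id
        links[href:gsub("^#", "")] = true
      end
    end
    -- second pass: remove anchors that no link points to
    for _, el in ipairs(dom:query_selector("anchor")) do
      if not links[el:get_attribute("xml:id")] then
        el:remove_node()
      end
    end

    return dom
  end
}

-- run the filter on every generated XML file
Make:match("xml$", process)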

It first collects the targets of all links, so that anchors some link points to are kept. Then it loops over the anchors and removes the ones that no link points to.
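The two-pass logic can be tried outside make4ht. The following is only a minimal sketch in plain Lua: the element tables, the ids (`sec-test`, `x1-2`, `fn1`) and the helper names `target_id` and `keep_referenced` are hypothetical stand-ins, not part of make4ht or of the real DOM objects.

```lua
-- Standalone illustration of the two-pass logic; make4ht is not required.
-- The tables below are hypothetical stand-ins for DOM elements.

-- Strip a leading "#" from a link target so it can be compared with an id.
-- The parentheses drop gsub's second return value (the match count).
local function target_id(href)
  return (href:gsub("^#", ""))
end

-- Return the anchors whose id is referenced by at least one link.
local function keep_referenced(links, anchors)
  local referenced = {}
  -- pass 1: record every id that some link points to
  for _, link in ipairs(links) do
    if link.href then
      referenced[target_id(link.href)] = true
    end
  end
  -- pass 2: keep only the anchors whose id was recorded
  local kept = {}
  for _, anchor in ipairs(anchors) do
    if anchor.id and referenced[anchor.id] then
      kept[#kept + 1] = anchor
    end
  end
  return kept
end

-- Hypothetical sample data: two links, four anchors (one without an id).
local links = { { href = "#sec-test" }, { href = "#fn1" } }
local anchors = { { id = "sec-test" }, { id = "x1-2" }, { id = "fn1" }, {} }

for _, anchor in ipairs(keep_referenced(links, anchors)) do
  print(anchor.id)
end
```

With this sample data only `sec-test` and `fn1` are printed; the anchor `x1-2` and the anchor without an id are dropped, which mirrors what the build file does with `el:remove_node()`.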

Unfortunately, I've found a bug in make4ht, so the filter may fail for you if your document contains any links (for example, ones created by the \ref command). In that case it will print that XML parsing failed. This should be fixed in the development version of make4ht.

hcf-n commented 5 months ago

Thank you, this works fine!

michal-h21 commented 5 months ago

Great! So should I close this issue?