vsch / flexmark-java

CommonMark/Markdown Java parser with source level AST. CommonMark 0.28, emulation of: pegdown, kramdown, markdown.pl, MultiMarkdown. With HTML to MD, MD to PDF, MD to DOCX conversion modules.
BSD 2-Clause "Simplified" License

FlexmarkHtmlConverter - HtmlConverterExtension #613

Open c4rth opened 1 month ago

c4rth commented 1 month ago

Is your feature request related to a problem? Please describe.

I'm using flexmark-html2md-converter to convert Confluence HTML pages to markdown.

The default conversion already handles the elements in the HTML, e.g. <div>, <img>, <span>, ... In some cases, however, the class attribute describes more precisely what an element is, e.g.

So I wrote an HtmlNodeRenderer (an inner class of an HtmlConverterExtension) for <div>, but it handles every <div>. I didn't find a way to restrict it to specific ones.

Describe the solution you'd like

It would be nice if HtmlNodeRendererHandler could match on attributes (class or others) as well as on tagName. That way I could have one extension per type (tag + attribute(s)) instead of one per tag.

Current:

public class MyHtmlNodeRender implements HtmlNodeRenderer {
    ...
    @Override
    public Set<HtmlNodeRendererHandler<?>> getHtmlNodeRendererHandlers() {
        return new HashSet<>(Collections.singletonList(
                new HtmlNodeRendererHandler<>("div", Element.class, this::processDiv)
        ));
    }
    ...
}

Desired:

public class MyHtmlNodeRender implements HtmlNodeRenderer {
    ...
    @Override
    public Set<HtmlNodeRendererHandler<?>> getHtmlNodeRendererHandlers() {
        // <div class='className1 className2 ...' title='title' ... >
        Map<String, List<String>> attributesMap = Map.of("class", List.of("className1", "className2"), "title", List.of("title"));
        return new HashSet<>(Collections.singletonList(
                new HtmlNodeRendererHandler<>("div", attributesMap, Element.class, this::processDiv)
        ));
    }
    ...
}

Ideally, the attributesMap would also accept basic expressions (and, or, not, ...):

public class MyHtmlNodeRender implements HtmlNodeRenderer {
    ...
    @Override
    public Set<HtmlNodeRendererHandler<?>> getHtmlNodeRendererHandlers() {
        // <div class='className1 or className2 ...' title='!title' ... >
        Map attributesMap = Map.of("class", or("className1", "className2"), "title", not("title"));
        return new HashSet<>(Collections.singletonList(
                new HtmlNodeRendererHandler<>("div", attributesMap, Element.class, this::processDiv)
        ));
    }
    ...
}
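
To make the intended matching semantics concrete, here is a minimal sketch of how such expressions could be evaluated. It uses only the JDK. `AttributeMatchers` and its `has`, `or`, `and`, `not` and `matches` methods are hypothetical names and are not part of flexmark-java. The sketch assumes every attribute value is treated as a whitespace-separated token list (as for class), and that a missing attribute behaves like an empty one.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper, NOT part of flexmark-java: predicates over one
// attribute value, plus a check of a whole spec against an element's attributes.
final class AttributeMatchers {
    private AttributeMatchers() {}

    // Splits a whitespace-separated attribute value into tokens; null gives an empty set.
    static Set<String> tokens(String value) {
        Set<String> result = new HashSet<>();
        if (value != null) {
            for (String token : value.trim().split("\\s+")) {
                if (!token.isEmpty()) result.add(token);
            }
        }
        return result;
    }

    // Accepts values that contain the given token.
    static Predicate<String> has(String name) {
        return value -> tokens(value).contains(name);
    }

    // Accepts values that contain at least one of the given tokens.
    static Predicate<String> or(String... names) {
        return value -> {
            Set<String> present = tokens(value);
            return Arrays.stream(names).anyMatch(present::contains);
        };
    }

    // Accepts values that contain all of the given tokens.
    static Predicate<String> and(String... names) {
        return value -> tokens(value).containsAll(Arrays.asList(names));
    }

    // Accepts values that do not contain the given token.
    static Predicate<String> not(String name) {
        return has(name).negate();
    }

    // An element matches when every predicate in the spec accepts the
    // element's value for that attribute (null if the attribute is absent).
    static boolean matches(Map<String, Predicate<String>> spec, Map<String, String> attributes) {
        for (Map.Entry<String, Predicate<String>> entry : spec.entrySet()) {
            if (!entry.getValue().test(attributes.get(entry.getKey()))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Predicate<String>> spec = new LinkedHashMap<>();
        spec.put("class", or("className1", "className2"));
        spec.put("title", not("title"));

        Map<String, String> div = new LinkedHashMap<>();
        div.put("class", "panel className2");
        System.out.println(matches(spec, div)); // true: class matches, no title attribute

        div.put("title", "title");
        System.out.println(matches(spec, div)); // false: the title value is excluded
    }
}
```

A handler registration that accepted such a spec could call something like `matches` before invoking processDiv, which would let several handlers share the same tag name.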

Describe alternatives you've considered

Write one HtmlNodeRenderer per tag.
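
For reference, the per-tag workaround amounts to dispatching on attributes inside the single callback. The sketch below models that with JDK types only. `DivDispatcher`, `whenClass` and `process` are hypothetical names, the element is reduced to an attribute map plus its text, and the "conversions" are plain string functions standing in for real Markdown output. In an actual renderer the class check would come from the jsoup Element (for example `Element.hasClass`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a single per-tag processDiv callback: it tries
// class-specific rules in registration order, then falls back to a default.
final class DivDispatcher {
    // One rule: the class token to look for and the conversion to apply.
    private static final class Rule {
        final String className;
        final Function<String, String> render;

        Rule(String className, Function<String, String> render) {
            this.className = className;
            this.render = render;
        }
    }

    private final List<Rule> rules = new ArrayList<>();
    private final Function<String, String> fallback;

    DivDispatcher(Function<String, String> fallback) {
        this.fallback = fallback;
    }

    // Registers a conversion for <div> elements carrying the given class token.
    DivDispatcher whenClass(String className, Function<String, String> render) {
        rules.add(new Rule(className, render));
        return this;
    }

    // Applies the first rule whose class token is present, else the fallback.
    String process(Map<String, String> attributes, String text) {
        for (Rule rule : rules) {
            if (hasClass(attributes, rule.className)) return rule.render.apply(text);
        }
        return fallback.apply(text);
    }

    // True when the whitespace-separated class attribute contains the token.
    static boolean hasClass(Map<String, String> attributes, String className) {
        String value = attributes.get("class");
        return value != null && Arrays.asList(value.trim().split("\\s+")).contains(className);
    }

    public static void main(String[] args) {
        DivDispatcher dispatcher = new DivDispatcher(text -> text)
                .whenClass("className1", text -> "> " + text)
                .whenClass("className2", text -> "**" + text + "**");

        System.out.println(dispatcher.process(
                Collections.singletonMap("class", "wrapper className2"), "note")); // **note**
        System.out.println(dispatcher.process(
                Collections.<String, String>emptyMap(), "plain")); // plain
    }
}
```

This works, but every class-specific case for a tag ends up in one renderer, which is the coupling the requested handler matching would avoid.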

Additional context

None