CommonMark/Markdown Java parser with source level AST. CommonMark 0.28, emulation of: pegdown, kramdown, markdown.pl, MultiMarkdown. With HTML to MD, MD to PDF, MD to DOCX conversion modules.
Is your feature request related to a problem? Please describe.
I'm using flexmark-html2md-converter to convert Confluence HTML pages to Markdown.
The default conversion already handles the HTML elements themselves, e.g. <div>, <img>, <span>, ...
In some cases, though, the class attribute describes more precisely what an element is, e.g.:
<div class='confluence-information-macro'> is an admonition
<img class='emoticon'> is an emoji
So I wrote an HtmlNodeRenderer (an inner class of an HtmlConverterExtension) for <div>, but it handles every <div>. I didn't find a way to restrict it to specific ones.
Describe the solution you'd like
It would be nice if HtmlNodeRendererHandler could be more specific and match on the tag name plus attributes (class or others).
That way I could have one extension per element type (tag + attributes) instead of one per tag.
actual:
public class MyHtmlNodeRender implements HtmlNodeRenderer {
    ...
    @Override
    public Set<HtmlNodeRendererHandler<?>> getHtmlNodeRendererHandlers() {
        return new HashSet<>(Collections.singletonList(
            new HtmlNodeRendererHandler<>("div", Element.class, this::processDiv)
        ));
    }
    ...
}
desired:
public class MyHtmlNodeRender implements HtmlNodeRenderer {
    ...
    @Override
    public Set<HtmlNodeRendererHandler<?>> getHtmlNodeRendererHandlers() {
        // <div class='className1 className2 ...' title='title' ... >
        Map<String, List<String>> attributesMap =
            Map.of("class", List.of("className1", "className2"), "title", List.of("title"));
        return new HashSet<>(Collections.singletonList(
            new HtmlNodeRendererHandler<>("div", attributesMap, Element.class, this::processDiv)
        ));
    }
    ...
}
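To make the intended matching rule concrete, here is a minimal sketch that does not use any flexmark or jsoup types. The element's attributes are modelled as a plain Map<String, String>, and a handler would apply only when every required attribute is present and its value contains all required whitespace-separated tokens. AttributeMatch and matches are hypothetical names, not part of flexmark, and treating every attribute value as a token list is an assumption carried over from the class example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of flexmark: decides whether an element's
// attributes satisfy the requirements a handler was registered with.
final class AttributeMatch {
    private AttributeMatch() {}

    // required: attribute name -> tokens that must all be present in its value.
    // actual:   attribute name -> raw attribute value, e.g. "class" -> "a b c".
    static boolean matches(Map<String, List<String>> required, Map<String, String> actual) {
        for (Map.Entry<String, List<String>> entry : required.entrySet()) {
            String value = actual.get(entry.getKey());
            if (value == null) {
                return false; // required attribute is missing
            }
            // Split on whitespace so class='className1 className2' yields two tokens.
            Set<String> tokens = new HashSet<>(Arrays.asList(value.trim().split("\\s+")));
            if (!tokens.containsAll(entry.getValue())) {
                return false;
            }
        }
        return true;
    }
}
```

With this rule, a <div> carrying both class names and the title would go to processDiv, while a <div> with only one of the class names would fall through to the default handling.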
Ideally, the attributesMap would also accept basic expressions (and, or, not, ...):
public class MyHtmlNodeRender implements HtmlNodeRenderer {
    ...
    @Override
    public Set<HtmlNodeRendererHandler<?>> getHtmlNodeRendererHandlers() {
        // <div class='className1 or className2 ...' title='!title' ... >
        Map attributesMap = Map.of("class", or("className1", "className2"), "title", not("title"));
        return new HashSet<>(Collections.singletonList(
            new HtmlNodeRendererHandler<>("div", attributesMap, Element.class, this::processDiv)
        ));
    }
    ...
}
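The or(...) and not(...) helpers above do not exist in flexmark. One possible shape for them, sketched with java.util.function.Predicate over the set of tokens in an attribute value, is shown below. AttributeExpressions and all of its methods are hypothetical, and the choice that a missing attribute behaves like an empty token set is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical expression helpers, not part of flexmark.
final class AttributeExpressions {
    private AttributeExpressions() {}

    // True when the attribute value contains every listed token.
    static Predicate<Set<String>> and(String... tokens) {
        return actual -> actual.containsAll(Arrays.asList(tokens));
    }

    // True when the attribute value contains at least one listed token.
    static Predicate<Set<String>> or(String... tokens) {
        return actual -> Arrays.stream(tokens).anyMatch(actual::contains);
    }

    // True when the attribute value contains none of the listed tokens.
    static Predicate<Set<String>> not(String... tokens) {
        return or(tokens).negate();
    }

    // Applies each rule to the token set of the corresponding attribute.
    static boolean matches(Map<String, Predicate<Set<String>>> rules,
                           Map<String, String> attributes) {
        for (Map.Entry<String, Predicate<Set<String>>> rule : rules.entrySet()) {
            String value = attributes.getOrDefault(rule.getKey(), "");
            Set<String> tokens = new HashSet<>(Arrays.asList(value.trim().split("\\s+")));
            tokens.remove(""); // an absent or empty attribute has no tokens
            if (!rule.getValue().test(tokens)) {
                return false;
            }
        }
        return true;
    }
}
```

Using predicates keeps the map typed (Map<String, Predicate<Set<String>>>) and lets callers compose rules with the standard Predicate.and, Predicate.or and Predicate.negate methods instead of a custom expression syntax.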
Describe alternatives you've considered
Write one HtmlNodeRenderer per tag.
Additional context
None.