Closed weyert closed 3 months ago
Thanks for the feature request!
Added some additional facilities to support overriding as well (same area..), resulting with these new options:
- `overrideElementProcessing?: (element: Element, options: ConversionOptions, indentLevel: number) => SemanticMarkdownAST[] | undefined`: Custom processing for HTML elements.
- `processUnhandledElement?: (element: Element, options: ConversionOptions, indentLevel: number) => SemanticMarkdownAST[] | undefined`: Handler for unknown HTML elements.
- `overrideNodeRenderer?: (node: SemanticMarkdownAST, options: ConversionOptions, indentLevel: number) => string | undefined`: Custom renderer for AST nodes.
- `renderCustomNode?: (node: CustomNode, options: ConversionOptions, indentLevel: number) => string | undefined`: Renderer for custom AST nodes.
Let me know if this is what you had in mind..
Thank you that looks great! I will give a try today.
Maybe these four new options might also help with resolving a weird issue I noticed last night were when you have some content like:
<ul>
<li>
<p>Company - Prefix <span class="APIparametervalue">003-</span> i.e. <span class="APIparameterkey">CompanyId</span></p>
</li>
</ul>
If I run the following:
const pageContent = `<ul>
<li>
<p>Company - Prefix <span class="APIparametervalue">003-</span> i.e. <span class="APIparameterkey">CompanyId</span></p>
</li>
</ul>`;
const markdownResults = convertHtmlToMarkdown(pageContent, {
extractMainContent: true,
filePath,
debug: true,
overrideDOMParser: new new JSDOM().window.DOMParser(),
});
It generates:
[
{
"type": "list",
"ordered": false,
"items": [
{
"type": "listItem",
"content": [
{
"type": "text",
"content": "Company - Prefix"
},
{
"type": "text",
"content": "i.e."
},
{
"type": "text",
"content": "\n\n"
}
]
}
]
}
]
reflects in the Markdown as:
- Company - Prefix i.e.
I think adding this bit of code in htmlToMarkdownAst
would solve it but I think I should try if I can solve it with these new additions first.
} else if (elem.tagName.toLowerCase() === 'span') {
debugLog("Span");
result.push(...htmlToMarkdownAST(elem, options));
// Add a new line after the paragraph
result.push({type: 'text', content: '\n\n'});
Appreciate the feedback! :)
I tried to run the example you provided and it generated this:
- Company - Prefix 003- i.e. CompanyId
Looking at the HTML I can't say what exactly I would expect by translating this to Markdown.. thoughts?
Yeah, I would expect something like Company - Prefix 003- i.e. CompanyId
so what you are getting looks correct.
Hmm, wondering why it is not working for me. I will upgrade to the latest version. I would like to say your library is really cool. Makes it easier to convert our existing documentation site into Markdown.
That's odd, I can't think of any recent code change that might "fix" this.. but.. this was generated from the current code so 🤷
Thanks for sharing! I'm happy I open sourced it! :)
I would like to suggest an improvement for both
markdownASTToString
andhtmlToMarkdownAST
-functions to call a hook or callback for a unrecognised element tag name or element type.This would allow to cover the case where some pages that heavily rely on XML namespace for custom data by implementing hooks to handle these elements.
For example: