html-to-text / node-html-to-text

Advanced html to text converter

Convert a table into a markdown compatible table #314

Open ElectricCodeGuy opened 1 month ago

ElectricCodeGuy commented 1 month ago

The goal

I am trying to convert an HTML table structure into a markdown table format using the html-to-text package in JavaScript/TypeScript.

Best attempt

So far, I have tried configuring the htmlToTextOptions object with various selectors and formatters to handle table elements and their contents. I have used selectors like table, tr, th, and td to target specific table elements and applied formatters like tableHeaderCell and tableDataCell to format the cell contents.

I have also attempted to use a custom block formatter to handle the table structure and add the necessary markdown syntax, such as the separator row (|---|---|), but it seems that the block formatter is not supported by the HtmlToTextOptions interface, causing TypeScript errors.

The question

I can't figure out how to properly convert the HTML table structure into a valid markdown table format using the html-to-text package. I need assistance in determining the correct configuration and formatters to achieve the desired output.

Prior research

I have searched for documentation and examples related to the html-to-text package, but I couldn't find a clear solution for converting HTML tables to markdown tables. I have also tried various combinations of selectors and formatters based on my understanding of the package, but I haven't been able to achieve the desired result.

I have considered exploring alternative packages or libraries that might have built-in support for converting HTML tables to markdown, but I would prefer to find a solution using the html-to-text package if possible.

If anyone has experience with the html-to-text package and can provide guidance on how to properly configure it to convert HTML tables to markdown tables, I would greatly appreciate any insights or code examples. Additionally, if there are any other approaches or techniques I should consider to achieve this goal, I am open to suggestions.

Thank you in advance for your help!

KillyMXI commented 1 month ago

I have an html-to-markdown converter in the works. Unfortunately, it's not production-ready yet and only works well on well-formed input. A lot of unfortunate things aligned, preventing me from investing more time into it currently. I intend to complete it, but I don't have an ETA. With arbitrary text output we can freehand certain things and avoid some corner cases. Markdown requires more attention.

You can look at my implementation of the markdown table formatter there:

https://github.com/html-to-text/node-html-to-text/blob/5b7ca1c1a736a730c9a4fa1b6db6172e50f4ee3e/packages/html-to-md/src/md-formatters.js#L338

Note that it also uses its own table printer.
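For readers who only need the printing step, here is a minimal sketch of a table printer. It is not the printer from the repository linked above. `renderMarkdownTable` and `escapeCell` are hypothetical names, and the sketch assumes the cell text has already been extracted into an array of rows, with the first row acting as the header.

```typescript
// Hypothetical helpers, not taken from the html-to-text repository.

// Collapse whitespace (including newlines) and escape pipes so that
// a cell cannot break the row structure.
function escapeCell(text: string): string {
  return text.replace(/\s+/g, " ").trim().replace(/\|/g, "\\|");
}

// Render rows as a pipe table: header row, delimiter row, then data rows.
function renderMarkdownTable(rows: string[][]): string {
  if (rows.length === 0) return "";
  const columnCount = Math.max(...rows.map((r) => r.length));
  if (columnCount === 0) return "";

  // Pad short rows so that every row has the same number of cells.
  const cells = rows.map((r) =>
    Array.from({ length: columnCount }, (_, i) => escapeCell(r[i] ?? ""))
  );

  // Column width: the widest cell, but at least 3 so the delimiter is "---".
  const widths = Array.from({ length: columnCount }, (_, i) =>
    Math.max(3, ...cells.map((r) => r[i].length))
  );

  const line = (r: string[]): string =>
    "| " + r.map((c, i) => c.padEnd(widths[i])).join(" | ") + " |";
  const delimiter =
    "| " + widths.map((w) => "-".repeat(w)).join(" | ") + " |";

  const [header, ...body] = cells;
  return [line(header), delimiter, ...body.map(line)].join("\n");
}
```

Padding the columns is purely cosmetic. The delimiter row directly under the header is the part that markdown table syntax depends on.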

ElectricCodeGuy commented 1 month ago

Thanks for the quick response! I have been using this, and the output looks like it is structured as a table when I inspect it. But when I pass it to a markdown renderer like React Markdown, it is not rendered as a table. Each table row is rendered as a normal line of text with a \n at the end.

KillyMXI commented 1 month ago

I don't understand what you mean.

html-to-text has its own dataTable formatter that is not compatible with markdown.

html-to-markdown will be a separate package with its own set of formatters. They may have similar names, so pay attention to which package they are in.

The formatters API is the same, but I don't ship markdown formatters in any published package yet. The code in the repository is a work in progress.

You can bring any existing formatter in as a custom formatter by copying the code, but data tables have the most complex formatters and require more effort and understanding to do so. (And pay closer attention to the license as well, when copying code instead of importing. MIT still requires attribution.)
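As a rough illustration of what a custom table formatter could look like, here is a minimal sketch. It is not the repository's formatter and handles only simple tables (no colspan, rowspan, or nested tables). The formatter follows the `(elem, walk, builder, formatOptions)` signature shown in the html-to-text README. `DomNode` and `MinimalBuilder` are local stand-in types declared only to keep the sketch self-contained, and `collectRows` and `markdownTableFormatter` are hypothetical names.

```typescript
// Local stand-in types (assumptions): only the node and builder members
// that this sketch actually touches.
interface DomNode {
  type: string;          // "tag" or "text" for the nodes handled here
  name?: string;         // tag name of element nodes
  data?: string;         // content of text nodes
  children?: DomNode[];
}

interface MinimalBuilder {
  openBlock(options?: { leadingLineBreaks?: number }): void;
  addInline(text: string): void;
  closeBlock(options?: { trailingLineBreaks?: number }): void;
}

// Concatenate all text found below a node.
function textOf(node: DomNode): string {
  if (node.type === "text") return node.data ?? "";
  return (node.children ?? []).map(textOf).join("");
}

// Collect the cell texts of every <tr>, descending through wrappers
// such as <thead> and <tbody>. Pipes are escaped, whitespace collapsed.
function collectRows(node: DomNode, rows: string[][] = []): string[][] {
  for (const child of node.children ?? []) {
    if (child.name === "tr") {
      rows.push(
        (child.children ?? [])
          .filter((c) => c.name === "th" || c.name === "td")
          .map((c) =>
            textOf(c).replace(/\s+/g, " ").trim().replace(/\|/g, "\\|")
          )
      );
    } else {
      collectRows(child, rows);
    }
  }
  return rows;
}

// Hypothetical custom formatter: the first row becomes the header,
// followed by a "---" delimiter row and the remaining rows.
function markdownTableFormatter(
  elem: DomNode,
  _walk: unknown,
  builder: MinimalBuilder,
  _formatOptions: unknown
): void {
  const rows = collectRows(elem);
  if (rows.length === 0) return;
  const columnCount = Math.max(...rows.map((r) => r.length));
  const toLine = (cells: string[]): string =>
    "| " +
    Array.from({ length: columnCount }, (_, i) => cells[i] ?? "").join(" | ") +
    " |";
  const lines = [
    toLine(rows[0]),
    toLine(new Array<string>(columnCount).fill("---")),
    ...rows.slice(1).map(toLine),
  ];
  // One block per table line, so each row ends up on its own line.
  lines.forEach((line, index) => {
    builder.openBlock({ leadingLineBreaks: index === 0 ? 2 : 1 });
    builder.addInline(line);
    builder.closeBlock({
      trailingLineBreaks: index === lines.length - 1 ? 2 : 1,
    });
  });
}

// Assumed wiring, following the custom formatter pattern from the README:
// convert(html, {
//   wordwrap: false, // assumption: keeps long rows from being wrapped
//   formatters: { markdownTable: markdownTableFormatter },
//   selectors: [{ selector: "table", format: "markdownTable" }],
// });
```

The local types may not line up exactly with the package's own TypeScript typings, so a cast or the package's exported types may be needed when registering the formatter.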


In case you're actually using html-to-md from the repository and having issues with the produced markdown - that's a separate story. I've no idea what markdown flavor React Markdown supports, whether I already support it through configuration, or whether there are other issues. That's why I'm not publishing it in its current state - I don't want to deal with a lot of problems that people will report before I implement a solid solution for them.
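One way to narrow down a rendering problem like the one described above is to check whether the produced text has the shape that GFM-style tables require: a header row followed directly by a delimiter row with the same number of cells. The sketch below is only a rough check, not a markdown parser, and `hasGfmTableShape` and `splitRow` are hypothetical names. As far as the react-markdown documentation describes, that renderer follows CommonMark by default and tables are a GFM extension enabled through the `remark-gfm` plugin, so text that passes this check can still render as plain lines when the plugin is missing.

```typescript
// Split a pipe-table row into trimmed cells, leaving escaped pipes ("\|")
// inside their cell.
function splitRow(line: string): string[] {
  let s = line.trim();
  if (s.startsWith("|")) s = s.slice(1);
  if (s.endsWith("|") && !s.endsWith("\\|")) s = s.slice(0, -1);
  return s.split(/(?<!\\)\|/).map((c) => c.trim());
}

// Rough check: is there a header row directly followed by a delimiter row
// (cells like "---", ":---", "---:") with the same number of cells?
function hasGfmTableShape(markdown: string): boolean {
  const lines = markdown.split("\n").map((l) => l.trim());
  for (let i = 0; i + 1 < lines.length; i++) {
    const header = lines[i];
    const delimiter = lines[i + 1];
    if (!header.includes("|") || !delimiter.includes("|")) continue;
    const delimiterCells = splitRow(delimiter);
    const isDelimiter = delimiterCells.every((c) => /^:?-+:?$/.test(c));
    if (isDelimiter && splitRow(header).length === delimiterCells.length) {
      return true;
    }
  }
  return false;
}
```

If the check fails, the converter output is the likely culprit (for example, a column-aligned text table without pipes or without a delimiter row). If it passes, the renderer configuration is the next thing to look at.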