vsch / flexmark-java

CommonMark/Markdown Java parser with source level AST. CommonMark 0.28, emulation of: pegdown, kramdown, markdown.pl, MultiMarkdown. With HTML to MD, MD to PDF, MD to DOCX conversion modules.
BSD 2-Clause "Simplified" License

HTML to Markdown omits table rows with horizontal header #511

Closed by p10trk 2 years ago

p10trk commented 2 years ago

When converting an HTML table with a horizontal header to Markdown, the rows that follow the header row are omitted from the output.

Please provide as much information as possible about where the bug is located or what you were using:

Using flexmark-java version 0.64.0. From pom.xml:

        <dependency>
            <groupId>com.vladsch.flexmark</groupId>
            <artifactId>flexmark-html2md-converter</artifactId>
            <version>0.64.0</version>
        </dependency>

To Reproduce

Run the following Java test:

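    // Imports needed by this test (JUnit 4 assumed; adjust for JUnit 5):
    import com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter;
    import org.junit.Test;
    import static org.junit.Assert.assertTrue;

    @Test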
    public void testMarkdownTableWithColumnAndRowHeaders() {
        String html = "<table>\n" +
                "<tr>\n" +
                "<td></td>\n" +
                "<th>Name</th>\n" +
                "<th>Age</th>\n" +
                "<th>Weight</th>\n" +
                "</tr>\n" +
                "<tr>\n" +
                "<th>Dog</th>\n" +
                "<td>Odie</td>\n" +
                "<td>5</td>\n" +
                "<td>5</td>\n" +
                "</tr>\n" +
                "<tr>\n" +
                "<th>Cat</th>\n" +
                "<td>Garfield</td>\n" +
                "<td>10</td>\n" +
                "<td>5</td>\n" +
                "</tr>\n" +
                "</table>";

        String markdown = FlexmarkHtmlConverter.builder()
                .build()
                .convert(html);
        System.out.println(markdown);
        assertTrue(markdown.contains("Name"));
        assertTrue(markdown.contains("Garfield"));
    }

Expected behavior

The resulting Markdown should be as follows:

|-----|----------|-----|--------|
|     | Name     | Age | Weight |
| Dog | Odie     | 5   | 5      |
| Cat | Garfield | 10  | 5      |

Resulting Output

The actual Markdown is:

|---|------|-----|--------|
|   | Name | Age | Weight |

p10trk commented 2 years ago

I just discovered the option IGNORE_TABLE_HEADING_AFTER_ROWS that controls this behavior. The problem went away after initializing the converter like this:

    DataHolder options = new MutableDataSet()
            .set(IGNORE_TABLE_HEADING_AFTER_ROWS, false)
            .toImmutable();
    String markdown = FlexmarkHtmlConverter.builder(options)
            .build()
            .convert(html);
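
For anyone landing here with the same problem, the workaround can be put together as a self-contained sketch. It assumes the flexmark-html2md-converter 0.64.0 dependency shown above is on the classpath, and that the package names below match that version. The class name `TableHeaderRowsExample` and the `convert` helper are hypothetical and exist only for illustration.

```java
import com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter;
import com.vladsch.flexmark.util.data.DataHolder;
import com.vladsch.flexmark.util.data.MutableDataSet;

// Hypothetical class name, used only for this illustration.
public class TableHeaderRowsExample {

    // Same table as in the report: a header row, plus a header cell leading each later row.
    static final String HTML = "<table>\n" +
            "<tr>\n<td></td>\n<th>Name</th>\n<th>Age</th>\n<th>Weight</th>\n</tr>\n" +
            "<tr>\n<th>Dog</th>\n<td>Odie</td>\n<td>5</td>\n<td>5</td>\n</tr>\n" +
            "<tr>\n<th>Cat</th>\n<td>Garfield</td>\n<td>10</td>\n<td>5</td>\n</tr>\n" +
            "</table>";

    // Hypothetical helper: converts HTML to Markdown with the option set explicitly.
    static String convert(String html, boolean ignoreHeadingAfterRows) {
        DataHolder options = new MutableDataSet()
                .set(FlexmarkHtmlConverter.IGNORE_TABLE_HEADING_AFTER_ROWS, ignoreHeadingAfterRows)
                .toImmutable();
        // builder(options) only configures the converter; build() and convert() do the work.
        return FlexmarkHtmlConverter.builder(options)
                .build()
                .convert(html);
    }

    public static void main(String[] args) {
        // Passing false keeps the rows that start with a <th> cell, per the comment above.
        System.out.println(convert(HTML, false));
    }
}
```

Calling `convert(HTML, true)` instead should reproduce the truncated table from the original report, which makes it easy to compare the two settings side by side.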