laurent22 / joplin

Joplin - the privacy-focused note taking app with sync capabilities for Windows, macOS, Linux, Android and iOS.
https://joplinapp.org

TABLE with CAPTION or COLGROUP render empty headers #5861

Closed dprothero closed 8 months ago

dprothero commented 2 years ago

This issue was originally reported here, but that repo appears to have been archived and the code now lives here.

Tables such as

<table>
  <caption>My table</caption>
  <tr>
    <th>Firstname</th>
    <th>Lastname</th> 
  </tr>
  <tr>
    <td>Jill</td>
    <td>Smith</td>
  </tr>
</table>

or

<table>
  <colgroup>
    <col style="" />
    <col style="" />
  </colgroup>
  <tr>
    <th>Firstname</th>
    <th>Lastname</th> 
  </tr>
  <tr>
    <td>Jill</td>
    <td>Smith</td>
  </tr>
</table>

Render an empty header, such as

|     |     |
| --- | --- |My table
| Firstname | Lastname |
| Jill | Smith |

COLGROUP should be ignored, CAPTION should be displayed before the table. Expected output should be:

My table
| Firstname | Lastname |
| --- | --- |
| Jill | Smith |
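
To make the expected behaviour concrete, here is a minimal, dependency-free sketch. It is not Turndown's or Joplin's actual code: the `SimpleTable` type and the `renderTable` helper are hypothetical names introduced only for illustration. It assumes the first row is the header row, puts the caption on the line before the table, and ignores COLGROUP entirely because it carries no text.

```typescript
// Hypothetical model of a parsed table; not part of Turndown or Joplin.
interface SimpleTable {
  caption?: string;  // text content of <caption>, if present
  rows: string[][];  // cell text per <tr>; the first row is treated as the header
}

// Render the table as GFM Markdown with the caption placed before it.
// <colgroup>/<col> have no text content, so the model simply omits them.
function renderTable(table: SimpleTable): string {
  const caption = table.caption?.trim() ?? '';
  if (table.rows.length === 0) return caption;

  const [header, ...body] = table.rows;
  const toRow = (cells: string[]): string => `| ${cells.join(' | ')} |`;
  const lines = [
    toRow(header),
    toRow(header.map(() => '---')), // delimiter row, one cell per header cell
    ...body.map(toRow),
  ];
  const markdownTable = lines.join('\n');
  return caption === '' ? markdownTable : `${caption}\n${markdownTable}`;
}

const markdown = renderTable({
  caption: 'My table',
  rows: [['Firstname', 'Lastname'], ['Jill', 'Smith']],
});
console.log(markdown);
```

For the first example table, this should produce the caption line followed by the header, delimiter and body rows, in the same order as the expected output above.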

Sukriti-sood commented 2 years ago

I would like to work on this.

laurent22 commented 2 years ago

What are the steps to replicate this? HTML doesn't render to Markdown, that doesn't make sense.

maxpatiiuk commented 9 months ago

Steps to replicate this:

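// Assumed imports (not shown in the original comment); package names are a guess based on Joplin's forks
const TurndownService = require('@joplin/turndown');
const { gfm } = require('@joplin/turndown-plugin-gfm');

// Initialize turndownService with the gfm plugin and default settings
const turndownService = new TurndownService();
turndownService.use(gfm);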

// Convert table with caption to markdown
const output = turndownService.turndown(`<table>
            <caption>Developer Time (in hours)</caption>
            <tbody><tr>
                <th>Developer</th>
                <th>From Scratch</th>
                <th>Carbon</th>
            </tr>
              <tr>
                <th>Developer One</th>
                <td>4.2 hours from scratch</td>
                <td>1.1 hours using Carbon</td>
              </tr>
            </tbody></table>`);

result:

|     |     |     |
| --- | --- | --- |Developer Time (in hours)
| Developer | From Scratch | Carbon |
| Developer One | 4.2 hours from scratch | 1.1 hours using Carbon |

some websites with tables that result in broken markdown:

maxpatiiuk commented 9 months ago

As a combined workaround for this bug and #9885, you can add this code right after turndownService.use(gfm);:

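// The table rule is expected at index 2 once the gfm plugin is registered;
// the guard below fails loudly if that assumption stops holding.
const tableRule = turndownService.rules.array[2];
if (!tableRule.filter.toString().includes('TABLE'))
  throw new Error('Incorrect rule selected. Expected to find table rule');

// Match every <table>, not only the ones the original filter function accepted
tableRule.filter = ['table'];

// Fail loudly once the library handles captions itself, so this workaround gets removed
if (tableRule.replacement?.toString().toLowerCase().includes('caption'))
  throw new Error('Turndown received caption support - this workaround should be removed');

// Wrap the original replacement so the caption text is emitted before the table
const originalReplacement = tableRule.replacement;
tableRule.replacement = (content, node, ...rest) => {
  const caption = (node as HTMLTableElement).caption?.textContent || '';
  const table = originalReplacement?.(content, node, ...rest) ?? '';
  return caption === '' ? table : `${caption}\n\n${table.trimStart()}`;
};

// Drop <caption>, <colgroup> and <col> from the table content; the caption is re-added by the wrapper above
turndownService.addRule('caption', {
  filter: ['caption', 'colgroup', 'col'],
  replacement: () => '',
});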
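
The hard-coded index `2` above depends on the order in which the plugin registers its rules. As a possible alternative, the sketch below searches the rule array for the table rule instead of indexing into it. `RuleLike` and `findTableRule` are hypothetical names, and the sketch assumes (as the workaround's own guard does) that the table rule's filter is a function whose source mentions `'TABLE'`. It has not been checked against any particular Turndown version.

```typescript
// Hypothetical minimal shape of an entry in turndownService.rules.array.
type RuleLike = {
  filter: unknown;
  replacement?: (...args: any[]) => string;
};

// Return the first rule whose filter is a function mentioning 'TABLE',
// or throw so that a changed rule layout is noticed immediately.
function findTableRule(rules: RuleLike[]): RuleLike {
  const rule = rules.find(
    (r) => typeof r.filter === 'function' && r.filter.toString().includes('TABLE'),
  );
  if (!rule) throw new Error('Table rule not found');
  return rule;
}

// Stand-in rules for illustration only; a real call would pass turndownService.rules.array.
const sampleRules: RuleLike[] = [
  { filter: ['th', 'td'] },
  { filter: 'tr' },
  { filter: (node: { nodeName: string }) => node.nodeName === 'TABLE' },
];
const foundRule = findTableRule(sampleRules);
```

With this helper, the first line of the workaround could become `const tableRule = findTableRule(turndownService.rules.array);`, and the rest would stay the same.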