micromark / micromark-extension-gfm-table

micromark extension to support GFM tables
https://unifiedjs.com
MIT License
6 stars 6 forks source link

Table Content is Including Markdown Header when Between Two Tables #5

Closed pjkaufman closed 2 years ago

pjkaufman commented 2 years ago

Initial checklist

Affected packages and versions

micromark-extension-gfm-table ^2.0.0

Link to runnable example

https://codesandbox.io/s/relaxed-orla-9wkxgi?file=/src/index.js

Steps to reproduce

I am using remark-gfm in order to help parse some markdown, but it seems that it is coming up with some results that I am not expecting.

Here is the markdown I am parsing:

# Table 1
| Column 1 | Column 2 | Column 3 |
|----------|----------|----------|
| foo      | bar      | blob     |
| baz      | qux      | trust    |
| quux     | quuz     | glob     |
# Table 2
| Column 1 | Column 2 |
|----------|----------|
| foo      | bar      |
| baz      | qux      |
| quux     | quuz     |

New paragraph. I am looking for just the tables present. I expect the results to include 2 tables, but for some reason I am only getting one table output.

Please let me know if I am doing something wrong or if this is the intended functionality.

This is what npm version outputs:

{
  npm: '8.5.5',
  node: '16.15.0',
  v8: '9.4.146.24-node.20',
  uv: '1.43.0',
  zlib: '1.2.11',
  brotli: '1.0.9',
  ares: '1.18.1',
  modules: '93',
  nghttp2: '1.47.0',
  napi: '8',
  llhttp: '6.0.4',
  openssl: '1.1.1n+quic',
  cldr: '40.0',
  icu: '70.1',
  tz: '2021a3',
  unicode: '14.0',
  ngtcp2: '0.1.0-DEV',
  nghttp3: '0.1.0-DEV'
}

Expected behavior

I expect to get 2 tables:

| Column 1 | Column 2 | Column 3 |
|----------|----------|----------|
| foo      | bar      | blob     |
| baz      | qux      | trust    |
| quux     | quuz     | glob     |

and

| Column 1 | Column 2 |
|----------|----------|
| foo      | bar      |
| baz      | qux      |
| quux     | quuz     |

Actual behavior

The result I get for the table hits in terms of the actual text is:

| Column 1 | Column 2 | Column 3 |
|----------|----------|----------|
| foo      | bar      | blob     |
| baz      | qux      | trust    |
| quux     | quuz     | glob     |
# Table 2
| Column 1 | Column 2 |
|----------|----------|
| foo      | bar      |
| baz      | qux      |
| quux     | quuz     |

Runtime

Node v16

Package manager

Other (please specify in steps to reproduce)

OS

Linux

Build and bundle tools

Rollup

wooorm commented 2 years ago

Thanks for the repro! Might be a bit before I myself get to it unfortunately.

github-actions[bot] commented 2 years ago

Hi! This was closed. Team: If this was fixed, please add phase/solved. Otherwise, please add one of the no/* labels.

wooorm commented 2 years ago

whoops, didn’t mean to close!

pjkaufman commented 2 years ago

@wooorm , I just tried making some tests just to see if I could get it to fail and I am having some trouble doing so. It seems to work just fine for me when it generates the html. Is there a way I can check that the AST is being properly generated in this repo or is that a separate repo? Here are the changes I have attempted to make it fail: https://github.com/pjkaufman/micromark-extension-gfm-table/blob/main/test/fixtures/back-to-back.md and https://github.com/pjkaufman/micromark-extension-gfm-table/blob/main/test/fixtures/back-to-back.html

Did I do something wrong with my tests? Would the issue lie in this repo then? https://github.com/syntax-tree/mdast-util-gfm-table

Edit: I would add an issue if I could recreate the problematic AST there.

pjkaufman commented 2 years ago

Seems that https://github.com/syntax-tree/mdast-util-gfm-table by itself does give the expected AST.

import {fromMarkdown} from 'mdast-util-from-markdown'
import {gfmTable} from 'micromark-extension-gfm-table'
import {gfmTableFromMarkdown} from 'mdast-util-gfm-table'

console.log(fromMarkdown('# Tables that are back to back with content in between\n\n| Column 1 | Column 2 | Column 3 |\n|----------|----------|----------|\n| foo      | bar      | blob     |\n| baz      | qux      | trust    |\n| quux     | quuz     | glob     |\n# Table 2 without Pipe at Start and End\n| Column 1 | Column 2 | Column 3 |\n| -------- | -------- | -------- |\nbar | baz | baq\nfoo | bar | bun\n', {
  extensions: [gfmTable],
  mdastExtensions: [gfmTableFromMarkdown]
}));

results in an AST with 2 headers and 2 tables which is expected.

pjkaufman commented 2 years ago

I have narrowed down the problem to having remak-gfm v3.0.1 imported based on the example. I am not certain what to do about this. But in the meantime, I will try to remove remark-gfm and just use the underlying packaged directly if possible. But I am not sure what on remark-gfm causes the problem.

wooorm commented 2 years ago

Do you only get the error on codesandbox? In that case, there is a note at the top of the codesandbox saying to reinstall deps: unfortunately codesandbox has aggressive caching which prevents patches from coming through otherwise.

pjkaufman commented 2 years ago

Locally, I get the issue with remark-gfm. I am still trying to determine if I get the issue with the underlying packages locally.

wooorm commented 2 years ago

Can you throw away your package-lock.json, yarn.lock, and node_modules, reinstall, and try again?

pjkaufman commented 2 years ago

I am working on that locally. It is something I tried yesterday and it did not work. However I did not totally remove remark at that time.

pjkaufman commented 2 years ago

I removed remark-gfm and imported

"mdast-util-gfm": "^2.0.1",
"micromark-extension-gfm": "^2.0.1",

Now it works just fine.

pjkaufman commented 2 years ago

Any idea why this might be?

pjkaufman commented 2 years ago

The only weird table I am seeing is the following:

| Column 1 | Column 2 |
:-: | -----------:
bar | baz
foo | bar
New paragraph.

Is this supposed to be one markdown table?

pjkaufman commented 2 years ago

This one seems to be a bug in the markdown parser, but the issue with including the header is specific to including remark-gfm.

pjkaufman commented 2 years ago

I can take a look and see about fixing the above issue around including a paragraph that starts right after the table ends since it does affect me in what I am doing. However, I am not sure what to do about remark-gfm. Maybe it has a dependency issue where it does not require the right dependencies?

pjkaufman commented 2 years ago

If I am reading the docs correctly for tables, the above scenario that I pointed out is not valid: https://github.github.com/gfm/#tables-extension- It may also not be valid for a single column table, but that is unclear from the example.

pjkaufman commented 2 years ago

@wooorm, is it possible that updating https://github.com/remarkjs/remark-gfm/blob/main/package.json#L41-L42 to be 2.0.1 would fix the issue?

wooorm commented 2 years ago

Ehh, that’s a lot of comments. Hold on a second for me to read through things.

No, that does not solve anything. However, throwing away node_modules, package-locks, etc, will solve things.

pjkaufman commented 2 years ago

Sorry. I tend to comment my train of thought, so I remember what I have done and so other people know what I have done. That does lead to a decent bit of comment.

Just importing the latest of remark-gfm seems to cause the issue for me. It seems to downgrade something or add something that causes the issue with not respecting headers between tables, but I am not certain of this.

wooorm commented 2 years ago

The code you pasted above (included here as:)

# Table 1
| Column 1 | Column 2 | Column 3 |
|----------|----------|----------|
| foo      | bar      | blob     |
| baz      | qux      | trust    |
| quux     | quuz     | glob     |
# Table 2
| Column 1 | Column 2 |
|----------|----------|
| foo      | bar      |
| baz      | qux      |
| quux     | quuz     |

Works fine with a fresh install of this package, using mdast too. The output then is:

# Table 1

| Column 1 | Column 2 | Column 3 |
| -------- | -------- | -------- |
| foo      | bar      | blob     |
| baz      | qux      | trust    |
| quux     | quuz     | glob     |

# Table 2

| Column 1 | Column 2 |
| -------- | -------- |
| foo      | bar      |
| baz      | qux      |
| quux     | quuz     |

Showing that it works.

The package.json for that:

{
  "name": "example",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "type": "module",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "dependencies": {
    "mdast-util-from-markdown": "^1.2.0",
    "mdast-util-gfm": "^2.0.1",
    "mdast-util-to-markdown": "^1.3.0",
    "micromark-extension-gfm": "^2.0.1"
  }
}

example.js:

import fs from "node:fs/promises";
import { fromMarkdown } from "mdast-util-from-markdown";
import { toMarkdown } from "mdast-util-to-markdown";
import { gfm } from "micromark-extension-gfm";
import { gfmFromMarkdown, gfmToMarkdown } from "mdast-util-gfm";

const doc = await fs.readFile("example.md");

const tree = fromMarkdown(doc, {
  extensions: [gfm()],
  mdastExtensions: [gfmFromMarkdown()],
});

console.log(tree);

const out = toMarkdown(tree, { extensions: [gfmToMarkdown()] });

console.log(out);

Using the same set up, with the code from your codesanbox, also works. Changing the example.md to:

# Tables that are back to back with content in between

| Column 1 | Column 2 | Column 3 |
|----------|----------|----------|
| foo      | bar      | blob     |
| baz      | qux      | trust    |
| quux     | quuz     | glob     |
# Table 2 without Pipe at Start and End
| Column 1 | Column 2 | Column 3 |
| -------- | -------- | -------- |
bar | baz | baq
foo | bar | bun

Yields:

# Tables that are back to back with content in between

| Column 1 | Column 2 | Column 3 |
| -------- | -------- | -------- |
| foo      | bar      | blob     |
| baz      | qux      | trust    |
| quux     | quuz     | glob     |

# Table 2 without Pipe at Start and End

| Column 1 | Column 2 | Column 3 |
| -------- | -------- | -------- |
| bar      | baz      | baq      |
| foo      | bar      | bun      |
pjkaufman commented 2 years ago

However using remark-gfm with remark does not.

wooorm commented 2 years ago

No, it works fine...

package.json:

{
  "name": "example",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "type": "module",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "dependencies": {
    "rehype-stringify": "^9.0.3",
    "remark-gfm": "^3.0.1",
    "remark-parse": "^10.0.1",
    "remark-rehype": "^10.1.0",
    "unified": "^10.1.2"
  }
}

example.md:

# Tables that are back to back with content in between

| Column 1 | Column 2 | Column 3 |
|----------|----------|----------|
| foo      | bar      | blob     |
| baz      | qux      | trust    |
| quux     | quuz     | glob     |
# Table 2 without Pipe at Start and End
| Column 1 | Column 2 | Column 3 |
| -------- | -------- | -------- |
bar | baz | baq
foo | bar | bun

example.js:

import fs from "node:fs/promises";
import {unified} from 'unified'
import remarkParse from 'remark-parse'
import remarkGfm from 'remark-gfm'
import remarkRehype from 'remark-rehype'
import rehypeStringify from 'rehype-stringify'

const file = await unified()
  .use(remarkParse)
  .use(remarkGfm)
  .use(remarkRehype)
  .use(rehypeStringify)
  .process(await fs.readFile("example.md"))

console.log(String(file))
<h1>Tables that are back to back with content in between</h1>
<table>
<thead>
<tr>
<th>Column 1</th>
<th>Column 2</th>
<th>Column 3</th>
</tr>
</thead>
<tbody>
<tr>
<td>foo</td>
<td>bar</td>
<td>blob</td>
</tr>
<tr>
<td>baz</td>
<td>qux</td>
<td>trust</td>
</tr>
<tr>
<td>quux</td>
<td>quuz</td>
<td>glob</td>
</tr>
</tbody>
</table>
<h1>Table 2 without Pipe at Start and End</h1>
<table>
<thead>
<tr>
<th>Column 1</th>
<th>Column 2</th>
<th>Column 3</th>
</tr>
</thead>
<tbody>
<tr>
<td>bar</td>
<td>baz</td>
<td>baq</td>
</tr>
<tr>
<td>foo</td>
<td>bar</td>
<td>bun</td>
</tr>
</tbody>
</table>
wooorm commented 2 years ago

As for your example of:

| Column 1 | Column 2 |
:-: | -----------:
bar | baz
foo | bar
New paragraph.

That is parsed the same on GitHub:

Column 1 Column 2
bar baz
foo bar

New paragraph.

As with unified:

<table>
<thead>
<tr>
<th align="center">Column 1</th>
<th align="right">Column 2</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">bar</td>
<td align="right">baz</td>
</tr>
<tr>
<td align="center">foo</td>
<td align="right">bar</td>
</tr>
<tr>
<td align="center">New paragraph.</td>
<td align="right"></td>
</tr>
</tbody>
</table>
pjkaufman commented 2 years ago

That seems interesting to me. I guess the markdown rendering I was using did not fully commit to github style for that case.

wooorm commented 2 years ago

You are perhaps using pnp. Or something else. Make sure to remove everything, update everything, if using pnp to update the global cache, and try again.

I cannot reproduce.

pjkaufman commented 2 years ago

Here is the code example for the issue:

import { visit } from "unist-util-visit";
import { fromMarkdown } from "mdast-util-from-markdown";
import { gfm } from "micromark-extension-gfm";
import { gfmFromMarkdown } from "mdast-util-gfm";

const sourceMarkdown =
  "# Tables that are back to back with content in between\n\n| Column 1 | Column 2 | Column 3 |\n|----------|----------|----------|\n| foo      | bar      | blob     |\n| baz      | qux      | trust    |\n| quux     | quuz     | glob     |\n# Table 2 without Pipe at Start and End\n| Column 1 | Column 2 | Column 3 |\n| -------- | -------- | -------- |\nbar | baz | baq\nfoo | bar | bun\n";

const ast = fromMarkdown(sourceMarkdown, {
  extensions: [gfm()],
  mdastExtensions: [gfmFromMarkdown()]
});
console.log(ast);
let positions = [];
visit(ast, "table", (node) => {
  positions.push(node.position);
});

// Sort positions by start position in reverse order
positions.sort((a, b) => b.start.offset - a.start.offset);
for (const position of positions) {
  console.log(
    sourceMarkdown.substring(position.start.offset, position.end.offset)
  );
}

Seems to yield the following:


| Column 1 | Column 2 | Column 3 |
|----------|----------|----------|
| foo      | bar      | blob     |
| baz      | qux      | trust    |
| quux     | quuz     | glob     |
# Table 2 without Pipe at Start and End
| Column 1 | Column 2 | Column 3 |
| -------- | -------- | -------- |
bar | baz | baq
foo | bar | bun 

However I expect the following:

| Column 1 | Column 2 | Column 3 |
|----------|----------|----------|
| foo      | bar      | blob     |
| baz      | qux      | trust    |
| quux     | quuz     | glob     |

and

| Column 1 | Column 2 | Column 3 |
| -------- | -------- | -------- |
bar | baz | baq
foo | bar | bun 
{
  "name": "remark test",
  "version": "1.0.0",
  "private": true,
  "main": "src/index.js",
  "scripts": {
    "start": "parcel index.html --open",
    "build": "parcel build index.html"
  },
  "dependencies": {
    "mdast-util-from-markdown": "1.2.0",
    "mdast-util-gfm": "2.0.0",
    "micromark-extension-gfm": "2.0.0",
    "remark-gfm": "3.0.1"
  },
  "devDependencies": {
    "@babel/core": "^7.0.0",
    "parcel": "^2.0.0"
  }
}

Is using the above able to reproduce the issue on your end or does it still work for you (it may need changes to work locally for you)? If so, I am fine with this being closed.

pjkaufman commented 2 years ago

Note that this probably belongs on the repo for the ast, since it is having the issue rather than the creation of the html.

pjkaufman commented 2 years ago

Here is the example on Codesandbox: https://codesandbox.io/s/relaxed-orla-9wkxgi?file=/src/index.js

wooorm commented 2 years ago

Open up your dependencies:

    "mdast-util-from-markdown": "^1.2.0",
    "mdast-util-gfm": "^2.0.1",
    "micromark-extension-gfm": "^2.0.1",
    "unist-util-visit": "^4.1.0"
pjkaufman commented 2 years ago

Open up your dependencies:

    "mdast-util-from-markdown": "^1.2.0",
    "mdast-util-gfm": "^2.0.1",
    "micromark-extension-gfm": "^2.0.1",
    "unist-util-visit": "^4.1.0"

Why would they need opening up?

wooorm commented 2 years ago

Update your dependencies