mity / md4c

C Markdown parser. Fast. SAX-like interface. Compliant to CommonMark specification.
MIT License

GFM and table parsing with mismatched title and delimiter rows #157

Closed by rundel 3 years ago

rundel commented 3 years ago

I've come across a minor discrepancy between the GitHub Flavored Markdown spec and md4c's parsing results for tables with an incomplete delimiter row. Specifically, Example 203:

| abc | def |
| --- |
| bar |

According to the GFM spec, this should not parse as a table; however, md4c currently parses it as a table with a single column (the def column is silently dropped). Pandoc's behavior currently matches md4c's, and I'm not sure whether there is a definitive source for the table extension spec.

Relatedly, if the title row is shorter than the delimiter row, GFM also does not parse the Markdown as a table, while md4c parses it as a table whose column count matches the delimiter row.

| abc 
| --- | --- |
| bar |
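
For reference, the GFM rule at play in both examples is that the header row must have the same number of cells as the delimiter row. A minimal Python sketch of that check follows. The helper names `split_cells` and `gfm_accepts_table` are hypothetical, the splitting is simplified (unescaped pipes only, one optional leading and trailing pipe), and the delimiter row's own syntax is not validated.

```python
import re


def split_cells(row: str) -> list[str]:
    """Split a table row into cells on unescaped pipes (simplified)."""
    row = row.strip()
    # Drop one optional leading pipe.
    if row.startswith("|"):
        row = row[1:]
    # Drop one optional trailing pipe, unless it is backslash-escaped.
    if row.endswith("|") and not row.endswith("\\|"):
        row = row[:-1]
    # Split on pipes that are not preceded by a backslash.
    return [cell.strip() for cell in re.split(r"(?<!\\)\|", row)]


def gfm_accepts_table(header: str, delimiter: str) -> bool:
    """GFM-style check: header and delimiter rows must have equal cell counts."""
    return len(split_cells(header)) == len(split_cells(delimiter))
```

Under these assumptions, `gfm_accepts_table("| abc | def |", "| --- |")` and `gfm_accepts_table("| abc", "| --- | --- |")` both return `False`, which corresponds to GFM rejecting the two examples above as tables.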

mity commented 3 years ago

Yes, this is known.

Unlike the current implementation of GFM, MD4C does not require pipes inside code spans nested in a table to be escaped (see the discussion in https://github.com/mity/md4c/issues/136 ).

However, that implies MD4C treats the pipe delimiters as inline marks in most table lines and cannot use them to determine the number of table columns, except in the special header underline. So in MD4C the underline alone determines the column count, and this results in the incompatibility observed here.
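
As a rough illustration of the behavior described above (not MD4C's actual code), the following Python sketch takes the column count from the delimiter row only and fits every other row to it. The names `split_cells`, `fit_row`, and `parse_lenient` are hypothetical. Padding short rows with empty cells is an assumption, and the sketch does not model code spans or other inline marks.

```python
import re


def split_cells(row: str) -> list[str]:
    """Split a table row into cells on unescaped pipes (simplified)."""
    row = row.strip()
    if row.startswith("|"):
        row = row[1:]
    if row.endswith("|") and not row.endswith("\\|"):
        row = row[:-1]
    return [cell.strip() for cell in re.split(r"(?<!\\)\|", row)]


def fit_row(row: str, n_cols: int) -> list[str]:
    """Truncate extra cells or pad with empty cells (padding is an assumption)."""
    cells = split_cells(row)
    return (cells + [""] * n_cols)[:n_cols]


def parse_lenient(lines: list[str]) -> list[list[str]]:
    """Return the header and body rows, sized by the delimiter row alone."""
    header, delimiter, *body = lines
    # Only the delimiter row (the header underline) decides the column count.
    n_cols = len(split_cells(delimiter))
    return [fit_row(row, n_cols) for row in [header, *body]]
```

With this sketch, `parse_lenient(["| abc | def |", "| --- |", "| bar |"])` yields a one-column table in which `def` is dropped, matching the result reported in the issue.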

This won't be fixed unless we completely rework the table implementation to follow their behavior with code spans and, as noted in the linked issue, I am very reluctant to do so.

rundel commented 3 years ago

Thanks for the quick response and clarification.