mity / md4c

C Markdown parser. Fast. SAX-like interface. Compliant with the CommonMark specification.
MIT License

Tables: Number of header columns and delimiter row columns have to match #137

Open · mity opened 3 years ago

mity commented 3 years ago

(from https://github.com/mity/md4c/issues/136, to make one report per individual issue)

Example 203: The number of header columns must match the number of columns in the delimiter row.

```
| abc | def |
| --- |
| bar |
```

GFM Spec:

```
<p>| abc | def |
| --- |
| bar |</p>
```

MD4C:

```
<table>
<thead>
<tr>
<th>abc</th>
</tr>
</thead>
<tbody>
<tr>
<td>bar</td>
</tr>
</tbody>
</table>
```

mity commented 3 years ago

Hmm. This may be very hard and tricky to handle, and I need to think more about possible solutions.

The main problem is that it's hard (and maybe impossible) to count the columns in the header line at the time of block analysis: it would require calling into the inline analysis functions, because the line may contain pipes which are not meaningful for the table (some may be escaped, some may live in a code span inside a cell, and some may form a wiki-link).

And at the same time, inline analysis cannot be executed until the block analysis of the whole input is complete, because only then have we built the complete dictionary of link reference definitions.

So we have a catch-22 situation here.

(Maybe this is the reason why cmark-gfm handles pipes differently inside tables, e.g. requiring pipes to be escaped even inside code spans: that allows counting the columns without a full inline analysis of the header line.)
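To make the difference concrete, here is a simplified cell counter with two modes. With `hide_code_span_pipes` set to false it follows the GFM rule that only a backslash-escaped pipe (`\|`) fails to separate cells, even inside a code span, so one left-to-right scan of the line is enough. With it set to true, pipes inside backtick code spans are skipped as well, which means code spans must be recognized first. The function names are hypothetical and this is not MD4C code; the code-span detection is an approximation that ignores raw HTML, autolinks, wiki-links, and the other precedence rules a real inline parser has to apply.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length of the run of backticks starting at s[i]. */
static size_t backtick_run(const char *s, size_t end, size_t i)
{
    size_t n = 0;
    while (i + n < end && s[i + n] == '`')
        n++;
    return n;
}

/* Position just past a closing run of exactly `n` backticks found at or
 * after s[i], or 0 if the code span is never closed. */
static size_t code_span_end(const char *s, size_t end, size_t i, size_t n)
{
    while (i < end) {
        if (s[i] == '`') {
            size_t run = backtick_run(s, end, i);
            if (run == n)
                return i + run;
            i += run;
        } else {
            i++;
        }
    }
    return 0;
}

static bool is_ws(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Hypothetical helper: count the cells of a table row.
 * - A backslash-escaped pipe never separates cells (GFM-style escape).
 * - If hide_code_span_pipes is true, pipes inside a backtick code span
 *   do not separate cells either (approximate code-span detection).
 * - Leading and trailing pipes are optional. */
size_t count_row_cells(const char *line, bool hide_code_span_pipes)
{
    size_t beg = 0, end = strlen(line);
    while (beg < end && is_ws(line[beg]))
        beg++;
    while (end > beg && is_ws(line[end - 1]))
        end--;
    if (beg == end)
        return 0;

    size_t pipes = 0;
    bool leading = false, trailing = false;
    size_t i = beg;
    while (i < end) {
        char c = line[i];
        if (c == '\\' && i + 1 < end) {
            i += 2;                        /* escaped char, e.g. \| or \` */
        } else if (c == '`' && hide_code_span_pipes) {
            size_t n = backtick_run(line, end, i);
            size_t close = code_span_end(line, end, i + n, n);
            i = close ? close : i + n;     /* an unclosed run is literal text */
        } else {
            if (c == '|') {
                pipes++;
                if (i == beg)
                    leading = true;
                if (i == end - 1)
                    trailing = true;
            }
            i++;
        }
    }

    size_t cells = pipes + 1;
    if (leading)
        cells--;
    if (trailing)
        cells--;
    return cells;
}
```

For Example 203, the header `| abc | def |` counts 2 cells in either mode while the delimiter row `| --- |` counts 1, so a strict GFM implementation rejects the table. For the row `` | `a|b` | c | `` the two modes disagree (3 cells versus 2), which is exactly why the column count depends on inline analysis unless the pipe is written as `\|`.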

mity commented 3 years ago

Also, I am wondering whether honoring the rule "Number of header columns must match the columns in the delimiter row." is the right thing to do in general.

Imagine a Markdown editor with a preview feature. When the user adds a new column to a table, he then needs to edit two lines, and in the meantime the preview stupidly has to re-render it all as a non-table, and then, once the second line is edited, it all goes back to a table.

Such behavior is IMHO user-unfriendly.

And last but not least, how often do real documents exhibit syntax where the two lines do not match? And if they do, isn't the intention to render it as a table clear enough?
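The lenient alternative argued for here can be sketched as: take the column count from the delimiter row alone, then pad or truncate every row, the header included, to that count. GFM already specifies this for body rows (missing cells are inserted as empty, excess cells are ignored); extending it to the header reproduces the MD4C output for Example 203, where `def` is dropped. `Cell` and `split_row_to_columns` are hypothetical names, not MD4C API, and the splitter honors only backslash-escaped pipes, ignoring code spans and other inline constructs for brevity.

```c
#include <stddef.h>
#include <string.h>

/* A cell is a slice of the input line (not NUL-terminated). */
typedef struct {
    const char *beg;
    size_t size;
} Cell;

static int is_space(char c)
{
    return c == ' ' || c == '\t';
}

/* Store [beg, end) into *cell with surrounding whitespace trimmed. */
static void set_cell(Cell *cell, const char *beg, const char *end)
{
    while (beg < end && is_space(*beg))
        beg++;
    while (end > beg && is_space(end[-1]))
        end--;
    cell->beg = beg;
    cell->size = (size_t)(end - beg);
}

/* Hypothetical helper: split `line` at unescaped pipes and fill exactly
 * `ncols` cells in `out`; missing cells are left empty and excess cells
 * are dropped. Returns the number of cells actually present in the line. */
size_t split_row_to_columns(const char *line, Cell *out, size_t ncols)
{
    const char *p = line;
    const char *end = line + strlen(line);
    size_t n = 0;

    while (p < end && is_space(*p))
        p++;
    while (end > p && (is_space(end[-1]) || end[-1] == '\n' || end[-1] == '\r'))
        end--;
    if (p < end && *p == '|')
        p++;                            /* optional leading pipe */

    const char *cell_beg = p;
    while (p < end) {
        if (*p == '\\' && p + 1 < end) {
            p += 2;                     /* "\|" does not split cells */
            continue;
        }
        if (*p == '|') {
            if (n < ncols)
                set_cell(&out[n], cell_beg, p);
            n++;
            cell_beg = p + 1;
        }
        p++;
    }
    if (cell_beg < end) {               /* no trailing pipe: last cell */
        if (n < ncols)
            set_cell(&out[n], cell_beg, end);
        n++;
    }

    for (size_t i = n; i < ncols; i++) {    /* pad missing cells */
        out[i].beg = end;
        out[i].size = 0;
    }
    return n;
}
```

For Example 203, the delimiter row `| --- |` yields one column, so `split_row_to_columns("| abc | def |", cells, 1)` keeps only `abc` and returns 2. A caller could use that mismatch to emit a diagnostic instead of demoting the whole table to a paragraph.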

dominickpastore commented 3 years ago

> And last but not least, how often do real documents exhibit syntax where the two lines do not match? And if they do, isn't the intention to render it as a table clear enough?

This logic makes sense to me. It's hard to imagine a document that contains syntax like this where the intention isn't to create a table. And the rest of the spec generally allows imperfection when the intent is clear. List numbers need not be consecutive, for example.

Eugenij-W commented 3 years ago

Only the delimiter row can determine the number of columns, at least as long as the leading pipe delimiters of a table row are optional.

From README.md:

> Q: Does MD4C perform any input validation?
>
> A: No. And we are proud of it. :-)

Otherwise, colspans (#147) will become almost impossible.
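The point that only the delimiter row can determine the column count rests on the fact that this row can be counted without any inline analysis: per GFM its cells contain only hyphens, optionally with a leading and/or trailing colon, so no pipe in it can be escaped or hidden inside a code span. A minimal sketch of such a parser follows. `parse_delimiter_row` and `Align` are hypothetical names, and requiring at least one pipe is an assumption made here so that a bare `---` (a thematic break or setext underline) is not mistaken for a one-column delimiter row.

```c
#include <stddef.h>

typedef enum { ALIGN_DEFAULT, ALIGN_LEFT, ALIGN_CENTER, ALIGN_RIGHT } Align;

static int is_blank(char c)
{
    return c == ' ' || c == '\t';
}

static int is_eol(char c)
{
    return c == '\0' || c == '\n' || c == '\r';
}

/* Hypothetical helper: returns the number of columns of a delimiter row
 * such as "| :--- | :-: | ---: |", storing up to max_cols alignments,
 * or -1 if `line` is not a valid delimiter row. */
int parse_delimiter_row(const char *line, Align *aligns, int max_cols)
{
    const char *p = line;
    int ncols = 0;
    int saw_pipe = 0;

    while (is_blank(*p))
        p++;
    if (*p == '|') {                    /* optional leading pipe */
        saw_pipe = 1;
        p++;
    }

    for (;;) {
        while (is_blank(*p))
            p++;
        if (is_eol(*p))
            break;                      /* end of row, possibly after a trailing pipe */

        int left = 0, right = 0, dashes = 0;
        if (*p == ':') { left = 1; p++; }
        while (*p == '-') { dashes++; p++; }
        if (*p == ':') { right = 1; p++; }
        if (dashes == 0)
            return -1;                  /* every cell needs at least one hyphen */

        if (ncols < max_cols)
            aligns[ncols] = (left && right) ? ALIGN_CENTER
                          : left            ? ALIGN_LEFT
                          : right           ? ALIGN_RIGHT
                                            : ALIGN_DEFAULT;
        ncols++;

        while (is_blank(*p))
            p++;
        if (*p == '|') {
            saw_pipe = 1;
            p++;
        } else if (!is_eol(*p)) {
            return -1;                  /* unexpected character in the row */
        }
    }
    return (saw_pipe && ncols > 0) ? ncols : -1;
}
```

Because this count does not depend on link reference definitions or code spans, it is available during block analysis, which sidesteps the catch-22 described earlier in the thread.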