mity / md4c

C Markdown parser. Fast. SAX-like interface. Compliant to CommonMark specification.
MIT License
785 stars 146 forks source link

Fix leave callback not called for HTML block comment. #203

Closed step- closed 8 months ago

step- commented 8 months ago

For the specific case of a one-line block comment (HTML block type 2) not followed by a blank line[1], MD4C fails to call the leave callback. Consequently, if another HTML comment follows the first comment, both end up being combined into a single block instead of staying as two separate blocks, each with its own callback.

[1] In my comments to issue https://github.com/mity/md4c/issues/200 I remarked that the spec doesn't require a blank line to close a type 2 HTML block.

DETAILS

I can't think of a simple way to demonstrate this issue using md2html alone, precisely because the lack of something can't be shown. Feeding md2html the following markdown:

<!-- C1 -->
<!-- C2 -->

outputs the input text so one would be inclined to think that everything is correct. However, what can't be seen is that there is no callback between lines C1 and C2, while there should be one. Instead, a callback after line C2 wrongly combines the two single-line blocks into a two-line block. Trace the code or add printf statements as needed to see the issue at work.

This commit fixes the problem by setting ctx->html_block_type to a negative value as a special way to flag this end-of-block case.

step- commented 8 months ago

Fixes #202.

mity commented 8 months ago

I'm afraid, this PR goes in bad direction because it wrongly assumes a HTML comment never occupies more than one line.

Consider e.g. this:

<!--
comment 1
-->
<!--
comment 2
-->

I should commit a proper fix shortly.

step- commented 8 months ago

I'm pretty sure it doesn't.

step- commented 8 months ago

I'm convinced it doesn't. The value of html_block_type is negated only in the statement that concerns an HTML block comment that starts and ends on the same line.

mity commented 8 months ago

It doesn't. The value of html_block_type is negated only in the statement that concerns an HTML block comment that starts and ends on the same line.

Proper fix is not to merge multiple blocks no matter whether they occupy one or multiple lines.

step- commented 8 months ago

As I wrote before, the change in this PR can't apply to multi-line comments. Multi-line comments are treated elsewhere in your code. For single line comments, I believe this fix is proper, and I believe the change doesn't spill over multi-line comments.

Perhaps you mean you know that two consecutive multi-line comments miss the intermediate callback? That I don't know but I can easily verify...

mity commented 8 months ago

Perhaps you mean you know that two consecutive multi-line comments miss the intermediate callback? That I don't know but I can easily verify...

That's what I mean. 319631f67e3ab44ec590bf66e424436a5f8dff1e works in that case too, and imho is also less hacky.

step- commented 8 months ago

Now I understand you. Meanwhile, I verified that the callback is missing between multi-line comments. I welcome the change.

step- commented 8 months ago

Yes, I tested 319631f with my test suite and it works. Thank you.