markdown-it-rust / markdown-it

markdown-it JS library rewritten in Rust

HtmlBlock parsing incorrect if empty line exists #12

Closed: kurotych closed this issue 1 year ago

kurotych commented 1 year ago

HtmlBlock parsing is incorrect if an empty line exists inside the HTML.

Correct parsing: one HTML block, as expected.

input

```
<dev>
    <p>
    Metus sapien molestie cursus sollicitudin vivamus dignissim condimentum pretium velit.
    </p>
    <p>
    Metus sapien molestie cursus sollicitudin vivamus dignissim condimentum pretium velit.
    </p>
</dev>
```

output (ok)

```
Content: HtmlBlock { content: "<dev>\n    <p>\n    Metus sapien molestie cursus sollicitudin vivamus dignissim condimentum pretium velit.\n    </p>\n    <p>\n    Metus sapien molestie cursus sollicitudin vivamus dignissim condimentum pretium velit.\n    </p>\n</dev>\n" }
```

Empty line added

input

```
<dev>
    <p>
    Metus sapien molestie cursus sollicitudin vivamus dignissim condimentum pretium velit.
    </p>

    <p>
    Metus sapien molestie cursus sollicitudin vivamus dignissim condimentum pretium velit.
    </p>
</dev>
```

output (incorrect)

```
Content: HtmlBlock { content: "<dev>\n    <p>\n    Metus sapien molestie cursus sollicitudin vivamus dignissim condimentum pretium velit.\n    </p>\n" }
Content: HtmlBlock { content: "</dev>\n" }
```

Code used to reproduce:

```rust
use markdown_it::parser::block::builtin::BlockParserRule;
use markdown_it::parser::core::CoreRule;
use markdown_it::parser::inline::builtin::InlineParserRule;
use markdown_it::plugins::extra::syntect::SyntectRule;
use markdown_it::plugins::html::html_block::HtmlBlock;
use markdown_it::{MarkdownIt, Node};

// Core rule that prints every HtmlBlock node found in the parsed tree.
pub struct SyntaxPosRule;

impl CoreRule for SyntaxPosRule {
    fn run(root: &mut Node, _: &MarkdownIt) {
        root.walk_mut(|node, _| {
            if let Some(block) = node.node_value.as_any().downcast_ref::<HtmlBlock>() {
                println!("Content: {:?}", block);
            }
        });
    }
}

// Register the rule so it runs after block, inline and syntect processing.
fn add(md: &mut MarkdownIt) {
    md.add_rule::<SyntaxPosRule>()
        .after::<BlockParserRule>()
        .after::<InlineParserRule>()
        .after::<SyntectRule>();
}

fn main() {
    // Input with an empty line between the two <p> elements inside <dev>.
    let html = r#"
<dev>
    <p>
    Metus sapien molestie cursus sollicitudin vivamus dignissim condimentum pretium velit.
    </p>

    <p>
    Metus sapien molestie cursus sollicitudin vivamus dignissim condimentum pretium velit.
    </p>
</dev>
"#;

    let mut parser = MarkdownIt::new();
    markdown_it::plugins::cmark::add(&mut parser);
    markdown_it::plugins::html::add(&mut parser);
    add(&mut parser);
    parser.parse(html);
}
```
rlidwka commented 1 year ago

An empty line closes the HTML block, as per section 4.6 (HTML blocks), condition 6, of the CommonMark spec.

Please check whether the reference parser has the same behavior: https://spec.commonmark.org/dingus/

This parser follows the CommonMark standard, so I believe this is not a bug. If you think this case can be improved, please open a ticket on the CommonMark forum or issue tracker (I don't think it can be improved, though).
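To illustrate the rule above, here is a minimal, std-only Rust sketch (not markdown-it's actual implementation) of how a CommonMark parser decides where such an HTML block ends. Note that `dev` is not one of the tag names listed for condition 6, so `<dev>` strictly starts a condition-7 block; conditions 6 and 7 share the same end condition (a blank line), so the outcome is the same. The function names are hypothetical, the tag list is deliberately truncated, and the condition-7 check skips attribute validation; all of these are simplifying assumptions.

```rust
// Hypothetical helpers sketching CommonMark 4.6 start conditions 6 and 7
// and their shared end condition. Not taken from markdown-it.

/// Truncated subset of the condition-6 tag names (assumption: the real list is longer).
const BLOCK_TAGS: &[&str] = &["address", "article", "div", "p", "section", "table", "ul", "ol", "li"];

/// Splits `<name...` or `</name...` into the tag name and whatever follows it.
fn tag_name(s: &str) -> Option<(&str, &str)> {
    let rest = s.strip_prefix('<')?;
    let rest = rest.strip_prefix('/').unwrap_or(rest);
    let end = rest
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '-'))
        .unwrap_or(rest.len());
    let name = &rest[..end];
    if name.starts_with(|c: char| c.is_ascii_alphabetic()) {
        Some((name, &rest[end..]))
    } else {
        None
    }
}

/// Returns Some(6) or Some(7) if `line` starts an HTML block under that condition.
fn starts_html_block(line: &str) -> Option<u8> {
    // At most three spaces of indentation; four or more would make indented code.
    let indent = line.len() - line.trim_start_matches(' ').len();
    if indent > 3 {
        return None;
    }
    let s = line[indent..].trim_end();
    let (name, after) = tag_name(s)?;
    let name = name.to_ascii_lowercase();
    // Condition 6: a known block tag followed by whitespace, end of line, `>` or `/>`.
    let cond6_tail = after.is_empty()
        || after.starts_with(|c: char| c.is_whitespace())
        || after.starts_with('>')
        || after.starts_with("/>");
    if BLOCK_TAGS.contains(&name.as_str()) && cond6_tail {
        return Some(6);
    }
    // Condition 7 (simplified): any other tag standing alone on its line.
    let excluded = ["pre", "script", "style", "textarea"];
    let whole_tag = after == ">" || after == "/>" || (after.starts_with(' ') && s.ends_with('>'));
    if !excluded.contains(&name.as_str()) && whole_tag {
        return Some(7);
    }
    None
}

/// Number of lines in the HTML block starting at `lines[0]`, if one starts there.
fn html_block_len(lines: &[&str]) -> Option<usize> {
    starts_html_block(lines.first()?)?;
    // Shared end condition of 6 and 7: the block stops before the first blank line.
    Some(lines.iter().position(|l| l.trim().is_empty()).unwrap_or(lines.len()))
}

fn main() {
    let without_blank = "<dev>\n    <p>\n    text\n    </p>\n    <p>\n    text\n    </p>\n</dev>";
    let with_blank = "<dev>\n    <p>\n    text\n    </p>\n\n    <p>\n    text\n    </p>\n</dev>";
    println!("{:?}", starts_html_block("<dev>")); // Some(7)
    for src in [without_blank, with_blank] {
        let lines: Vec<&str> = src.lines().collect();
        // Some(8) for the first input, Some(4) for the second.
        println!("{:?}", html_block_len(&lines));
    }
}
```

With the blank line present, the sketch stops the block after four lines, which matches the first `HtmlBlock` reported in the issue.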

kurotych commented 1 year ago

Hello, @rlidwka. Thank you for your answer. You are right: the parser behaves as the CommonMark spec requires.

> Please check whether the reference parser has the same behavior:
>
> https://spec.commonmark.org/dingus/

It marks the second paragraph as code, which looks strange:

Result

```html
<dev>
    <p>
    Metus sapien molestie cursus sollicitudin vivamus dignissim condimentum pretium velit.
    </p>
<pre><code>&lt;p&gt;
Metus sapien molestie cursus sollicitudin vivamus dignissim condimentum pretium velit.
&lt;/p&gt;
</code></pre>
</dev>
```
rlidwka commented 1 year ago

The parser looks at this and sees 3 top-level blocks separated by blank lines.

The first block is a heading, "hi".

The second block is an unclosed HTML `<dev>` tag, which the parser has no means of validating.

The third block starts with 4 spaces, which automatically turns it into an indented code block. It then encounters `</dev>`, which is not indented, so that line becomes a 4th top-level block.

Works as intended so far.
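The block splitting described above can be sketched with a toy top-level splitter in std-only Rust. This is a hypothetical illustration, not markdown-it's parser: the helper names are invented, and the simplifications are listed in the first comment. The `# hi` line in the sample input is an assumption standing in for the "hi" heading mentioned above.

```rust
// Hypothetical toy splitter: not markdown-it's code, and far from full CommonMark.
// Simplifications (assumptions): only ATX headings written as "# text", HTML blocks
// detected by a leading `<` plus a letter or `/`, paragraphs that end only at a
// blank line, and tabs are not expanded.

#[derive(Debug, PartialEq)]
enum Block {
    Heading,
    Html,
    IndentedCode,
    Paragraph,
}

/// Number of leading spaces.
fn indent(line: &str) -> usize {
    line.len() - line.trim_start_matches(' ').len()
}

fn is_blank(line: &str) -> bool {
    line.trim().is_empty()
}

/// Rough stand-in for HTML start conditions 6/7 (only called when indent < 4).
fn looks_like_html(line: &str) -> bool {
    line.trim_start()
        .strip_prefix('<')
        .map_or(false, |r| r.starts_with(|c: char| c.is_ascii_alphabetic() || c == '/'))
}

/// Splits `src` into top-level blocks, returning each block's kind and raw lines.
fn split_blocks(src: &str) -> Vec<(Block, Vec<&str>)> {
    let lines: Vec<&str> = src.lines().collect();
    let mut blocks = Vec::new();
    let mut i = 0;
    while i < lines.len() {
        if is_blank(lines[i]) {
            i += 1;
            continue;
        }
        let start = i;
        let kind = if indent(lines[i]) >= 4 {
            // Four or more spaces: indented code, continuing through indented or blank lines...
            while i < lines.len() && (is_blank(lines[i]) || indent(lines[i]) >= 4) {
                i += 1;
            }
            // ...but trailing blank lines do not belong to the block.
            while is_blank(lines[i - 1]) {
                i -= 1;
            }
            Block::IndentedCode
        } else if lines[i].trim_start().starts_with("# ") {
            i += 1;
            Block::Heading
        } else {
            let kind = if looks_like_html(lines[i]) { Block::Html } else { Block::Paragraph };
            // HTML blocks (conditions 6/7) and, in this sketch, paragraphs end at a blank line.
            while i < lines.len() && !is_blank(lines[i]) {
                i += 1;
            }
            kind
        };
        blocks.push((kind, lines[start..i].to_vec()));
    }
    blocks
}

fn main() {
    // Hypothetical input: "# hi" stands in for the heading mentioned above.
    let src = "# hi\n\n<dev>\n    <p>\n    text\n    </p>\n\n    <p>\n    text\n    </p>\n</dev>\n";
    for (kind, lines) in split_blocks(src) {
        println!("{:?}: {:?}", kind, lines);
    }
}
```

This prints four blocks (Heading, Html, IndentedCode, Html). The indented `<p>` after the blank line lands in a code block, and the unindented `</dev>` ends that code block and starts a new HTML block, which is the same shape as the reference parser's output.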

kurotych commented 1 year ago

Thanks for your answers.