yuin / goldmark

:trophy: A markdown parser written in Go. Easy to extend, standard(CommonMark) compliant, well structured.
MIT License

Blockquote tag appears after multiline HTML comments not ending with newline #274

Closed: cyrus-and closed this issue 2 years ago

cyrus-and commented 2 years ago
  1. What version of goldmark are you using? v1.4.4
  2. What version of Go are you using? 1.17
  3. What operating system and processor architecture are you using? Darwin 20.6.0 x86_64
  4. What did you do?

    package main
    
    import (
        "bytes"
        "fmt"
        "github.com/yuin/goldmark"
    )
    
    func main() {
        input := []byte("<!--\n-->")
        var buffer bytes.Buffer
        if err := goldmark.Convert(input, &buffer); err != nil {
            panic(err)
        }
        fmt.Print(buffer.String())
    }
  5. What did you expect to see?
    <!-- raw HTML omitted -->

    or

    <!--
    -->

    or an empty output, certainly not a <blockquote> tag.

  6. What did you see instead?
    <!-- raw HTML omitted -->
    <!-- raw HTML omitted -->
    <blockquote>
    </blockquote>
  7. Did you confirm your output is different from CommonMark online demo or other official renderer correspond with an extension? Yes:
    <!--
    -->
teekennedy commented 2 years ago

Just ran into this myself while writing unit tests for my markdown renderer for goldmark.

I've narrowed it down to the following conditions:

I believe this is caused by an off-by-one error in advancing the source when parsing an HTML block on the last line of the document. It advances up to but not including the final character >, before closing the HTML block node. The parser continues parsing > as another block, generating an empty blockquote node in the process.

Until this is fixed, you can avoid the bug by making sure your markdown document ends with a trailing newline before sending it to goldmark for parsing. It's good Unix practice anyway.
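A minimal sketch of that workaround, using only the standard library. The helper name `ensureTrailingNewline` is hypothetical and not part of goldmark; the idea is to normalize the source first and then pass the result to `goldmark.Convert` exactly as in the reproduction above.

```go
package main

import (
	"bytes"
	"fmt"
)

// ensureTrailingNewline is a hypothetical helper (not part of goldmark).
// It returns src unchanged if it is empty or already ends with '\n';
// otherwise it returns a copy of src with a single '\n' appended.
func ensureTrailingNewline(src []byte) []byte {
	if len(src) == 0 || bytes.HasSuffix(src, []byte("\n")) {
		return src
	}
	// Copy rather than append in place so the caller's slice is never modified.
	out := make([]byte, 0, len(src)+1)
	out = append(out, src...)
	return append(out, '\n')
}

func main() {
	// Same input as in the original report: an HTML comment with no final newline.
	input := ensureTrailingNewline([]byte("<!--\n-->"))
	// The normalized input would then be handed to goldmark.Convert(input, &buffer).
	fmt.Printf("%q\n", input)
}
```

This only sidesteps the condition described above (an HTML block on the last line with no newline after it). It does not change how goldmark parses anything else.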

ivanspasov99 commented 1 year ago

@yuin Ran into the same issue, but with pure HTML.

Markdown file with raw HTML:

<pre>
hello
<code>
some code
<code>
<pre>

I receive "blockquote" tags at the end of the output.

Maybe I did not understand how to work around it; I did not find anything in the renderer options.

teekennedy commented 1 year ago

It's possible this has something to do with the fact that there are no closing tags in your HTML example.

Can you post the output of calling ast.Dump on the parsed source?