russross / blackfriday

Blackfriday: a markdown processor for Go

Markdown file with CRLF line endings will cause wrong-style output #423

Open jinliming2 opened 6 years ago

jinliming2 commented 6 years ago

Hi. When I started using blackfriday to parse my Markdown file, I got malformed output. My Markdown file looks like this, with Windows-style CRLF (\r\n) line endings:

# Hello World

This is my content.

And I wrote the code like this in my project:

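package main

import (
    "io/ioutil"
    "log"

    "github.com/russross/blackfriday/v2" // import path assumed; the post does not show it
)

func main() {
    file, err := ioutil.ReadFile("./test.md")
    if err != nil {
        log.Fatal(err) // stop here instead of rendering a nil slice
    }
    println(string(file))
    println("---------------------------------------------")
    out := blackfriday.Run(file)
    println(string(out))
}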

And now, when I ran my program, I got output in my console like this:

# Hello World

This is my content.

---------------------------------------------
</h1>ello World

<p>
</p> is my content.

Is this a problem in blackfriday? Thanks.
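The console output above is consistent with a carriage return surviving into the rendered HTML: a terminal moves the cursor back to column 0 on \r, so a closing tag printed after it overwrites the start of the line. A minimal sketch for making such characters visible follows. The helper name showControlChars and the sample HTML are assumptions for illustration, inferred from the output above, not captured from blackfriday.

```go
package main

import "fmt"

// showControlChars (hypothetical helper) returns b as a Go-quoted string,
// so that "\r" and "\n" appear as visible escape sequences.
func showControlChars(b []byte) string {
	return fmt.Sprintf("%q", b)
}

func main() {
	// Assumed sample: HTML with a stray carriage return before the closing tag.
	out := []byte("<h1>Hello World\r</h1>\n")
	fmt.Println(showControlChars(out))
}
```

Printing the rendered output this way, rather than raw, shows whether a \r is present before drawing conclusions from what the terminal displays.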

klingtnet commented 6 years ago

I can confirm that text with Windows line endings (CRLF) is not handled correctly. My current workaround is to replace them before passing the text to blackfriday:

markdownWithUnixLineEndings := strings.Replace(markdown, "\r\n", "\n", -1)
blackfriday.Run([]byte(markdownWithUnixLineEndings))

rtfb commented 6 years ago

Confirmed here as well. I have a WIP fix, but I'm not spending much time on Blackfriday lately, so I haven't cleaned it up yet.

rtfb commented 6 years ago

Submitted #428, feel free to review and comment.

guillep2k commented 4 years ago

For those who need a solution and can't wait for the PR to be merged, this is what we did at Gitea to make it work in the meantime: go-gitea/gitea#8925

Basically, we convert every \r or \r\n sequence to \n. We ran some tests and arrived at that algorithm as the fastest (apart from modifying the input in place, which is even faster).

zeripath commented 4 years ago

Here's the actual code:

https://github.com/go-gitea/gitea/blob/dc8036dcc680abab52b342d18181a5ee42f40318/modules/util/util.go#L68-L102

It just rips out all \r\n and \r replacing them with \n - so if for some perverse reason you actually intend there to be a raw \r in your markdown page it will become a newline - however it is fast.
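For readers who don't want to follow the link, here is a minimal sketch of the approach as described in this thread: one pass over the input, writing into a separate buffer. It is not a copy of the Gitea code; the function name normalizeLineEndings is hypothetical.

```go
package main

import "fmt"

// normalizeLineEndings (hypothetical name) returns a copy of input in which
// every "\r\n" pair and every lone "\r" is replaced by a single "\n".
// The input slice is left untouched.
func normalizeLineEndings(input []byte) []byte {
	tmp := make([]byte, 0, len(input))
	for i := 0; i < len(input); i++ {
		c := input[i]
		if c == '\r' {
			// Skip the '\n' of a "\r\n" pair so it is not emitted twice.
			if i+1 < len(input) && input[i+1] == '\n' {
				i++
			}
			c = '\n'
		}
		tmp = append(tmp, c)
	}
	return tmp
}

func main() {
	md := []byte("# Hello World\r\n\r\nThis is my content.\r\n")
	fmt.Printf("%q\n", normalizeLineEndings(md))
}
```

The result would then be passed to blackfriday.Run in place of the raw file contents.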

zeripath commented 4 years ago

If you would prefer to replace in place, remove the definition of tmp and replace all references to tmp with input, or vice versa. That would do it.
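An in-place variant might look like the sketch below. It assumes the caller no longer needs the original bytes; the name normalizeInPlace is hypothetical and this is not the Gitea code. It works because the write position never gets ahead of the read position, so no unread byte is overwritten.

```go
package main

import "fmt"

// normalizeInPlace (hypothetical name) rewrites buf so that every "\r\n"
// pair and every lone "\r" becomes "\n". It reuses buf's memory and returns
// the shortened slice; the caller must use the return value.
func normalizeInPlace(buf []byte) []byte {
	w := 0 // write index, always <= read index r
	for r := 0; r < len(buf); r++ {
		c := buf[r]
		if c == '\r' {
			// Consume the '\n' of a "\r\n" pair.
			if r+1 < len(buf) && buf[r+1] == '\n' {
				r++
			}
			c = '\n'
		}
		buf[w] = c
		w++
	}
	return buf[:w]
}

func main() {
	md := []byte("# Hello World\r\n\r\nThis is my content.\r\n")
	md = normalizeInPlace(md)
	fmt.Printf("%q\n", md)
}
```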

TACIXAT commented 4 years ago

I just do a bytes.Replace on the input, but this had me confused for an hour or so.
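A sketch of what that could look like, assuming only CRLF pairs need handling. The wrapper name crlfToLF is hypothetical. Unlike the Gitea approach, a lone \r is left as-is.

```go
package main

import (
	"bytes"
	"fmt"
)

// crlfToLF (hypothetical name) replaces every "\r\n" pair with "\n".
// A lone "\r" is not touched. The final argument -1 means "replace all".
func crlfToLF(input []byte) []byte {
	return bytes.Replace(input, []byte("\r\n"), []byte("\n"), -1)
}

func main() {
	md := []byte("# Hello World\r\n\r\nThis is my content.\r\n")
	fmt.Printf("%q\n", crlfToLF(md))
}
```

Working on []byte directly avoids the string conversion round trip when the file was read as bytes in the first place.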