microcosm-cc / bluemonday

bluemonday: a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user generated content of XSS
https://github.com/microcosm-cc/bluemonday
BSD 3-Clause "New" or "Revised" License

Way to skip html escaping code blocks? #160

Open ivanjaros opened 1 year ago

ivanjaros commented 1 year ago

I have a use case where I take user input, apply a strict policy to escape any HTML (all input is treated as plain text), run it through a Markdown parser, and then apply a custom bluemonday policy to strip any HTML tags from the Markdown-generated output that I do not want to support.

What I need is a way to tell bluemonday NOT to escape input into HTML entities when it is wrapped in ``` or `, because the Markdown parser will render that input into syntax-highlighted blocks wrapped in <pre> or <code>.

Right now it seems that I have to insert an extra step after the strict bluemonday policy and the Markdown parser to unescape these blocks manually.
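For illustration, that manual step might look roughly like the sketch below. It assumes the Markdown renderer emits code as `<code>...</code>` elements and that their contents ended up escaped twice (once by the strict policy, once by the renderer). The name `unescapeCodeOnce` is hypothetical, and the regex is only a stand-in for proper HTML parsing.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
)

// codeElement matches a <code> element and captures its opening tag,
// its contents, and its closing tag. Assumption: code elements are not nested.
var codeElement = regexp.MustCompile(`(?s)(<code[^>]*>)(.*?)(</code>)`)

// unescapeCodeOnce removes exactly one level of HTML entity escaping
// from the contents of every <code> element and leaves the rest untouched.
func unescapeCodeOnce(rendered string) string {
	return codeElement.ReplaceAllStringFunc(rendered, func(m string) string {
		parts := codeElement.FindStringSubmatch(m)
		return parts[1] + html.UnescapeString(parts[2]) + parts[3]
	})
}

func main() {
	in := "<p>a &amp;lt; b</p><pre><code>a &amp;lt; b</code></pre>"
	fmt.Println(unescapeCodeOnce(in))
}
```

If the contents were escaped only once, this step would turn them back into live markup, so it would have to run before the final policy rather than after it.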

ivanjaros commented 1 year ago

After quite some time experimenting, I came to this solution, which I really do not like:


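    var (
        // Fenced blocks: ``` ... ```, possibly spanning several lines.
        rawCodeBlock = regexp.MustCompile("(?s)```.*?```")
        // Inline spans: `...` with no backticks inside.
        rawCodeToken = regexp.MustCompile("`[^`]+`")
    )

    // Stash fenced blocks first so the inline pattern cannot match
    // across the backticks of a fence and expose block contents.
    blockCodes := make([]string, 0, 10)
    input = rawCodeBlock.ReplaceAllStringFunc(input, func(found string) string {
        blockCodes = append(blockCodes, found)
        return "+code:block+"
    })

    inlineCodes := make([]string, 0, 10)
    input = rawCodeToken.ReplaceAllStringFunc(input, func(found string) string {
        inlineCodes = append(inlineCodes, found)
        return "+code:inline+"
    })

    // Sanitize everything that is not code.
    input = r.inSanit.Sanitize(input)

    // Restore the stashed code in the order it was captured.
    // Assumes the input never contains the placeholder strings literally.
    for _, code := range inlineCodes {
        input = strings.Replace(input, "+code:inline+", code, 1)
    }

    for _, code := range blockCodes {
        input = strings.Replace(input, "+code:block+", code, 1)
    }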