microcosm-cc / bluemonday

bluemonday: a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user generated content of XSS
https://github.com/microcosm-cc/bluemonday
BSD 3-Clause "New" or "Revised" License

How to get tel: links to not be removed? #152

Closed clarencefoy closed 1 year ago

clarencefoy commented 1 year ago

I have tried:

bMN.AllowStandardURLs()
bMN.AllowAttrs("href").OnElements("a")
bMN.AllowAttrs("href").Matching(regexp.MustCompile(`tel:`)).OnElements("a")

But to no avail. If someone submits a link that is simply "tel:123456", how do I make sure the href attribute does not get deleted?

Many thanks

buro9 commented 1 year ago

https://go.dev/play/p/Z2PWy4-1rQx

package main

import (
    "fmt"

    "github.com/microcosm-cc/bluemonday"
)

func main() {
    // Do this once for each unique policy, and use the policy for the life of the program
    // Policy creation/editing is not safe to use in multiple goroutines
    p := bluemonday.UGCPolicy()
    p.AllowURLSchemes(`tel`)

    // The policy can then be used to sanitize lots of input and it is safe to use the policy in multiple goroutines
    html := p.Sanitize(
        `<a href="tel:+44.12345678">Call me</a> or <a href="https://example.org">Check my website</a>`,
    )

    // Output:
    // <a href="tel:+44.12345678" rel="nofollow">Call me</a> or <a href="https://example.org" rel="nofollow">Check my website</a>
    fmt.Println(html)
}

buro9 commented 1 year ago

UGCPolicy() builds the allow list, and in doing so it calls a helper named AllowStandardURLs, defined in helpers.go:L128.

The helper itself has this:

// AllowStandardURLs is a convenience function that will enable rel="nofollow"
// on "a", "area" and "link" (if you have allowed those elements) and will
// ensure that the URL values are parseable and either relative or belong to the
// "mailto", "http", or "https" schemes
func (p *Policy) AllowStandardURLs() {
    // URLs must be parseable by net/url.Parse()
    p.RequireParseableURLs(true)

    // !url.IsAbs() is permitted
    p.AllowRelativeURLs(true)

    // Most common URL schemes only
    p.AllowURLSchemes("mailto", "http", "https")

    // For linking elements we will add rel="nofollow" if it does not already exist
    // This applies to "a" "area" "link"
    p.RequireNoFollowOnLinks(true)
}

It is the AllowURLSchemes call that permits the scheme (protocol) part of a URL.

That func takes a list of schemes and appends them to the allow list, so earlier entries such as mailto, http and https stay allowed.

So all that was missing was calling it once more to allow tel as well.
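
To make the "appended to the allow list" behaviour concrete, here is a minimal standard-library sketch of the idea. It is a simplified model and not bluemonday's actual code: the type schemeAllowList and its methods allow and permits are hypothetical names invented for illustration. It assumes the same rules the helper's comment describes (the URL must parse, and must be either relative or use an allowed scheme).

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// schemeAllowList is a hypothetical, simplified stand-in for a policy's
// URL scheme allow list. It is not part of bluemonday.
type schemeAllowList map[string]bool

// allow adds schemes to the list. Schemes added earlier are kept, so
// calling it again extends the list rather than replacing it.
func (l schemeAllowList) allow(schemes ...string) {
	for _, s := range schemes {
		l[strings.ToLower(s)] = true
	}
}

// permits reports whether rawURL parses and is either relative or uses
// a scheme that is on the allow list.
func (l schemeAllowList) permits(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false // unparseable URLs are rejected
	}
	if !u.IsAbs() {
		return true // relative URLs are accepted in this sketch
	}
	return l[u.Scheme] // url.Parse lowercases the scheme
}

func main() {
	list := schemeAllowList{}
	list.allow("mailto", "http", "https") // the "standard" schemes

	fmt.Println(list.permits("tel:123456")) // false: tel is not listed yet

	list.allow("tel") // one more call extends the list

	fmt.Println(list.permits("tel:123456"))          // true
	fmt.Println(list.permits("javascript:alert(1)")) // false: never allowed
}
```

In this model an attribute-level rule on href would not help on its own, because the scheme check is a separate step. That appears to match what the original question ran into with the Matching(...) attempt.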