Closed mstaack closed 6 years ago
There is nothing wrong with the policy, but the image data is truncated or incomplete.
The data itself is parsed as base64 using https://github.com/microcosm-cc/bluemonday/blob/master/helpers.go#L223-L226
But the input is not valid https://play.golang.org/p/-eq8Xsr8Qy :
package main
import (
"encoding/base64"
"fmt"
)
func main() {
s := `R0lGODdhEAAQAMwAAPj7+FmhUYjNfGuxYYDJdYTIeanOpT+DOTuANXi/bGOrWj6CONzv2sPjv2CmV1unU4zPgISg6DJnJ3ImTh8Mtbs00aNP1CZSGy0YqLEn47RgXW8amasW7XWsmmvX2iuXiwAAAAAEAAQAAAFVyAgjmRpnihqGCkpDQPbGkNUOFk6DZqgHCNGg2T4QAQBoIiRSAwBE4VA4FACKgkB5NGReASFZEmxsQ0whPDi9BiACYQAInXhwOUtgCUQoORFCGt/g4QAIQA7`
if _, err := base64.StdEncoding.DecodeString(s); err != nil {
// this one fires
fmt.Println(err.Error())
} else {
fmt.Println("parsed as base64")
}
}
Were you to provide a valid base64 input, it would work (following example based on the unit test):
package main
import (
"fmt"
"github.com/microcosm-cc/bluemonday"
)
func main() {
p := bluemonday.NewPolicy()
p.AllowImages()
p.AllowDataURIImages()
s := `<img src="">`
// this returns the image
fmt.Printf("'%s'\n", p.Sanitize(s))
}
@buro9 thanks for you feedback, will check base64 on my side ;)
@buro9 weird this is my base64 img code: https://play.golang.org/p/O0j5Dmlvmp so its valid.
but my html response has an empty image tag: <img alt="">
this is my go code:
// Define a policy, we are using the UGC policy as a base.
p := bluemonday.UGCPolicy()
// HTML email is often displayed in iframes and needs to preserve core
// structure
p.AllowDocType(true)
// HTML email frequently contains obselete and basic HTML
p.AllowElements("html", "head", "body", "label", "input", "font", "main", "nav", "header", "footer", "kbd", "legend", "map", "title")
p.AllowAttrs("type").Matching(StyleType).OnElements("style")
p.AllowAttrs("style").Globally()
// Custom attributes on elements
p.AllowAttrs("type", "media").OnElements("style")
p.AllowAttrs("face", "size").OnElements("font")
p.AllowAttrs("name", "content", "http-equiv").OnElements("meta")
p.AllowAttrs("name", "data-id").OnElements("a")
p.AllowAttrs("for").OnElements("label")
p.AllowAttrs("type").OnElements("input")
p.AllowAttrs("rel", "href").OnElements("link")
p.AllowAttrs("topmargin", "leftmargin", "marginwidth", "marginheight", "yahoo").OnElements("body")
p.AllowAttrs("xmlns").OnElements("html")
p.AllowAttrs("style", "vspace", "hspace", "face", "bgcolor", "color", "border", "cellpadding", "cellspacing").Globally()
// HTML email tends to see the use of obselete spacing and styling attributes
p.AllowAttrs("bgcolor", "color", "align").OnElements("basefont", "font", "hr", "table", "td")
p.AllowAttrs("border").OnElements("img", "table", "basefont", "font", "hr", "td")
p.AllowAttrs("cellpadding", "cellspacing", "valign", "halign").OnElements("table")
// Allow "class" attributes on all elements
p.AllowStyling()
// Allow images to be embedded via data-uri
p.AllowDataURIImages()
// Add "rel=nofollow" to links
p.RequireNoFollowOnLinks(true)
p.RequireNoFollowOnFullyQualifiedLinks(true)
// Open external links in a new window/tab
p.AddTargetBlankToFullyQualifiedLinks(true)
// Read input from stdin so that this is a nice unix utility and can receive
// piped input
dirty, err := ioutil.ReadAll(os.Stdin)
if err != nil {
log.Fatal(err)
}
// Apply the policy and write to stdout
fmt.Fprint(
os.Stdout,
p.Sanitize(
string(dirty),
),
)
Hmmm, if I take your image example from the linked Go Playground... it works for me. When I use the base64 input within a src="" if does not work. If I remove the line breaks and whitespace... it works for me.
Do you have whitespace or line breaks in your input data URL content?
ahh yeah...okay
but i cant just replace all \n to an empty string
hmm okay fixed the base64 part...
but this still gives an empty image tag... weird
This is a valid bug... I'll fix it :)
Line feeds in data URI attributes in HTML5 are valid, so I should be stripping those out beforehand.
@buro9 your work is awesome! will recheck when your done. thanks!
Thanks... do you have a test file btw... with the full input you are using (for this part of the file)... to avoid me copying it from a screenshot :)
and one with newline: https://gist.github.com/mstaack/599a28114ceecc45e8f5f6d93f7b7d71
both should work
These would still return empty string because image/image/png
is not a valid content-type.
But assuming image/png
is used I'll make the new line one work too.
yeah you are totally right... will fix that on my side ;)
Using this content:
and this policy:
i get an empty string back.... whats wrong with my policy?
cheers max