jamietre / CsQuery

CsQuery is a complete CSS selector engine, HTML parser, and jQuery port for C# and .NET 4.

ASP.NET web forms - MINOR ISSUE - some attribute values NOT html encoded #125

Open CJCannon opened 11 years ago

CJCannon commented 11 years ago

Before:

<script src="/CMIS/ScriptResource.axd?d=mK3-Um43ueXqJjfszyKSDzL4eFh9qFd5vjzSLcveAnQaYO12uHzJ8ksC6FJQvbfXf7ing8-sH5j2OXlWR-QnD2-0TcCcPXjYR4l-pp4D_WyCe0-BrmztjoXNFJduPsSdEm1YOlt6riiC_2DIy2NtVk4CSKJlYS9kHNtcecmBBmgYwHSEidFeVLj9p0HV-6cS0&amp;t=47727e12" type="text/javascript"></script>

Using CsQuery with ASP.NET web forms:

<script src="/CMIS/ScriptResource.axd?d=mK3-Um43ueXqJjfszyKSDzL4eFh9qFd5vjzSLcveAnQaYO12uHzJ8ksC6FJQvbfXf7ing8-sH5j2OXlWR-QnD2-0TcCcPXjYR4l-pp4D_WyCe0-BrmztjoXNFJduPsSdEm1YOlt6riiC_2DIy2NtVk4CSKJlYS9kHNtcecmBBmgYwHSEidFeVLj9p0HV-6cS0&t=47727e12" type="text/javascript"></script>

The ampersand near the end of the src value is not HTML encoded.

Note the script element is automatically added by ASP.NET web forms and is in the body (just after default form and viewstate) NOT the head.

Many thanks.

jamietre commented 11 years ago

So is CsQuery rendering the encoded ampersand (&amp;) as just a bare ampersand? Will check it out. This seems like one of those things that browsers probably don't care about, but it is technically incorrect and should be fixed.

CJCannon commented 11 years ago

The ampersand is NOT being HTML encoded; it should be &amp;, not just &. All attribute values of HTML elements should be HTML encoded. It's a minor thing, but it helps in client-side debugging if all little errors like this are eliminated.
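
To make the expected output concrete, here is a minimal sketch in Python rather than C#. It uses the standard library's `html.escape` as a stand-in for an attribute encoder; the shortened URL is a hypothetical placeholder for the ScriptResource.axd value above, and nothing here reflects CsQuery's actual code.

```python
from html import escape

# Hypothetical, shortened stand-in for the ScriptResource.axd URL in the report.
src = "/CMIS/ScriptResource.axd?d=abc123&t=47727e12"

# escape() with quote=True rewrites &, <, >, " and ' so the value is safe
# inside a quoted attribute.
encoded = escape(src, quote=True)

tag = f'<script src="{encoded}" type="text/javascript"></script>'
print(tag)
# <script src="/CMIS/ScriptResource.axd?d=abc123&amp;t=47727e12" type="text/javascript"></script>
```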

benjamingr commented 11 years ago

@jamietre This might not seem important, but not encoding HTML entities in attributes can lead to XSS. While it does not seem critical to users like me who do scraping, to users like you who use CsQuery in MVC to render the code clients receive, it seems like a pretty important fix.

jamietre commented 11 years ago

Point taken. The spec describes the quoting rules, which is what I had implemented before: http://dev.w3.org/html5/spec-LC/syntax.html#attributes-0

I have tentatively implemented it to use attribute encoding per MS HtmlAttributeEncode whenever the double-quoting rule criteria are met, unless the only character requiring quoting is a double-quote, in which case I use a single-quote as the quoting character.
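
The rule described above can be sketched roughly as follows. This is one reading of it, written in Python rather than C#: `render_attribute` is a hypothetical helper, `html.escape` stands in for HtmlAttributeEncode (the two do not encode exactly the same character set), and "the only character requiring quoting is a double-quote" is assumed to mean the value contains `"` but none of `&`, `<`, `>` or `'`.

```python
from html import escape


def render_attribute(name: str, value: str) -> str:
    """Render one attribute, picking the quote character per the rule above."""
    # Characters, other than the double quote, that escape() would rewrite.
    others = set("&<>'")
    if '"' in value and not (others & set(value)):
        # Assumption: only double quotes are a problem, so wrap the value
        # in single quotes and leave it unencoded.
        return f"{name}='{value}'"
    # Otherwise encode the value and wrap it in double quotes.
    return f'{name}="{escape(value, quote=True)}"'


print(render_attribute("src", "/a.axd?d=1&t=2"))  # src="/a.axd?d=1&amp;t=2"
print(render_attribute("title", 'say "hi"'))      # title='say "hi"'
```

A value that mixes double quotes with other special characters falls through to the encoded, double-quoted form, so the single-quote shortcut never has to deal with an embedded `'`.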

If you know of a definitive reference for when attribute encoding must be used, let me know.