jamietre / CsQuery

CsQuery is a complete CSS selector engine, HTML parser, and jQuery port for C# and .NET 4.

ASP.NET web forms - DOCTYPE is changed to bogus DOCTYPE #124

Closed CJCannon closed 11 years ago

CJCannon commented 11 years ago

My DOCTYPE is being changed from:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

to

<!DOCTYPE html "-//w3c//dtd xhtml 1.0 transitional//en" "http://www.w3.org/tr/xhtml1/dtd/xhtml1-transitional.dtd">

Firefox is reporting this as a bogus DOCTYPE!

Many thanks

jamietre commented 11 years ago

PUBLIC is missing. What version are you using? Try updating to the beta release on NuGet (choose "Include prerelease"); I think I fixed this.

CJCannon commented 11 years ago

I got 1.3.4 via NuGet. I'll try 1.3.5 beta 5 and let you know.

CJCannon commented 11 years ago

1.3.5 beta 5 works ok thanks - this issue can be closed. Didn't expect the doctype to be changed - so am I right in thinking that CsQuery completely replaces normal ASP.NET web forms rendering?

When's it out of beta?

jamietre commented 11 years ago

It's pretty much release ready (this is mostly a bugfix release, but it has been a while since I pushed out an update) - I have just been extraordinarily busy at work so haven't had a lot of free time to do some cleanup and push it out. I have some time off soon and will get it done then.

CJCannon commented 11 years ago

I will close this issue, I'm just curious about the rendering, does CsQuery completely replace normal ASP.NET rendering? Not saying this is a good or bad thing... so normally a .NET library would write out the HTML but CsQuery replaces that? Can I be sure that it is doing it 99% the same? E.g. didn't expect the doctype to be changed!

jamietre commented 11 years ago

Yes it does - it has to, in order to allow you to alter the HTML. The HTML output which CsQuery intercepts is just a string. So to be able to do this magic, it is first completely parsed into a DOM, which CsQuery lets you manipulate. When you're done and it's rendered again, the output is completely recreated from that object model. While in theory it might be possible to keep track of the original position of each entity in the source string, and only regenerate the parts that had been altered, this would be a lot more complex internally, and probably a lot slower, since it would be necessary to keep a "source map" to the position of each entity in the source string.

This does have the consequence of the output HTML almost certainly being a little different - since any "errors" like missing closing tags, illegal quoting, etc will be interpreted according to the HTML5 parsing rules, and corrected. All tag names will be upper-case. It will also normalize attribute quoting according to the output rules you've defined.

But usually "different" is better ;)
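To illustrate why a parse-then-render round trip changes the markup, here is a minimal Python sketch. It is not CsQuery's code and does not implement the HTML5 parsing rules; it only uses the standard library's html.parser, and the names Node, TreeBuilder, render and round_trip are made up for this example. Void elements such as br are deliberately not handled.

```python
from html import escape
from html.parser import HTMLParser


class Node:
    """A minimal element node: tag name, attributes, and children."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = list(attrs or [])
        self.children = []  # Node instances or plain strings (text)


class TreeBuilder(HTMLParser):
    """Builds a tiny tree from markup; the original string is not kept."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # html.parser has already lower-cased the tag and attribute names.
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, implicitly closing
        # anything left open inside it. Unmatched end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def render(node):
    """Regenerate markup from the tree using one fixed quoting style."""
    if isinstance(node, str):
        return escape(node, quote=False)
    inner = "".join(render(child) for child in node.children)
    if node.tag == "#document":
        return inner
    attrs = ""
    for name, value in node.attrs:
        text = "" if value is None else value
        attrs += " " + name + '="' + escape(text, quote=True) + '"'
    return "<" + node.tag + attrs + ">" + inner + "</" + node.tag + ">"


def round_trip(html):
    """Parse markup into a tree, then rebuild a string from the tree alone."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return render(builder.root)
```

With this sketch, round_trip("<DIV CLASS=a><P id='x'>hi</P></DIV>") gives <div class="a"><p id="x">hi</p></div>, and round_trip("<div><p>hi</div>") gives <div><p>hi</p></div>: case, quoting and the missing closing tag all come out normalized, because the output is built from the tree and not from the original string.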

CJCannon commented 11 years ago

You mean lower-case?

I'm not using HTML5 yet - it is XHTML 1.0 Transitional

On the plus side I noticed tables now come with a tbody!

jamietre commented 11 years ago

Umm... yeah lower case :) Yes - tbody always gets generated. Some tags are optional in the markup yet mandatory in the DOM: the spec permits omitting them from the source (tbody, and the closing p and li tags, for example), but the parser always supplies them. So a selector like table > tr will never return a result in a web browser, even if you omit tbody in your source HTML.
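The effect on selectors can be sketched in Python. This is only an illustration: the tree below is written out by hand to match what an HTML5 parser would build for a table without tbody in the source, and ElementTree path expressions stand in for the CSS selectors named in the comments.

```python
import xml.etree.ElementTree as ET

# Source markup was "<table><tr><td>x</td></tr></table>"; an HTML5 parser
# inserts tbody between table and tr, giving the tree spelled out here.
table = ET.fromstring("<table><tbody><tr><td>x</td></tr></tbody></table>")

direct_rows = table.findall("tr")        # roughly "table > tr"
tbody_rows = table.findall("tbody/tr")   # roughly "table > tbody > tr"
any_rows = table.findall(".//tr")        # roughly "table tr"
```

Here direct_rows is empty while tbody_rows and any_rows each hold the one row, which is why a descendant selector (table tr) is the safer choice when you don't want to depend on tbody.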

CJCannon commented 11 years ago

Ok many thanks for the info.

I think all I'm concerned about is the HTML being valid according to the doctype - I'm hoping that it is not going to add / change bits according to HTML5 and make the HTML invalid according to XHTML 1.0 Transitional?

jamietre commented 11 years ago

I don't think you should ever find this to be the case - while the parser is an HTML5 parser, this logic really only comes into play when dealing with invalid markup. (XML is valid under HTML5 rules.) That is, HTML5 defines a set of rules for handling missing tags and badly formed markup. If the input is valid XML then the only changes that should be made, other than basic syntax, would be to the quoting of attributes (and adding tbody tags). When an XML doctype is used, CsQuery will render tags that don't have a closing tag (e.g. input) correctly as <input></input>.

The DOCTYPE bug should be an anomaly, since CsQuery really doesn't do any parsing of the tag structures themselves; it's pretty much tag + attributes. DOCTYPE is an exception because it has a specific format and describes functionality.
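For reference, the DOCTYPE format can be sketched as a small Python serializer. This is a hypothetical helper (format_doctype is not a CsQuery function): it assumes the doctype is stored as a name plus optional public and system identifiers, and writes the PUBLIC or SYSTEM keyword ahead of them, with the identifiers kept in their original case.

```python
def format_doctype(name, public_id=None, system_id=None):
    """Serialize a doctype from its name and optional identifiers."""
    parts = ["<!DOCTYPE", name]
    if public_id is not None:
        # A public identifier must be introduced by the PUBLIC keyword.
        parts += ["PUBLIC", '"' + public_id + '"']
        if system_id is not None:
            parts.append('"' + system_id + '"')
    elif system_id is not None:
        # A system identifier on its own is introduced by SYSTEM.
        parts += ["SYSTEM", '"' + system_id + '"']
    return " ".join(parts) + ">"


xhtml_transitional = format_doctype(
    "html",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
)
```

xhtml_transitional matches the original doctype quoted at the top of this issue. The output reported in the bug differs in exactly the two ways this sketch avoids: the PUBLIC keyword is dropped and the identifiers are lower-cased.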

CJCannon commented 11 years ago

Thanks for your help.

I look forward to the 1.3.5 general release!