servo / html5ever

High-performance browser-grade HTML5 parser

XML5: incentivize well-formed XHTML5: Output parsing errors on demand & provide best processing time for flawless code #294

Open Darkspirit opened 7 years ago

Darkspirit commented 7 years ago

People use XHTML because they expect to get a Yellow Screen of Death if the code is wrong; they are interested in good code quality.

I followed the XML5 discussion and saw arguments to use its error corrections for broken RSS feeds and broken SVG inside html and such things, which would be useful for the lax html world.

Currently, there can't be broken XHTML files on the web because Firefox is strict. Servo would create an incentive to write bad XHTML if the current behavior (concealing faults) were kept.

SimonSapin commented 7 years ago

"XML 5 strict mode" doesn’t make sense. If you want XML 1 error handling, you need to use an XML 1 parser. Others already exist in Rust. To make one with an API similar to html5ever and xml5ever, we could make a third crate alongside these two that would also use the markup5ever crate.

Ygg01 commented 7 years ago

https://fastaim.de/test2/ has a Yellow Screen of Death in Firefox, but not in Servo, for example.

Things to note:

IMO it makes more sense to compete with Chrome/IE in this instance.

Currently, there can't be broken XHTML files on the web because all browsers are strict. Servo would create an incentive to write bad XHTML if the current behavior were kept.

As I pointed out above, Chrome already isn't strict. Chrome holds the majority of the browser market on mobile and PC.

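Let's say we adopt this policy. We continue erroring out. The things that will happen:

  • People will notice Firefox errors instead of displaying the page
    • People will notice Chrome doesn't
    • People will move to Chrome
    • Firefox will be less relevant as result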

Right now the bigger incentives for writing bad XHTML are probably that Chrome and IE are doing it and/or that people forgot about it.

Could you/we introduce some small ifs & elses like "if strict parsing mode is active, stop here, otherwise do the following XML5 error corrections: [your existing code]"?

That's a lot to ask for. I guess you could get 60% of the way there by having a tree builder ignore or panic on any ParseError.
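The "ignore or stop on any parse error" idea can be sketched as two interchangeable error policies. This is only an illustration under stated assumptions: `ErrorSink`, `LenientSink`, `StrictSink` and `process_tags` are hypothetical names invented here, loosely inspired by the `parse_error` callback on html5ever's `TreeSink`, and the "parser" is a toy that only checks end tags against a stack. The strict policy stops instead of panicking, which is a gentler variant of the same idea.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for the error callback a tree builder would invoke.
trait ErrorSink {
    /// Returns `true` if parsing should continue after this error.
    fn parse_error(&mut self, msg: Cow<'static, str>) -> bool;
}

/// Lenient policy: record the error and keep going (XML5-style recovery).
#[derive(Default)]
struct LenientSink {
    errors: Vec<Cow<'static, str>>,
}

impl ErrorSink for LenientSink {
    fn parse_error(&mut self, msg: Cow<'static, str>) -> bool {
        self.errors.push(msg);
        true
    }
}

/// Strict policy: remember the first error and ask the driver to stop.
#[derive(Default)]
struct StrictSink {
    first_error: Option<Cow<'static, str>>,
}

impl ErrorSink for StrictSink {
    fn parse_error(&mut self, msg: Cow<'static, str>) -> bool {
        if self.first_error.is_none() {
            self.first_error = Some(msg);
        }
        false
    }
}

/// Toy driver over pre-split tags ("a" opens, "/a" closes).
/// Returns how many tags were processed before stopping.
fn process_tags<S: ErrorSink>(tags: &[&str], sink: &mut S) -> usize {
    let mut open: Vec<&str> = Vec::new();
    for (i, &tag) in tags.iter().enumerate() {
        if let Some(name) = tag.strip_prefix('/') {
            if open.last() == Some(&name) {
                open.pop();
            } else if !sink.parse_error(Cow::Borrowed("mismatched end tag")) {
                return i; // strict policy: stop at the first error
            }
            // lenient policy: the stray end tag is skipped
        } else {
            open.push(tag);
        }
    }
    tags.len()
}
```

With the input `["a", "/b", "/a"]`, the lenient sink lets all three tags through and records one error, while the strict sink stops at the second tag. The parsing code itself is identical in both cases; only the sink decides what an error means.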

Darkspirit commented 7 years ago

Chrome displaying this page correctly.

Oh, this must be relatively new (some months?) and I haven't found a bug about it so far. Maybe there is still a chance to intervene (if it would make sense).

Let's say we adopt this policy. We continue erroring out. The things that will happen:

  • People will notice Firefox errors instead of displaying the page
    • People will notice Chrome doesn't
    • People will move to Chrome
    • Firefox will be less relevant as result

This is what we fear. Firefox still has a notable market share and web developers are testing their sites against it. And Firefox 57+ should be a strong push in market share. So this issue would be more about keeping a behavior.

Right now the bigger incentives for writing bad XHTML are probably that Chrome and IE are doing it and/or that people forgot about it.

I agree with you, even though I must say that one should not underestimate the importance of Firefox for web developers wanting to have a working web site in all browsers.

Let's think about it differently:

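Can we say the following about the current XML5 parser?

  • If one has well-formed (and let's say "lightweight") XML, everything is finished very early and not influenced by the new error correction features.
  • If one has an error in his XML, the error corrections are applied and everything becomes slightly slower.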

If yes, we would have a strong argument to close this issue. It would then be interesting to see in future Firefox/Servo Developer Tools whether error corrections have been applied. If no: would it be possible to introduce such behavior to get the best of both worlds (never show an error, and well-formed, lightweight XML is faster)?

Ygg01 commented 7 years ago

@TerraX-net :

Can we say the following about the current XML5 parser?

  • If one has well-formed (and let's say "lightweight") XML, everything is finished very early and not influenced by the new error correction features.
  • If one has an error in his XML, the error corrections are applied and everything becomes slightly slower.

Hm, this is a difficult question. I'd need to sit down and write normal XML parsing rules in HTML5 state style, then see if the rules map one-to-one to XML5 rules. Theoretically speaking, it could be true; an error would carry the small penalty of emitting an error (an extra token).

Personally, IMHO the best solution is to return a partial tree plus errors, and possibly show the errors in the console. If you are a user, you aren't bothered by a "This page has errors" prompt, and if you are a developer, you'll notice the page has errors and fix them.
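A minimal sketch of what "return a partial tree and errors" could look like as a return type. Everything here is hypothetical (`ParseOutcome`, `parse_lenient`, the flat list standing in for a tree) and is not how xml5ever is implemented; it only illustrates that the page content and the diagnostics can travel together and be consumed by different audiences.

```rust
/// Hypothetical result of a lenient parse: the tree that could be built
/// (here just a flat list of element names) plus any diagnostics.
#[derive(Debug, Default)]
struct ParseOutcome {
    elements: Vec<String>,
    errors: Vec<String>,
}

impl ParseOutcome {
    /// What an ordinary user gets: the page content, never an error screen.
    fn rendered(&self) -> String {
        self.elements.join(" > ")
    }

    /// What a developer console could show: one line per recorded error.
    fn console_lines(&self) -> Vec<String> {
        self.errors
            .iter()
            .map(|e| format!("parse error: {}", e))
            .collect()
    }
}

/// Toy parser over pre-split tags ("p" opens, "/p" closes). Unexpected end
/// tags are reported and skipped instead of aborting the parse.
fn parse_lenient(tags: &[&str]) -> ParseOutcome {
    let mut out = ParseOutcome::default();
    let mut open: Vec<&str> = Vec::new();
    for &tag in tags {
        match tag.strip_prefix('/') {
            Some(name) if open.last() == Some(&name) => {
                open.pop();
            }
            Some(name) => {
                out.errors.push(format!("unexpected end tag </{}>", name));
            }
            None => {
                open.push(tag);
                out.elements.push(tag.to_string());
            }
        }
    }
    // Elements still open at end of input are reported but stay in the tree.
    for name in open.iter().rev() {
        out.errors.push(format!("unclosed element <{}>", name));
    }
    out
}
```

For the input `["html", "body", "/p", "/body"]`, the user-facing side still gets `html > body`, while the console side receives two lines: one for the unexpected `</p>` and one for the unclosed `<html>`.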

SimonSapin commented 7 years ago

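If we’re talking about browsers, it seems like there’s two separate discussions here:

  • What html5ever and related crates should provide for various Rust projects.
  • What Servo should do with Content-Type: application/xhtml+xml documents. This is the wrong forum to have this discussion.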

Darkspirit commented 7 years ago

Personally, IMHO the best solution is to return a partial tree plus errors, and possibly show the errors in the console. If you are a user, you aren't bothered by a "This page has errors" prompt, and if you are a developer, you'll notice the page has errors and fix them.

What html5ever and related crates should provide for various Rust projects.

Parse as far as possible, but report what the errors were. Maybe by passing a flag together with the file (so as not to interfere with regular performance) if one wants to get an error list (use case: e.g. the Developer Tools are open).
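The flag idea could be sketched roughly as follows. The names `ParseOpts`, `collect_errors`, `ParseReport` and `parse_with_opts` are assumptions made up for this illustration and do not describe an existing html5ever or xml5ever API. The point is only that error messages are built and stored when the caller asks for them, so the default path does no extra allocation.

```rust
/// Hypothetical options; `collect_errors` would be set when, for example,
/// developer tools are open.
#[derive(Clone, Copy, Default)]
struct ParseOpts {
    collect_errors: bool,
}

/// Hypothetical report: how many recoveries happened, plus the messages
/// if (and only if) they were requested.
#[derive(Debug, PartialEq)]
struct ParseReport {
    recovered: usize,
    errors: Option<Vec<String>>,
}

/// Toy parser over pre-split tags ("a" opens, "/a" closes). It always
/// recovers from stray end tags, but only formats messages on request.
fn parse_with_opts(tags: &[&str], opts: ParseOpts) -> ParseReport {
    let mut errors = if opts.collect_errors {
        Some(Vec::new())
    } else {
        None
    };
    let mut depth: usize = 0;
    let mut recovered: usize = 0;
    for &tag in tags {
        if tag.starts_with('/') {
            if depth > 0 {
                depth -= 1;
            } else {
                // Recovery always happens; the message is optional.
                recovered += 1;
                if let Some(list) = errors.as_mut() {
                    list.push(format!("stray end tag <{}>", tag));
                }
            }
        } else {
            depth += 1;
        }
    }
    ParseReport { recovered, errors }
}
```

Calling `parse_with_opts(&["a", "/a", "/b"], ParseOpts::default())` recovers once and returns no error list, while the same call with `collect_errors: true` additionally returns the message for the stray `</b>`.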

What Servo should do with Content-Type: application/xhtml+xml documents. This is the wrong forum to have this discussion.

If the XML5 parser is as fast as, or even faster than, an old-style XML1 parser for well-written XHTML5, I would see only advantages (browser parity, no users frustrated by a YSoD, speed for good code, and visible errors in Developer Tools that would blame a developer in front of other devs). So I think it was good to ask you first before asking everyone to introduce even more, apparently unneeded, complexity.

Hm, this is a difficult question. I'd need to sit down and write normal XML parsing rules in HTML5 state style, then see if the rules map one-to-one to XML5 rules. Theoretically speaking, it could be true; an error would carry the small penalty of emitting an error (an extra token).

This indicates that the XML5 parser might be good enough to provide all the desired advantages. And if you found some issues that could be optimized, that would be far better than including masses of additional code.

SimonSapin commented 7 years ago

This is the wrong forum to have this discussion.

Darkspirit commented 7 years ago

This is the wrong forum to have this discussion.

Ygg01 made clear to me that I should rather ask if this XML5 parser

because then I would not be suggesting draconian error handling, but only a feature of this component.