whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/
Other
8.12k stars 2.67k forks source link

Standardize YouTube embed rewriting? #2390

Open RByers opened 7 years ago

RByers commented 7 years ago

Apparently chromium, Gecko and WebKit all have code for rewriting YouTube flash embeds to use HTML instead (so they work when flash is disabled, such as on mobile):

If this is now effectively required for web compat, perhaps we should work to align our implementations and specify it in HTML just to make this a little more predictable / sane? For automated testing purposes, perhaps we should include in the definition an additional hostname like youtube.web-platform.test?

@mounirlamouri can you share any data on how much of the web benefits from this? I imagine there may be other cases where we want to ask whether this is worth doing. So we may be setting a precedent here. I assume we'd only ever consider something like this when it's has truly massive positive impact on the user experience. I think it's worth having some discussion on what magnitude of usage qualifies (eg. as a fraction of page views, and/or a fraction of HTTP Archive pages).

mounirlamouri commented 7 years ago

According to our data, this is affecting somewhere between 1-3% of our users on desktop and less than that on mobile (probably half). It's not huge because it's mostly for old websites, which is also the reason why mobile is less affected.

Is there any other data you would like to see?

zcorpan commented 7 years ago

From https://nerdydata.com/ ("We index approximately 160 million websites") searching in "Deep Web":

src="//www.youtube.com/v/
Showing 1 - 50 (of approx. 22,090 found)

src="http://www.youtube.com/v/
Showing 1 - 50 (of approx. 281,530 found) 

src="https://www.youtube.com/v/
Showing 1 - 50 (of approx. 151,810 found) 

data="//www.youtube.com/v/
Showing 1 - 50 (of approx. 3,290 found) 

data="http://www.youtube.com/v/
Showing 1 - 50 (of approx. 28,200 found) 

data="https://www.youtube.com/v/
Showing 1 - 50 (of approx. 1,175 found) 

Adding these together and assuming the worst case (1 match is 1 site), that is ~0.3%.

zcorpan commented 7 years ago

For comparison, we have <image>:

   <dt>A start tag whose tag name is "image"</dt>
   <dd>
    <!-- As of 2005-12, studies showed that around 0.2% of pages used the <image> element. -->
    <p><span>Parse error</span>. Change the token's tag name to "img" and reprocess it. (Don't
    ask.)</p>
   </dd>
RByers commented 7 years ago

Thanks guys! As expected, this is pretty compelling data that this is important enough to justify some sort of fix. Here's some HTTP Archive data about the popularity of different embed origins/types (of the top 500k sites). So YouTube is present on ~0.27% of top sites, with histats.com behind that at about a tenth of that at 0.029%.

Rank    Origin  Type    Page count
1   http://www.youtube.com  application/x-shockwave-flash   1087     
2   https://www.youtube.com application/x-shockwave-flash   234  
3   http://s10.histats.com  application/x-shockwave-flash   143  
4   http://player.youku.com application/x-shockwave-flash   102  
5   http://www.xatech.com   application/x-shockwave-flash   98   
6   http:// application/x-shockwave-flash   71   
7   http://www.clocklink.com    application/x-shockwave-flash   69   
8   http://blog.naver.com   application/x-shockwave-flash   66   
9   http://www.dailymotion.com  application/x-shockwave-flash   61   
10  http://img.i2i.jp   application/x-shockwave-flash   49   
11  http://www. application/x-shockwave-flash   49   
12  http://www.smartadserver.com    application/x-shockwave-flash   48   
13  http://home.uchat.co.kr application/x-shockwave-flash   45   
14  http://content.pop6.com application/x-shockwave-flash   43   
15  http://embed.scribblelive.com   application/x-shockwave-flash   41  
16  http://www.ustream.tv   application/x-shockwave-flash   32   
17  http://chat.53kf.com    application/x-shockwave-flash   31   
18  http://www.flickr.com   application/x-shockwave-flash   28   
19  http://static-hw.xvideos.com    application/x-shockwave-flash   27   
20  http://static.video.qq.com  application/x-shockwave-flash   27   
21  http://pic.img80.com    application/x-shockwave-flash   26   
22  http://vimeo.com    application/x-shockwave-flash   26   
23  http://s3.rotaban.ru    application/x-shockwave-flash   24   
24  https://embed-ssl.wistia.com    application/x-shockwave-flash   23   
25  http://s4.rotaban.ru    application/x-shockwave-flash   23   
26  http://h2.flashvortex.com   application/x-shockwave-flash   21   
27  http://www.youtube-nocookie.com application/x-shockwave-flash   19   
28  http://c.brightcove.com application/x-shockwave-flash   19   
29  http://ws.amazon.com    application/x-shockwave-flash   19   
30  http://localtimes.info  application/x-shockwave-flash   19   

Query:

SELECT 
  REGEXP_EXTRACT(body, r'<embed[^>]*src=[\'"]?(http[s]?://[^/ "\']*)') AS embedsrc,
  REGEXP_EXTRACT(body, r'<embed[^>]*type=[\'"]?([^ "\']*)') AS type,
  COUNT(DISTINCT bodies.page) as count
  FROM [httparchive:har.2017_01_15_chrome_requests_bodies] 
  AS bodies  
JOIN (
  SELECT 
    page, url,
    JSON_EXTRACT(payload, '$._contentType') AS contentType
  FROM [httparchive:har.2017_01_15_chrome_requests]
) AS requests ON requests.url=bodies.url AND requests.page=bodies.page
WHERE REGEXP_MATCH(body, r'<embed[^>]*src=[\'"]?http[s]?://')
GROUP BY embedsrc, type
ORDER BY count DESC
LIMIT 5000
rniwa commented 7 years ago

I would be surprised this was required for compatibility.

qdot commented 7 years ago

Our firefox telemetry had it at somewhere around 1% of users also when we implemented embed rewriting. It's now down to about 0.5% on our release branch.

https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2017-01-25&keys=__none__!__none__!__none__&max_channel_version=release%252F51&measure=YOUTUBE_REWRITABLE_EMBED_SEEN&min_channel_version=null&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2017-01-18&table=0&trim=1&use_submission_date=0

miketaylr commented 7 years ago

I would be surprised this was required for compatibility.

If there's no Flash plugin on the device or platform, this is how you get the embedded videos to keep working, right?

zcorpan commented 7 years ago

Couldn't most of the "rewriting" be done by YouTube instead of in the browser engine, though? That is, if the only hack in the browser engine is type="application/x-shockwave-flash" -> type="text/html" for youtube.com/v/.

(And maybe eventually, do such type rewriting for any URL...?)

RByers commented 7 years ago

Yeah, we've been discussing a bit a new generic web platform feature that could enable YouTube (and everyone else in a similar position) to solve this problem themselves (basically upgrading a flash embed to an iframe). It's not clear to me what the exact requirements would be for YouTube to be able to use such a feature. If people here feel that is worth pursuing, I can see if we can find someone at Google to drive this.

qdot commented 7 years ago

Update: @mounirlamouri has pointed out that flash embeds are now completely deprecated by youtube. We've still got per-browser solutions to rewriting. Should we revisit this?

karlcow commented 1 year ago

Is there a counter somewhere for

<embed src="https://www.youtube.com/…"
       type="application/x-shockwave-flash"
…
gregorypappas commented 11 months ago

Hi @karlcow, use counters for embeds that are eligible for rewriting were added to Firefox a few months ago. See bug 1847295 for more info :slightly_smiling_face:

https://mozilla.github.io/usecounters/#channel=nightly&group=YOUTUBEFLASHEMBED

zcorpan commented 11 months ago

Can Chromium also implement a use counter?

RByers commented 11 months ago

Can Chromium also implement a use counter?

I am embarrassed to see that I didn't add a UseCounter for this years ago. Absolutely, we can not only add a counter, but also make an informed opinion on our ability to remove this logic from chromium. Bug 1503916 assigned to myself, will plan to land the counter today.

Also note that Chromium's impl isn't just about YouTube anymore, but also dailymotion.com and vimeo.com. I assume we'd want to tackle all three at once, right? I'm adding a UseCounter for all such embed rewrites, but if it turns out to be too large to break, then we could always dig in and look at each independently (though perhaps if we're going to spec one, we should just spec them all?).

zcorpan commented 11 months ago

OK, great. As far as I can tell, Gecko only rewrites for Youtube (here). I'll check if we've received web compat issues for dailymotion/vimeo embeds.

RByers commented 11 months ago

The counter has landed but probably too late to request a merge into Chrome 120, so it won't hit stable until Chrome 121 late January. Regardless data from early release channels should start showing up here soon. It's wildly unreliable but I have initial data from Chrome Canary on Windows - on the order of 0.001% of page loads, inline with the Firefox data. I'd guess that means it's too common still for us to just remove, but it's not impossible we could do so with more work. In particular, now that I have a rough idea of magnitude and that this is indeed a candidate for potential removal, I can opt this counter into anonymized URL collection - with any luck we could find >99% of the usage is one or two sites suggesting removal is potentially easy.

If we can't immediately remove it would you want to spec it, or maybe since it's been unspecced for so long we just wait another 6 months or something to get a sense for the trend line?

gregorypappas commented 11 months ago

Individual use counters for dailymotion/vimeo/youtube would be nice. My feeling, based on absolutely nothing, is that unshipping vimeo/dailymotion embed rewriting right now is viable. This would bring all the implementations closer together, allowing for easier specification if necessary.

I personally would like to see this hack completely removed from the web, so +1 for waiting another 6 months or so to get a sense for the trend line.

RByers commented 11 months ago

Regardless of the Usecounter values, I'm personally uncomfortable with Google Chrome having a workaround just for YouTube - even if it is, by far, the largest use. I'd prefer to either remove them all or keep them all in Chromium.

zcorpan commented 11 months ago

Let's see if it's possible to remove all of them, or simplify (maybe treat Flash type as text/html and let the affected sites serve HTML or redirect?).

zcorpan commented 9 months ago

https://chromestatus.com/metrics/feature/timeline/popularity/4740 now shows about 0.002% for December and January. No hits in "Adoption of the feature on top sites".

@RByers

In particular, now that I have a rough idea of magnitude and that this is indeed a candidate for potential removal, I can opt this counter into anonymized URL collection - with any luck we could find >99% of the usage is one or two sites suggesting removal is potentially easy.

Nice, what's the status of this?

RByers commented 9 months ago

I just checked and there's not enough data from Chrome 121 beta to identify any popular origins hitting this regularly enough (need >50 users per day for our privacy policy). Chrome 121 hits stable in early Feb, so hopefully we'll get some hits around mid Feb.

zcorpan commented 8 months ago

@RByers mid-Feb has passed. :) Is there enough data now for anonymized URL collection?

RByers commented 8 months ago

Yep, thanks for the ping. We do indeed have good data now, and I've done an analysis and shared some examples in the chromium issue. The usage appears to have quite a long tail (top 10 origins account only for 2% of the usage we can measure), and there are lots of examples like blogs and forum posts which are unlikely to get fixed but where the user experience would be significantly diminished by breaking the embeds. As a result, I don't believe we can reasonably remove this behaviour from Chromium, sorry! I also don't see any evidence of downward trend in the few months of UseCounter data we now have. So I don't expect this to just resolve with time over the next few years.

So my recommendation is that we standardize this rewrite. Thoughts?

zcorpan commented 7 months ago

Thanks!

I agree. @gregorypappas and I also looked at a few in the "top sites" of https://chromestatus.com/metrics/feature/timeline/popularity/4740 , it looks like there are various education and government sites affected also.

WebKit tried to remove rewriting in https://bugs.webkit.org/show_bug.cgi?id=224453 but brought it back in https://bugs.webkit.org/show_bug.cgi?id=237182 cc @achristensen07 @cdumez

Since Gecko and WebKit seem to be fine with only YouTube rewrite, not Vimeo or Dailymotion, that seems like the minimum to standardize to address the web compat problem. Is that a problem for Chromium even if you can point to the spec?

zcorpan commented 7 months ago

Quick GitHub search to get a sense of how common vimeo/dailymotion embeds are compared to youtube.

RByers commented 7 months ago

Oh nice stats. From my perspective it doesn't make the system really any more complex to have a list of rewrite rules rather than a single one, and there's clearly non-trivial usage of Vimeo too. But if there's no other implementation interested in adding the Vimeo or Dailymotion support, then yeah WHATWG has to just standardize the only thing with multiple-implementor interest.

Chromium will keep the others as non-standard extensions. I just want to go on record saying I don't think it's a great look for the HTML spec to special case a Google property alone, and certainly not something Google would endorse or encourage 😄 . There was a general web compat problem with embedded video when the web evolved to not depend on flash for video, and Chromium is happy to mitigate that compat problem for any video provider with non-trivial usage of their legacy flash embed.

zcorpan commented 7 months ago

Well now you're being reasonable, @RByers! 😄

@gregorypappas thoughts about supporting vimeo and dailymotion rewrites? @achristensen07 @cdumez same for WebKit?

gregorypappas commented 7 months ago

thoughts about supporting vimeo and dailymotion

It wouldn't be hard, and doing so would mildly increase interoperability, so we might as well just do it. As noted in earlier comments, I still oppose this mechanism in principal, and would hate to see these hacks remain in implementations forever. So even when standardized, I think we should always be eyeing the removal of this stuff if usage decreases enough, even if it takes 20 years :)

annevk commented 7 months ago

Does/will everyone at least log some kind of console error similar to synchronous XMLHttpRequest?

RByers commented 1 month ago

Does/will everyone at least log some kind of console error similar to synchronous XMLHttpRequest?

Sorry I missed this. I'll do you one better, in Chromium I'll add a deprecation report (which has an associated console error) so that sites watching for usage of deprecated functionality in Chrome in the wild will get a hit on this. In practice we've found that this (and console errors) don't tend to actually drive a meaningful reduction in usage, but I still think it's the right thing to be doing. With luck there will be a point some years in the future where we can say the web doesn't need this anymore (we're just long past the point of hoping that'll come soon IMHO).

zcorpan commented 1 month ago

Gecko seems to have had a console warning for the YouTube embed rewrite since at least 2016.

https://searchfox.org/mozilla-central/source/dom/locales/en-US/chrome/dom/dom.properties#257

More things we could do:

zcorpan commented 1 month ago

I first thought this could be testable by intercepting with service workers, but per https://w3c.github.io/ServiceWorker/#implementer-concerns it seems that is not the case. Maybe that can be changed now that plugins are no longer a thing?

It is detectable whether a rewrite happened or not, e.g. window.length is different. But I assume we want to test more aspects than "did a rewrite happen".

Adding new hostnames as @RByers suggested (youtube.web-platform.test) is an option, but it seems nicer to be able to test the real URLs.

annevk commented 1 month ago

We shouldn't change how service workers deal with object and embed for the purposes of this feature. There's much more to revamp about how object and embed work and that's tracked by dedicated issues.

window.length being different is a longstanding issue about iframe elements being detectable across the shadow root boundary. I wouldn't want to enshrine that even further by making it impossible to fix due to tests. In fact, I'd kinda lean towards testing the opposite...

domenic commented 1 month ago

Are shadow roots involved here? I thought the rewrite was at the HTML parser level.

annevk commented 1 month ago

WebKit's implementation uses an internal shadow root: https://github.com/WebKit/WebKit/blob/main/Source/WebCore/html/HTMLPlugInElement.cpp#L273-L282 & https://github.com/WebKit/WebKit/blob/main/Source/WebCore/Modules/plugins/YouTubePluginReplacement.cpp#L82-L104

zcorpan commented 1 month ago

We shouldn't change how service workers deal with object and embed for the purposes of this feature. There's much more to revamp about how object and embed work and that's tracked by dedicated issues.

OK, I can file an issue for service workers so object and embed handling can be evaluated on its own. Edit: I commented here https://github.com/w3c/ServiceWorker/issues/249#issuecomment-2363201150

window.length being different is a longstanding issue about iframe elements being detectable across the shadow root boundary. I wouldn't want to enshrine that even further by making it impossible to fix due to tests. In fact, I'd kinda lean towards testing the opposite...

WebKit's implementation uses an internal shadow root: https://github.com/WebKit/WebKit/blob/main/Source/WebCore/html/HTMLPlugInElement.cpp#L273-L282 & https://github.com/WebKit/WebKit/blob/main/Source/WebCore/Modules/plugins/YouTubePluginReplacement.cpp#L82-L104

That WebKit uses an internal shadow root seems like an implementation detail. The object or embed element isn't in a shadow root, and per spec those elements can create a browsing context like iframe. If we want to hide browsing contexts in shadow roots, I think that should be when the author-visible element is in a shadow root.

I can also see an argument to hide YouTube-rewritten elements generally, in that it would more closely emulate the Flash-era behavior (which I assume did not affect window.length).