Open RByers opened 7 years ago
According to our data, this is affecting somewhere between 1-3% of our users on desktop and less than that on mobile (probably half). It's not huge because it's mostly for old websites, which is also the reason why mobile is less affected.
Is there any other data you would like to see?
From https://nerdydata.com/ ("We index approximately 160 million websites") searching in "Deep Web":
src="//www.youtube.com/v/
Showing 1 - 50 (of approx. 22,090 found)
src="http://www.youtube.com/v/
Showing 1 - 50 (of approx. 281,530 found)
src="https://www.youtube.com/v/
Showing 1 - 50 (of approx. 151,810 found)
data="//www.youtube.com/v/
Showing 1 - 50 (of approx. 3,290 found)
data="http://www.youtube.com/v/
Showing 1 - 50 (of approx. 28,200 found)
data="https://www.youtube.com/v/
Showing 1 - 50 (of approx. 1,175 found)
Adding these together and assuming the worst case (1 match is 1 site), that is ~0.3%.
For comparison, we have <image>
:
<dt>A start tag whose tag name is "image"</dt>
<dd>
<!-- As of 2005-12, studies showed that around 0.2% of pages used the <image> element. -->
<p><span>Parse error</span>. Change the token's tag name to "img" and reprocess it. (Don't
ask.)</p>
</dd>
Thanks guys! As expected, this is pretty compelling data that this is important enough to justify some sort of fix. Here's some HTTP Archive data about the popularity of different embed origins/types (of the top 500k sites). So YouTube is present on ~0.27% of top sites, with histats.com behind that at about a tenth of that at 0.029%.
Rank Origin Type Page count
1 http://www.youtube.com application/x-shockwave-flash 1087
2 https://www.youtube.com application/x-shockwave-flash 234
3 http://s10.histats.com application/x-shockwave-flash 143
4 http://player.youku.com application/x-shockwave-flash 102
5 http://www.xatech.com application/x-shockwave-flash 98
6 http:// application/x-shockwave-flash 71
7 http://www.clocklink.com application/x-shockwave-flash 69
8 http://blog.naver.com application/x-shockwave-flash 66
9 http://www.dailymotion.com application/x-shockwave-flash 61
10 http://img.i2i.jp application/x-shockwave-flash 49
11 http://www. application/x-shockwave-flash 49
12 http://www.smartadserver.com application/x-shockwave-flash 48
13 http://home.uchat.co.kr application/x-shockwave-flash 45
14 http://content.pop6.com application/x-shockwave-flash 43
15 http://embed.scribblelive.com application/x-shockwave-flash 41
16 http://www.ustream.tv application/x-shockwave-flash 32
17 http://chat.53kf.com application/x-shockwave-flash 31
18 http://www.flickr.com application/x-shockwave-flash 28
19 http://static-hw.xvideos.com application/x-shockwave-flash 27
20 http://static.video.qq.com application/x-shockwave-flash 27
21 http://pic.img80.com application/x-shockwave-flash 26
22 http://vimeo.com application/x-shockwave-flash 26
23 http://s3.rotaban.ru application/x-shockwave-flash 24
24 https://embed-ssl.wistia.com application/x-shockwave-flash 23
25 http://s4.rotaban.ru application/x-shockwave-flash 23
26 http://h2.flashvortex.com application/x-shockwave-flash 21
27 http://www.youtube-nocookie.com application/x-shockwave-flash 19
28 http://c.brightcove.com application/x-shockwave-flash 19
29 http://ws.amazon.com application/x-shockwave-flash 19
30 http://localtimes.info application/x-shockwave-flash 19
Query:
SELECT
REGEXP_EXTRACT(body, r'<embed[^>]*src=[\'"]?(http[s]?://[^/ "\']*)') AS embedsrc,
REGEXP_EXTRACT(body, r'<embed[^>]*type=[\'"]?([^ "\']*)') AS type,
COUNT(DISTINCT bodies.page) as count
FROM [httparchive:har.2017_01_15_chrome_requests_bodies]
AS bodies
JOIN (
SELECT
page, url,
JSON_EXTRACT(payload, '$._contentType') AS contentType
FROM [httparchive:har.2017_01_15_chrome_requests]
) AS requests ON requests.url=bodies.url AND requests.page=bodies.page
WHERE REGEXP_MATCH(body, r'<embed[^>]*src=[\'"]?http[s]?://')
GROUP BY embedsrc, type
ORDER BY count DESC
LIMIT 5000
I would be surprised this was required for compatibility.
Our firefox telemetry had it at somewhere around 1% of users also when we implemented embed rewriting. It's now down to about 0.5% on our release branch.
I would be surprised this was required for compatibility.
If there's no Flash plugin on the device or platform, this is how you get the embedded videos to keep working, right?
Couldn't most of the "rewriting" be done by YouTube instead of in the browser engine, though? That is, if the only hack in the browser engine is type="application/x-shockwave-flash"
-> type="text/html"
for youtube.com/v/
.
(And maybe eventually, do such type
rewriting for any URL...?)
Yeah, we've been discussing a bit a new generic web platform feature that could enable YouTube (and everyone else in a similar position) to solve this problem themselves (basically upgrading a flash embed to an iframe). It's not clear to me what the exact requirements would be for YouTube to be able to use such a feature. If people here feel that is worth pursuing, I can see if we can find someone at Google to drive this.
Update: @mounirlamouri has pointed out that flash embeds are now completely deprecated by youtube. We've still got per-browser solutions to rewriting. Should we revisit this?
Is there a counter somewhere for
<embed src="https://www.youtube.com/…"
type="application/x-shockwave-flash"
…
Hi @karlcow, use counters for embeds that are eligible for rewriting were added to Firefox a few months ago. See bug 1847295 for more info :slightly_smiling_face:
https://mozilla.github.io/usecounters/#channel=nightly&group=YOUTUBEFLASHEMBED
Can Chromium also implement a use counter?
Can Chromium also implement a use counter?
I am embarrassed to see that I didn't add a UseCounter for this years ago. Absolutely, we can not only add a counter, but also make an informed opinion on our ability to remove this logic from chromium. Bug 1503916 assigned to myself, will plan to land the counter today.
Also note that Chromium's impl isn't just about YouTube anymore, but also dailymotion.com and vimeo.com. I assume we'd want to tackle all three at once, right? I'm adding a UseCounter for all such embed rewrites, but if it turns out to be too large to break, then we could always dig in and look at each independently (though perhaps if we're going to spec one, we should just spec them all?).
OK, great. As far as I can tell, Gecko only rewrites for Youtube (here). I'll check if we've received web compat issues for dailymotion/vimeo embeds.
The counter has landed but probably too late to request a merge into Chrome 120, so it won't hit stable until Chrome 121 late January. Regardless data from early release channels should start showing up here soon. It's wildly unreliable but I have initial data from Chrome Canary on Windows - on the order of 0.001% of page loads, inline with the Firefox data. I'd guess that means it's too common still for us to just remove, but it's not impossible we could do so with more work. In particular, now that I have a rough idea of magnitude and that this is indeed a candidate for potential removal, I can opt this counter into anonymized URL collection - with any luck we could find >99% of the usage is one or two sites suggesting removal is potentially easy.
If we can't immediately remove it would you want to spec it, or maybe since it's been unspecced for so long we just wait another 6 months or something to get a sense for the trend line?
Individual use counters for dailymotion/vimeo/youtube would be nice. My feeling, based on absolutely nothing, is that unshipping vimeo/dailymotion embed rewriting right now is viable. This would bring all the implementations closer together, allowing for easier specification if necessary.
I personally would like to see this hack completely removed from the web, so +1 for waiting another 6 months or so to get a sense for the trend line.
Regardless of the Usecounter values, I'm personally uncomfortable with Google Chrome having a workaround just for YouTube - even if it is, by far, the largest use. I'd prefer to either remove them all or keep them all in Chromium.
Let's see if it's possible to remove all of them, or simplify (maybe treat Flash type
as text/html
and let the affected sites serve HTML or redirect?).
https://chromestatus.com/metrics/feature/timeline/popularity/4740 now shows about 0.002% for December and January. No hits in "Adoption of the feature on top sites".
@RByers
In particular, now that I have a rough idea of magnitude and that this is indeed a candidate for potential removal, I can opt this counter into anonymized URL collection - with any luck we could find >99% of the usage is one or two sites suggesting removal is potentially easy.
Nice, what's the status of this?
I just checked and there's not enough data from Chrome 121 beta to identify any popular origins hitting this regularly enough (need >50 users per day for our privacy policy). Chrome 121 hits stable in early Feb, so hopefully we'll get some hits around mid Feb.
@RByers mid-Feb has passed. :) Is there enough data now for anonymized URL collection?
Yep, thanks for the ping. We do indeed have good data now, and I've done an analysis and shared some examples in the chromium issue. The usage appears to have quite a long tail (top 10 origins account only for 2% of the usage we can measure), and there are lots of examples like blogs and forum posts which are unlikely to get fixed but where the user experience would be significantly diminished by breaking the embeds. As a result, I don't believe we can reasonably remove this behaviour from Chromium, sorry! I also don't see any evidence of downward trend in the few months of UseCounter data we now have. So I don't expect this to just resolve with time over the next few years.
So my recommendation is that we standardize this rewrite. Thoughts?
Thanks!
I agree. @gregorypappas and I also looked at a few in the "top sites" of https://chromestatus.com/metrics/feature/timeline/popularity/4740 , it looks like there are various education and government sites affected also.
WebKit tried to remove rewriting in https://bugs.webkit.org/show_bug.cgi?id=224453 but brought it back in https://bugs.webkit.org/show_bug.cgi?id=237182 cc @achristensen07 @cdumez
Since Gecko and WebKit seem to be fine with only YouTube rewrite, not Vimeo or Dailymotion, that seems like the minimum to standardize to address the web compat problem. Is that a problem for Chromium even if you can point to the spec?
Quick GitHub search to get a sense of how common vimeo/dailymotion embeds are compared to youtube.
Oh nice stats. From my perspective it doesn't make the system really any more complex to have a list of rewrite rules rather than a single one, and there's clearly non-trivial usage of Vimeo too. But if there's no other implementation interested in adding the Vimeo or Dailymotion support, then yeah WHATWG has to just standardize the only thing with multiple-implementor interest.
Chromium will keep the others as non-standard extensions. I just want to go on record saying I don't think it's a great look for the HTML spec to special case a Google property alone, and certainly not something Google would endorse or encourage 😄 . There was a general web compat problem with embedded video when the web evolved to not depend on flash for video, and Chromium is happy to mitigate that compat problem for any video provider with non-trivial usage of their legacy flash embed.
Well now you're being reasonable, @RByers! 😄
@gregorypappas thoughts about supporting vimeo and dailymotion rewrites? @achristensen07 @cdumez same for WebKit?
thoughts about supporting vimeo and dailymotion
It wouldn't be hard, and doing so would mildly increase interoperability, so we might as well just do it. As noted in earlier comments, I still oppose this mechanism in principal, and would hate to see these hacks remain in implementations forever. So even when standardized, I think we should always be eyeing the removal of this stuff if usage decreases enough, even if it takes 20 years :)
Does/will everyone at least log some kind of console error similar to synchronous XMLHttpRequest?
Does/will everyone at least log some kind of console error similar to synchronous XMLHttpRequest?
Sorry I missed this. I'll do you one better, in Chromium I'll add a deprecation report (which has an associated console error) so that sites watching for usage of deprecated functionality in Chrome in the wild will get a hit on this. In practice we've found that this (and console errors) don't tend to actually drive a meaningful reduction in usage, but I still think it's the right thing to be doing. With luck there will be a point some years in the future where we can say the web doesn't need this anymore (we're just long past the point of hoping that'll come soon IMHO).
Gecko seems to have had a console warning for the YouTube embed rewrite since at least 2016.
https://searchfox.org/mozilla-central/source/dom/locales/en-US/chrome/dom/dom.properties#257
More things we could do:
embed
cases non-conforming HTML, and update conformance checkers.embed
to cover what authors should do if they have an old-style embed, which can be linked to in console warnings and conformance checker errors.I first thought this could be testable by intercepting with service workers, but per https://w3c.github.io/ServiceWorker/#implementer-concerns it seems that is not the case. Maybe that can be changed now that plugins are no longer a thing?
It is detectable whether a rewrite happened or not, e.g. window.length
is different. But I assume we want to test more aspects than "did a rewrite happen".
Adding new hostnames as @RByers suggested (youtube.web-platform.test
) is an option, but it seems nicer to be able to test the real URLs.
We shouldn't change how service workers deal with object
and embed
for the purposes of this feature. There's much more to revamp about how object
and embed
work and that's tracked by dedicated issues.
window.length
being different is a longstanding issue about iframe
elements being detectable across the shadow root boundary. I wouldn't want to enshrine that even further by making it impossible to fix due to tests. In fact, I'd kinda lean towards testing the opposite...
Are shadow roots involved here? I thought the rewrite was at the HTML parser level.
WebKit's implementation uses an internal shadow root: https://github.com/WebKit/WebKit/blob/main/Source/WebCore/html/HTMLPlugInElement.cpp#L273-L282 & https://github.com/WebKit/WebKit/blob/main/Source/WebCore/Modules/plugins/YouTubePluginReplacement.cpp#L82-L104
We shouldn't change how service workers deal with
object
andembed
for the purposes of this feature. There's much more to revamp about howobject
andembed
work and that's tracked by dedicated issues.
OK, I can file an issue for service workers so object
and embed
handling can be evaluated on its own. Edit: I commented here https://github.com/w3c/ServiceWorker/issues/249#issuecomment-2363201150
window.length
being different is a longstanding issue aboutiframe
elements being detectable across the shadow root boundary. I wouldn't want to enshrine that even further by making it impossible to fix due to tests. In fact, I'd kinda lean towards testing the opposite...WebKit's implementation uses an internal shadow root: https://github.com/WebKit/WebKit/blob/main/Source/WebCore/html/HTMLPlugInElement.cpp#L273-L282 & https://github.com/WebKit/WebKit/blob/main/Source/WebCore/Modules/plugins/YouTubePluginReplacement.cpp#L82-L104
That WebKit uses an internal shadow root seems like an implementation detail. The object
or embed
element isn't in a shadow root, and per spec those elements can create a browsing context like iframe
. If we want to hide browsing contexts in shadow roots, I think that should be when the author-visible element is in a shadow root.
I can also see an argument to hide YouTube-rewritten elements generally, in that it would more closely emulate the Flash-era behavior (which I assume did not affect window.length
).
Apparently chromium, Gecko and WebKit all have code for rewriting YouTube flash embeds to use HTML instead (so they work when flash is disabled, such as on mobile):
If this is now effectively required for web compat, perhaps we should work to align our implementations and specify it in HTML just to make this a little more predictable / sane? For automated testing purposes, perhaps we should include in the definition an additional hostname like
youtube.web-platform.test
?@mounirlamouri can you share any data on how much of the web benefits from this? I imagine there may be other cases where we want to ask whether this is worth doing. So we may be setting a precedent here. I assume we'd only ever consider something like this when it's has truly massive positive impact on the user experience. I think it's worth having some discussion on what magnitude of usage qualifies (eg. as a fraction of page views, and/or a fraction of HTTP Archive pages).