Closed eras closed 3 years ago
(FTR: This is about link previews)
This is not necessarily a problem with Synapse; Synapse is doing its job perfectly by previewing the URL as fetched. Because matrix.org's server is located within the EU, Google has a tendency (heh) to present users with the cookie page before letting them access any part of the site, by law.
I agree that it's not particularly a bug in Synapse; however the only parties able to resolve this issue are Google and Synapse (or the 3rd party component it's using), and I have my doubts about Google doing anything about it :).
IIRC e.g. Slack doesn't have this issue, so it's resolvable; even if with special handling.
For one plausible solution consider the following session:
% curl -s -A Mozilla -I https://www.youtube.com/watch?v=RzJf02TIqxk | grep -e '^HTTP' -e '^location'
HTTP/2 302
location: https://consent.youtube.com/m?continue=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DRzJf02TIqxk&gl=FI&m=0&pc=yt&uxe=23983172&hl=fi&src=1
% curl -s -I https://www.youtube.com/watch?v=RzJf02TIqxk | grep -e '^HTTP' -e '^location'
HTTP/2 200
ohh bother. we had this with twitter (https://github.com/matrix-org/synapse/issues/7643).
It looks like we should do the same trick as we did with them (hardcode a mapping to the oembed api):
$ curl -A Mozilla 'https://www.youtube.com/oembed?url=https%3A//www.youtube.com/watch%3Fv%3DRzJf02TIqxk&format=json'
{"title":"PURE RELAXATION - SERVER SOUNDS","author_name":"Hetzner","author_url":"https://www.youtube.com/c/HetznerOnline","type":"video","height":113,"width":200,"version":"1.0","provider_name":"YouTube","provider_url":"https://www.youtube.com/","thumbnail_height":360,"thumbnail_width":480,"thumbnail_url":"https://i.ytimg.com/vi/RzJf02TIqxk/hqdefault.jpg","html":"\u003ciframe width=\u0022200\u0022 height=\u0022113\u0022 src=\u0022https://www.youtube.com/embed/RzJf02TIqxk?feature=oembed\u0022 frameborder=\u00220\u0022 allow=\u0022accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\u0022 allowfullscreen\u003e\u003c/iframe\u003e"}
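A minimal sketch of the "hardcode a mapping to the oembed api" idea, assuming a hypothetical table (`OEMBED_GLOBS`) and helper (`oembed_url_for`); this is not Synapse's actual code, only an illustration of turning a matching URL into the kind of oEmbed request shown in the curl call above:

```python
from fnmatch import fnmatchcase
from typing import Optional
from urllib.parse import urlencode

# Hypothetical hard-coded table: oEmbed endpoint -> URL globs it should handle.
# The globs are illustrative assumptions, not an official list.
OEMBED_GLOBS = {
    "https://www.youtube.com/oembed": [
        "https://www.youtube.com/watch*",
        "https://youtu.be/*",
    ],
}


def oembed_url_for(url: str) -> Optional[str]:
    """Return the oEmbed request URL for `url`, or None if no glob matches."""
    for endpoint, globs in OEMBED_GLOBS.items():
        if any(fnmatchcase(url, glob) for glob in globs):
            # Ask the provider for a JSON description of the original URL.
            return endpoint + "?" + urlencode({"url": url, "format": "json"})
    return None
```

URLs that match no glob return `None`, so the caller can fall back to the regular HTML-based preview.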
I guess this will affect an increasing number of (less high-profile) sites as well, such as https://www.golem.de (a German news portal). Hardcoding exceptions for YouTube is certainly warranted, but in the long run, it might be nice to be able to specify custom hooks in Synapse's configuration, although I'm not sure if that's really worth the effort.
This shouldn't be too hard. It would also be nice to default to using the documented providers (https://oembed.com/providers.json).
Oooo, thanks for mentioning that, shouldn't that just be preloaded and used directly when URL previews are enabled?
It should probably be tried. I don't know if it will regress other previews. 🤷
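To make the providers-list idea a bit more concrete, here is a rough sketch that flattens provider entries into (pattern, endpoint) pairs. `SAMPLE_PROVIDERS` is an abbreviated, hand-written excerpt in the shape of providers.json, and `compile_providers` / `find_endpoint` are hypothetical helper names, so treat it as an illustration rather than a finished design:

```python
import json
import re
from typing import List, Optional, Pattern, Tuple

# Abbreviated, hand-written excerpt in the shape of providers.json (an assumption
# for illustration; the real file lists many providers and more schemes).
SAMPLE_PROVIDERS = json.loads("""
[
  {
    "provider_name": "YouTube",
    "provider_url": "https://www.youtube.com/",
    "endpoints": [
      {
        "schemes": ["https://*.youtube.com/watch*", "https://youtu.be/*"],
        "url": "https://www.youtube.com/oembed"
      }
    ]
  }
]
""")

CompiledProviders = List[Tuple[Pattern[str], str]]


def compile_providers(providers: list) -> CompiledProviders:
    """Flatten provider entries into (compiled scheme regex, endpoint URL) pairs."""
    compiled: CompiledProviders = []
    for provider in providers:
        for endpoint in provider.get("endpoints", []):
            for scheme in endpoint.get("schemes", []):
                # Escape the scheme, then turn each escaped "*" wildcard into ".*".
                regex = re.escape(scheme).replace(r"\*", ".*")
                compiled.append((re.compile(regex), endpoint["url"]))
    return compiled


def find_endpoint(url: str, compiled: CompiledProviders) -> Optional[str]:
    """Return the first endpoint whose scheme matches the whole URL, else None."""
    for pattern, endpoint in compiled:
        if pattern.fullmatch(url):
            return endpoint
    return None
```

Whether preloading the full list regresses previews for sites that currently work would still need to be checked per provider.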
also on Hetzner, experiencing the same issue
If anyone wants a temporary user-side fix for themselves, I made this Tampermonkey script: https://gist.github.com/ItsCinnabar/ebcfe4f6b3ea7d224a8e1ef0783edeb2
Just edit the match url to your site and load it into tampermonkey/greasemonkey/etc
I found a way to get it working again: you need to change your user agent to curl.
https://github.com/matrix-org/synapse/blob/5a153772c197a689df6c087e49d7bd8beee5dbdd/synapse/http/client.py#L321
Replace it with something like this: self.user_agent = "curl/7.59.0"
Now YouTube previews are working again.
This works for YouTube (which is great, thanks!), but it's not a silver bullet, as it depends on how each site handles different user agents, so a more versatile approach might still be warranted.
Yeah, you are right, but for now I think it suits me personally very well, and I haven't encountered any URL preview problems so far. I guess to make it youtube.com-specific you would need to implement an if check for YouTube and have everything else make requests through the Matrix user agent.
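A small sketch of what such a YouTube-specific check could look like. The override table, the placeholder default string, and `user_agent_for` are made-up names for illustration; nothing here reflects how Synapse actually structures its HTTP client:

```python
from urllib.parse import urlsplit

# Placeholder standing in for the server's normal user agent (an assumption).
DEFAULT_USER_AGENT = "Synapse (placeholder default user agent)"

# Hypothetical per-domain overrides; only youtube.com gets the curl user agent.
USER_AGENT_OVERRIDES = {
    "youtube.com": "curl/7.59.0",
}


def user_agent_for(url: str) -> str:
    """Pick the user agent for a preview request based on the URL's host."""
    host = (urlsplit(url).hostname or "").lower()
    for domain, agent in USER_AGENT_OVERRIDES.items():
        # Match the domain itself and any of its subdomains, but not lookalikes.
        if host == domain or host.endswith("." + domain):
            return agent
    return DEFAULT_USER_AGENT
```

This keeps the workaround scoped to one site, though it is still subject to the site changing how it treats that user agent.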
this also fixes previews for sites like anilist.co that only displayed a "please use a modern browser" error message before editing this.
Setting the user agent to curl can be a problem for some other sites; I remember it being blocked on some occasions.
Unfortunately, having worked on a framework like embed.ly in the past, it is easy to get to 90%, but the last 10% can be really difficult.
What we ended up doing was using our own user agent on the first try, and if the returned content was blocked, trying again with the Googlebot and other crawler user agents (Facebook, Twitter...). But some websites can get really smart; I remember some validating the user agent against the TCP TTL (IIRC Windows is 128 and Linux is 64).
I don't know what the best fix would be for synapse. Maybe the user agent could be configurable? Also maybe it could be configurable to use some external API or external command line tool on the home server.
In the end, having nice inline previews is crucial to a good user experience, but it is really hard to get right.
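The retry strategy described above (own user agent first, then crawler user agents) could be sketched roughly like this. The user agent strings, the `looks_blocked` heuristic, and the injected `fetch` callable are all assumptions made for the example; a real implementation would need a far better notion of "blocked":

```python
from typing import Callable, Optional, Sequence, Tuple

# (HTTP status code, response body) as returned by the injected fetch function.
Response = Tuple[int, str]

# Illustrative user agents, tried in order; the exact strings are assumptions.
FALLBACK_USER_AGENTS = (
    "ExamplePreviewBot/1.0",
    "Googlebot/2.1",
    "facebookexternalhit/1.1",
    "Twitterbot/1.0",
)


def looks_blocked(status: int, body: str) -> bool:
    """Crude heuristic: an error status or the consent-page text counts as blocked."""
    return status >= 400 or "Before you continue" in body


def fetch_with_fallback(
    url: str,
    fetch: Callable[[str, str], Response],
    user_agents: Sequence[str] = FALLBACK_USER_AGENTS,
) -> Optional[Response]:
    """Try each user agent in turn and return the first response that is not blocked."""
    for agent in user_agents:
        status, body = fetch(url, agent)
        if not looks_blocked(status, body):
            return status, body
    return None  # Every attempt was blocked.
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library and makes the fallback order easy to test.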
I still think the best fix is to use the oembed api. Changing the useragent is a hack and is always going to be brittle.
Well, this was labeled as s-minor. It seems the devs don't give a damn since their instances are not in the EU, and if nobody gives a damn about implementing this oEmbed API for YouTube, there are two solutions: the user agent hack, or hosting Synapse somewhere where this "please sign in to YouTube" preview does not happen.
Also, I haven't had any trouble with curl as my user agent in Synapse; everything works perfectly fine so far.
Well, I don't think this tone is helpful. We are all trying to make things better.
Anyway, I agree that the user agent hack is brittle, per my experience it is not really a solution. But I also know it requires a lot of work to generate good previews. OEmbed is part of the solution and should be supported at some point, but having a configurable user agent can be a quick fix that shouldn't harm anything.
But the work involved to support OEmbed shouldn't be that big, if we look at https://github.com/webrecorder/oembed.link it is not that huge.
Maybe it wasn't explicit enough above, but OEmbed is already supported (see #7920). It currently hard-codes Twitter as the only supported service (see https://github.com/matrix-org/synapse/blob/4b965c862dc66c0da5d3240add70e9b5f0aa720b/synapse/rest/media/v1/preview_url_resource.py#L72-L86).
Options to solve this would be:
If someone is interested in working on this I'll gladly help work through any of the above with them, but that is likely a discussion for #synapse-dev:matrix.org.
I think using the list mentioned in https://github.com/matrix-org/synapse/issues/9733#issuecomment-814111058 is the way to go, and maybe make it user-configurable (list URL).
So:
seems a good approach
I just wanted to note that adding @tulir's "UrlPreviewBot" UA workaround fixed both twitter image previews as well as youtube previews for me. :tada:.
https://mau.dev/maunium/synapse/-/commit/55d926999cffee893cb4951890a33985beaf70ba
I'm taking a quick stab at this, by putting the oembed_globs in config, later possibly defaulting the sample config to derive from https://oembed.com/providers.json
Edit: so unfortunately this is not quite as trivial: YouTube's oEmbed response is an iframe, which we can't send over the preview_url API.
e.g.
{
"title": "The Giant Comes to Life...(POWER LOADER: PART 14)",
"author_name": "Hacksmith Industries",
"author_url": "https://www.youtube.com/c/theHacksmith",
"type": "video",
"height": 113,
"width": 200,
"version": "1.0",
"provider_name": "YouTube",
"provider_url": "https://www.youtube.com/",
"thumbnail_height": 360,
"thumbnail_width": 480,
"thumbnail_url": "https://i.ytimg.com/vi/62tPTgpmT1U/hqdefault.jpg",
"html": "\u003ciframe width=\u0022200\u0022 height=\u0022113\u0022 src=\u0022https://www.youtube.com/embed/62tPTgpmT1U?feature=oembed\u0022 frameborder=\u00220\u0022 allow=\u0022accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\u0022 allowfullscreen\u003e\u003c/iframe\u003e"
}
vs. Twitter, which has no title but sends a blockquote that we send over to the client:
{
"url": "https:\/\/twitter.com\/CroydonCyclists\/status\/1147416388874768389",
"author_name": "Croydon Cycling Campaign",
"author_url": "https:\/\/twitter.com\/CroydonCyclists",
"html": "\u003Cblockquote class=\"twitter-tweet\"\u003E\u003Cp lang=\"en\" dir=\"ltr\"\u003ETurns out that Lime bike will fine you for parking their bikes in parts of central Croydon where cycling is legal and there are parking racks. Beyond stupid. \u003Ca href=\"https:\/\/t.co\/EtDlbUSfog\"\u003Epic.twitter.com\/EtDlbUSfog\u003C\/a\u003E\u003C\/p\u003E— Croydon Cycling Campaign (@CroydonCyclists) \u003Ca href=\"https:\/\/twitter.com\/CroydonCyclists\/status\/1147416388874768389?ref_src=twsrc%5Etfw\"\u003EJuly 6, 2019\u003C\/a\u003E\u003C\/blockquote\u003E\n\u003Cscript async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"\u003E\u003C\/script\u003E\n",
"width": 550,
"height": null,
"type": "rich",
"cache_age": "3153600000",
"provider_name": "Twitter",
"provider_url": "https:\/\/twitter.com",
"version": "1.0"
}
Edit2:
With some tweaking, I can get some better results out of it, but the code needs a bit of refactoring: all the oEmbed results go through a media/file interface, and it's not appropriate.
I'm suffering from this issue as well. Youtube previews are of poor quality even when they work, just compare it to how Discord or Slack handles it.
Youtube executives need to have something very nasty done to them for all the dark patterns they started going bonkers on to trick you into giving "consent". Of course this consent is not valid from GDPR perspective, as refusing should be as easy as giving it, and it should under no circumstances limit access.
Discord has some custom behaviour and design for youtube specifically, FYI. it's intended to be invisible, but that kind of special treatment is a bit problematic for element.
Sometimes these popups and other spam can be bypassed by using a fake useragent, like the one the google bot uses, maybe it could work here?
@nukeop please look a few comments above, where I linked to a commit which resolves this problem by basically mentioning 'bot' in the user agent for preview requests.
Is there an eta on this being available in a release? Apparently it works for clients connecting to matrix.org, but not other homeservers?
Well, it's not even merged, so no, no ETA whatsoever.
@nukeop No. It works for servers located outside of Europe. It is broken for servers in the EU or UK like matrix.org.
@aaronraimist so what you mean is effectively 99.9% are affected, and anyone who is self hosting in the US is unaffected? If this is a ploy to get people to self host, it's working.
I'm self hosting and I'm affected. It's more of a ploy to get people to switch to Discord, which doesn't have these problems.
I've removed the conspiracy theories, suggestions of workarounds that have already been discussed 5 times, and "me too!" comments. None of these are helpful; please stay on topic. Yes it's annoying, no it's not a conspiracy by the evil Synapse maintainers to make your life worse.
We know it's possible to work around the problem by changing the User-agent. Per https://github.com/matrix-org/synapse/issues/9733#issuecomment-834348426: I'd rather not do that as I think it will be brittle.
Props to @t3chguy who, rather than complaining about the problem, has started work on a PR to fix it.
Why so defensive?
As a maintainer it is draining to see users spewing such garbage about something you put so much time into.
You can take this opportunity to identify issues that people find important enough to comment on... or you can get defensive and lash out on your users for caring about your software.
I'm going to take further discussion of the oembed implementation to #2752.
@clokep are you aware of any reason we shouldn't include an entry for youtube in that file by default?
oEmbed for YouTube doesn't really give a good response right now, in the image below the first preview is made without using oEmbed (but I'm in the US so I get a "real" description), while the second one is made with oEmbed:
I think the tweaks in #10392 were meant to make this preview better.
oh I see. So really we need to land the remaining tweaks in #10392 before we can make more progress here?
Yeah, pretty much. I'm not super thrilled with the flow right now of how we do previews when using oEmbed, but that's rather tough to crack apart. It could really use some documentation on where caches are and such.
I think the gist is that we need to pull more info out of the oEmbed response though, e.g. the provider_name and title don't seem to end up properly in the response right now.
Here's what we get from oEmbed:
{
"author_name" : "Rick Astley",
"author_url" : "https://www.youtube.com/c/RickastleyCoUkOfficial",
"height" : 113,
"html" : "<iframe width=\"200\" height=\"113\" src=\"https://www.youtube.com/embed/dQw4w9WgXcQ?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen></iframe>",
"provider_name" : "YouTube",
"provider_url" : "https://www.youtube.com/",
"thumbnail_height" : 360,
"thumbnail_url" : "https://i.ytimg.com/vi/dQw4w9WgXcQ/hqdefault.jpg",
"thumbnail_width" : 480,
"title" : "Rick Astley - Never Gonna Give You Up (Official Music Video)",
"type" : "video",
"version" : "1.0",
"width" : 200
}
What we get from Synapse (when configured to use oEmbed for YouTube):
{
"matrix:image:size" : 18498,
"og:description" : null,
"og:image" : "mxc://localhost:8480/2021-09-01_AfteoaZUTZOUJfoa",
"og:image:height" : 360,
"og:image:type" : "image/jpeg",
"og:image:width" : 480
}
This is really only pulling the thumbnail_url properly right now.
For reference, this compares to what we get without using oEmbed:
{
"matrix:image:size" : 65665,
"og:description" : "Rick Astley's official music video for “Never Gonna Give You Up” Subscribe to the official Rick Astley YouTube channel: https://RickAstley.lnk.to/YTSubIDFoll...",
"og:image" : "mxc://localhost:8480/2021-09-01_QwaVetzmVlEviNmK",
"og:image:height" : 720,
"og:image:type" : "image/jpeg",
"og:image:width" : 1280,
"og:site_name" : "YouTube",
"og:title" : "Rick Astley - Never Gonna Give You Up (Official Music Video)",
"og:type" : "video.other",
"og:url" : "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
"og:video:height" : "720",
"og:video:secure_url" : "https://www.youtube.com/embed/dQw4w9WgXcQ",
"og:video:tag" : "rick astley never gonna give you up lyrics",
"og:video:type" : "text/html",
"og:video:url" : "https://www.youtube.com/embed/dQw4w9WgXcQ",
"og:video:width" : "1280"
}
I put up #10819 which should help with this, but it doesn't give quite as good of a preview as the current HTML parsing.
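As a rough illustration of the kind of mapping discussed above (not the code in #10819), the following sketch copies a few oEmbed fields onto Open Graph-style keys. `oembed_to_open_graph` is a hypothetical helper, and it leaves the thumbnail as a plain URL, whereas the Synapse responses shown above carry an mxc:// URI for the image:

```python
from typing import Any, Dict


def oembed_to_open_graph(oembed: Dict[str, Any], url: str) -> Dict[str, Any]:
    """Map a subset of an oEmbed response onto Open Graph-style preview keys."""
    og: Dict[str, Any] = {"og:url": url}

    # Simple one-to-one copies, skipped when the oEmbed field is absent or null.
    field_map = {
        "title": "og:title",
        "provider_name": "og:site_name",
        "thumbnail_url": "og:image",
        "thumbnail_width": "og:image:width",
        "thumbnail_height": "og:image:height",
    }
    for oembed_key, og_key in field_map.items():
        value = oembed.get(oembed_key)
        if value is not None:
            og[og_key] = value
    return og
```

oEmbed has no counterpart to the page description in the YouTube response above, so og:description would still be missing with this approach.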
I've been unable to reproduce the blank / no preview for YouTube from US, UK, or France based servers. Are people still seeing issues with this?
I get URL previews for YouTube now.
I think YouTube rolled out a change where they don't auto-redirect to consent.youtube.com anymore. I remember that some weeks ago the redirect happened on and off for me, which looked to me like an A/B test on their part. Maybe it's fully rolled out now?
Same here, started working from Germany without updating synapse.
Thank you @evoL and @asmaps! I'm going to close this for now then. If someone is seeing issues still, please shout!
Description
At some point YouTube updated its site, and now all (?) captions generated by Synapse for the site are:
Before you continue to YouTube Sign in a Google company Before you continue to YouTube Google uses cookies and data to: Deliver and maintain services, like tracking outages and protecting against spam, fraud, and abuse Measure audience engagement and site statistics to understand how our services are used
This is basically useless considering the primary point of the function, in particular in the case of a very popular website.
Steps to reproduce
Send an m.room.message containing a YouTube link into a room, e.g. https://www.youtube.com/watch?v=RzJf02TIqxk
Expected results:
youtube-dl --get-description: Authentic recordings from inside Hetzner Online's data center park Just like birds and insects, each server sings its own unique song.
Version information