matrix-org / synapse

Synapse: Matrix homeserver written in Python/Twisted.
https://matrix-org.github.io/synapse
Apache License 2.0

Twitter returns a 404 for t.co links if "bot" is included in the user agent header #13120

Open roughnecks opened 2 years ago

roughnecks commented 2 years ago

Description

Hello, I have some RSS feeds in a Matrix room, and for the past few days some Twitter links haven't shown any preview.

Steps to reproduce

Homeserver

woodpeckersnest.space

Synapse Version

{"server":{"name":"Synapse","version":"1.61.0"}}

Installation Method

Other (please mention below)

Platform

I'm using the matrix-docker-ansible-deploy playbook on Debian 11 VPS

Relevant log output

Jun 27 14:51:26 pandora.woodpeckersnest.space matrix-synapse[1877154]: 2022-06-27 12:51:26,558 - synapse.http.client - 730 - WARNING - GET-5725 - Got 404 when downloading https://t.co/2rf0lRAc1Y
Jun 27 14:51:26 pandora.woodpeckersnest.space matrix-synapse[1877154]: 2022-06-27 12:51:26,890 - synapse.http.client - 730 - WARNING - GET-5726 - Got 404 when downloading http://pic.twitter.com/2rf0lRAc1Y

Anything else that would be useful to know?

No response

DMRobertson commented 2 years ago

> some Twitter links won't show any preview.

Is this all Twitter links or just some? If it's just some, can you give us an example of a link that doesn't form a preview?

DMRobertson commented 2 years ago

Oh, from the log sample: https://twitter.com/CM_Memorabili/status/1541402380972879872


roughnecks commented 2 years ago

> Oh, from the log sample: https://twitter.com/CM_Memorabili/status/1541402380972879872

But it's only the short URL (t.co) that has problems. If I paste the long URL (twitter.com), it works.

DMRobertson commented 2 years ago

> GET /2rf0lRAc1Y HTTP/2
> Host: t.co
> user-agent: curl/7.82.0
> accept: */*
> 

< HTTP/2 301 
< date: Tue, 28 Jun 2022 10:32:33 GMT
< vary: Origin
< server: tsa_f
< expires: Tue, 28 Jun 2022 10:37:34 GMT
< location: https://twitter.com/CM_Memorabili/status/1541402380972879872/photo/1
< set-cookie: muc=eb3cdf7c-7ddd-4da7-aed6-3087d511a4fd; Max-Age=34214400; Expires=Sat, 29 Jul 2023 10:32:34 GMT; Domain=t.co; Secure; SameSite=None
< cache-control: private,max-age=300
< content-length: 0
< strict-transport-security: max-age=0
< x-response-time: 107
< x-connection-hash: d3bbab17fbe0ac3fd0d071c92608787baf272467a4c4c565a1eb6a7f393ce45c

I wonder if we're not processing the 301 redirect somehow.

anoadragon453 commented 2 years ago

The problem appears to go away if "bot" is not included in the user agent. If "bot" is included, Twitter does not return a redirect response with a Location header; it simply returns a 404.

Attempting this with a local homeserver and setting a breakpoint on this line, I found that the following headers are sent and received when Synapse queries t.co:

Request headers:

{b'User-Agent': ['Synapse (bot; +https://github.com/matrix-org/synapse)'], b'Accept-Language': ['en']}

Response headers:

b'Date': [b'Mon, 04 Jul 2022 17:13:40 GMT']
b'Vary': [b'Origin']
b'Server': [b'tsa_f']
b'Content-Type': [b'text/html;charset=utf-8']
b'Cache-Control': [b'no-cache,no-store,must-revalidate']
b'X-XSS-Protection': [b'0']
b'Content-Security-Policy': [b"default-src 'none'; img-src https://abs.twimg.com; script-src https://abs.twimg.com about:; style-src https://abs.twimg.com 'unsafe-inline'; font-src https://abs.twimg.com https://twitter.com; connect-src 'none'; object-src 'none'; media-src 'none'; frame-src 'none'; report-uri https://twitter.com/i/csp_report?a=ORTGK%3D%3D%3D&ro=false"]
b'Strict-Transport-Security': [b'max-age=0']
b'X-Response-Time': [b'105']
b'X-Connection-Hash': [b'ff7b9177c8443a7c6cb907cfce4732f6c6d3ec7b191d6e0ec178d60dddbc780f']

Changing "bot" to "not" in the user agent on these lines allowed the URL preview to work:

https://github.com/matrix-org/synapse/blob/148fe58a247d61ffb76c566ba397285480d93f74/synapse/rest/media/v1/preview_url_resource.py#L411-L413
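
The comparison above can be reproduced outside Synapse. The following is a minimal sketch using only the Python standard library; the names `build_request`, `probe`, `NoRedirect`, and `NOT_UA` are illustrative and are not part of Synapse. It sends the same request headers as shown above and returns the status code and any Location header without following redirects. What t.co actually returns may differ from what was observed in this thread.

```python
import urllib.error
import urllib.request

# User agent taken from the request headers shown above.
BOT_UA = "Synapse (bot; +https://github.com/matrix-org/synapse)"
# Illustrative variant with "bot" replaced by "not", as in the experiment above.
NOT_UA = BOT_UA.replace("bot", "not", 1)


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Keep urllib from following redirects so the 3xx response stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to follow; it raises HTTPError instead.
        return None


def build_request(url, user_agent):
    """Build a GET request carrying the same headers as in the comment above."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": user_agent, "Accept-Language": "en"},
    )


def probe(url, user_agent, timeout=10.0):
    """Return (status_code, location_header_or_None) for a single request."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status, resp.headers.get("Location")
    except urllib.error.HTTPError as err:
        # Unfollowed 3xx responses and 4xx/5xx responses both arrive here.
        return err.code, err.headers.get("Location")
```

Calling `probe("https://t.co/2rf0lRAc1Y", BOT_UA)` and then the same URL with `NOT_UA` gives the two results to compare. A redirect status with a Location header in one case and a 404 in the other would match the behaviour described above.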

roughnecks commented 2 years ago

Sorry, but this is all too technical for me. Is there something I can do, or should I just wait for a fix?

DMRobertson commented 2 years ago

> Sorry, but this is all too technical for me. Is there something I can do, or should I just wait for a fix?

Wait for a fix. (The comments above will help us to understand how to fix the issue)

clokep commented 2 years ago

Ironically (?) bot was added to user agents to fix previewing of Twitter URLs (see #11985).

I have no idea of a solution here without special-casing t.co. 😢
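
For illustration only, special-casing t.co could look something like the sketch below. It assumes a hypothetical per-host override table; `UA_OVERRIDES`, `user_agent_for`, and the override string are made up for this example and do not exist in Synapse. Whether such an override actually restores previews depends on how t.co responds at the time.

```python
from urllib.parse import urlsplit

# Default user agent, as shown in the request headers earlier in this thread.
DEFAULT_UA = "Synapse (bot; +https://github.com/matrix-org/synapse)"

# Hypothetical override table: hosts that reject "bot" get a different string.
UA_OVERRIDES = {
    "t.co": "Synapse (+https://github.com/matrix-org/synapse)",
}


def user_agent_for(url):
    """Pick the user agent for a URL, falling back to the default."""
    host = (urlsplit(url).hostname or "").lower()
    return UA_OVERRIDES.get(host, DEFAULT_UA)
```

The lookup is an exact match on the hostname, so twitter.com and pic.twitter.com keep the default user agent and only t.co is affected.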

clokep commented 1 year ago

Changing the user-agent no longer worked for me... 😢