matrix-org / synapse

Synapse: Matrix homeserver written in Python/Twisted.
https://matrix-org.github.io/synapse
Apache License 2.0

Wrong text encoding in preview text #2891

Open rkfg opened 6 years ago

rkfg commented 6 years ago

Description

For some sites, the URL preview text is decoded with the wrong text encoding.

Steps to reproduce

The difference seems to lie in the response headers. Where the text is decoded correctly, the Content-Type returned by the server is text/html; Charset=windows-1251. Where it is decoded wrongly, the header is just text/html. I think the http-equiv value should take precedence over the header (if it is parsed at all).
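A minimal sketch of the header side of this, using only Python's standard library: email.message.Message.get_content_charset() reads the charset parameter from a Content-Type value (parameter names compare case-insensitively, so Charset= is found) and returns None for a bare text/html. The helper name charset_from_content_type is hypothetical, not Synapse's API, and the UTF-8 decode at the end only assumes what a decoder might do when it finds no charset.

```python
from email.message import Message


def charset_from_content_type(value):
    """Return the lower-cased charset parameter of a Content-Type value, or None.

    Hypothetical helper for illustration; not part of Synapse.
    """
    msg = Message()
    msg["Content-Type"] = value
    # Parameter names are matched case-insensitively, so "Charset=" is found too.
    return msg.get_content_charset()


print(charset_from_content_type("text/html; Charset=windows-1251"))  # windows-1251
print(charset_from_content_type("text/html"))  # None

# Why a missing charset matters: Cyrillic text encoded as windows-1251...
text = "Привет"
body = text.encode("windows-1251")
assert body.decode("windows-1251") == text  # the right charset round-trips
# ...becomes replacement characters if a decoder assumes UTF-8 instead.
garbled = body.decode("utf-8", errors="replace")
print(ascii(garbled))  # '\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd'
```

When the function returns None, as for the plain text/html response, the only remaining source of the encoding is the page itself, such as the http-equiv meta tag.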

Version information

turt2live commented 6 years ago

The client bug is https://github.com/vector-im/riot-web/issues/5473

tleydxdy commented 6 years ago

It also seems to happen on Android.

hawkowl commented 6 years ago

This ought to work as intended in the next version, although malformed pages may fail the heuristics we've put in place to detect when the http-equiv meta tag is being used. For the given examples, though, it should all work fine.
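One way to picture such a heuristic is a byte-level regular expression run over the page, catching both the <meta charset=...> form and the older <meta http-equiv="Content-Type" content="...; charset=..."> form. The pattern and the meta_charset helper below are hypothetical illustrations, not Synapse's actual code; the last example is a malformed tag (missing "=") that the sketch simply fails to recognise.

```python
import re

# Hypothetical pattern for illustration; Synapse's real heuristics may differ.
META_CHARSET = re.compile(
    rb"""<\s*meta[^>]*charset\s*=\s*["']?([a-z0-9_-]+)""", re.IGNORECASE
)


def meta_charset(html):
    """Return the charset declared by the first matching <meta> tag, or None."""
    match = META_CHARSET.search(html)
    return match.group(1).decode("ascii").lower() if match else None


print(meta_charset(b'<meta charset="UTF-8">'))  # utf-8
print(meta_charset(
    b'<META HTTP-EQUIV="Content-Type" CONTENT="text/html; Charset=windows-1251">'
))  # windows-1251
print(meta_charset(b'<meta charset "utf-8">'))  # None: malformed, no "="
```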

rkfg commented 3 years ago

I tested this exact URL on matrix.org just now and it still shows wrongly encoded text. Can we reopen this, please?

squahtx commented 3 years ago

Synapse would detect the correct encoding, except that the <meta http-equiv="Content-Type" ... declaration for that particular URL lies just beyond Synapse's 1024-byte cutoff.
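The effect of a fixed cutoff can be reproduced with a small sketch: a hypothetical meta-charset pattern applied only to the first window bytes of the body misses a declaration that starts just past byte 1024, and finds it once the window is widened. The sniff_charset name, the window parameter and the test page are made up for illustration; only the 1024-byte figure comes from the comment above.

```python
import re

# Hypothetical pattern for illustration; not Synapse's actual code.
META_CHARSET = re.compile(
    rb"""<\s*meta[^>]*charset\s*=\s*["']?([a-z0-9_-]+)""", re.IGNORECASE
)


def sniff_charset(body, window=1024):
    """Look for a <meta> charset declaration in the first `window` bytes only."""
    match = META_CHARSET.search(body[:window])
    return match.group(1).decode("ascii").lower() if match else None


# Test page whose http-equiv declaration starts just past byte 1024.
padding = b"<!-- " + b"x" * 1030 + b" -->"
meta = b'<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">'
page = b"<html><head>" + padding + meta + b"</head></html>"

print(page.index(b"<meta"))  # 1051
print(sniff_charset(page))  # None: the tag lies beyond the cutoff
print(sniff_charset(page, window=4096))  # windows-1251
```

Widening the window trades a little extra scanning for catching late declarations. Note also that in this sketch a tag straddling the boundary could yield a truncated charset name, which an implementation would probably want to guard against.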