superseriousbusiness / gotosocial

Fast, fun, small ActivityPub server.
https://docs.gotosocial.org
GNU Affero General Public License v3.0

[feature] Include charset in Content-Type header #2598

Closed Craftplacer closed 7 months ago

Craftplacer commented 7 months ago

Describe the bug with a clear and concise description of what the bug is.

GoToSocial's Content-Type header does not include the charset, causing clients like Kaiteki (mine, hehe) to fall back to encodings other than UTF-8, which garbles any text on the client that isn't plain ASCII.

See https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type#syntax
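To illustrate the symptom, here is a minimal Python sketch of what happens when UTF-8 bytes are decoded with a different encoding. The sample body is made up, and Latin-1 is only an assumed fallback for illustration; the actual fallback depends on the client library.

```python
# Hypothetical response body: JSON text that the server encoded as UTF-8.
body = '{"content": "héllo wörld"}'.encode("utf-8")

# A client that falls back to Latin-1 (assumed here for illustration)
# turns every multi-byte UTF-8 sequence into two unrelated characters.
garbled = body.decode("latin-1")

# Decoding with the encoding the server actually used gives the original text.
correct = body.decode("utf-8")

print(garbled)
print(correct)
```

Pure ASCII text looks identical under both decodings, which is why the problem only shows up with non-ASCII characters.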

What's your GoToSocial Version?

0.13.1 git-ccecf5a

GoToSocial Arch

arm64 bin

What happened?

No response

What you expected to happen?

No response

How to reproduce it?

No response

Anything else we need to know?

No response

Craftplacer commented 7 months ago

Screenshot of headers:

[image: screenshot of the response headers]

tsmethurst commented 7 months ago

Thanks for the report. This isn't really a bug in GoToSocial since charset is not a required directive, and utf-8 is the default encoding type for application/json according to RFC 7159. It might be a good idea for you to update your client to assume utf-8 when parsing JSON.

daenney commented 7 months ago

RFC 8259 obsoletes 7159 and states in section 8.1, Character Encoding:

JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8 [RFC3629].

Previous specifications of JSON have not required the use of UTF-8 when transmitting JSON text. However, the vast majority of JSON-based software implementations have chosen to use the UTF-8 encoding, to the extent that it is the only encoding that achieves interoperability.

In the context of ActivityPub, everyone's always used UTF-8. I'm not aware of a single implementation that's done otherwise, so assuming otherwise in your client or JSON parser is only going to cause problems.

tsmethurst commented 7 months ago

Ah thanks @daenney , I always struggle trying to find up-to-date RFCs

tsmethurst commented 7 months ago

Ah here, it's a known issue in dart: https://github.com/dart-lang/http/issues/175

tsmethurst commented 7 months ago

I'm gonna close this then since it's not really a GoToSocial issue; @Craftplacer there's code linked in the dart-lang issue above that should let you update your parser to use UTF-8 by default for JSON, which is compliant with the RFC linked by daenney
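The general idea of defaulting to UTF-8 can be sketched roughly as follows. This is a Python illustration only, not the Dart code from the linked issue, and `decode_json_body` is a hypothetical helper name. It honours an explicit charset parameter when the server sends one and otherwise assumes UTF-8.

```python
import json
from email.message import Message
from typing import Any, Optional


def decode_json_body(body: bytes, content_type: Optional[str]) -> Any:
    """Decode a JSON response body, assuming UTF-8 when no charset is given.

    Hypothetical helper for illustration; not part of any client library.
    """
    # Reuse the standard library's header parser to read the charset parameter.
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # None when there is no charset parameter

    # Default to UTF-8, which RFC 8259 requires for JSON exchanged between systems.
    return json.loads(body.decode(charset or "utf-8"))


raw = '{"content": "héllo"}'.encode("utf-8")
print(decode_json_body(raw, "application/json"))
```

An unrecognised charset name would raise `LookupError` on decode; a real client would probably want to catch that and fall back to UTF-8 as well.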

Craftplacer commented 7 months ago

Thanks for the heads-up. Then it's truly just jank Google stuff again :woozy_face:

daenney commented 6 months ago

I was just reading the ActivityStreams spec, and realised that in the Serialization section it explicitly says:

Activity Streams 2.0 documents MUST be serialized using the UTF-8 character encoding.

So regardless of what JSON RFCs do-or-don't specify, in our context it is always UTF-8.

tsmethurst commented 6 months ago

Well, not exactly. In this context we're talking specifically about the client API of GtS, not the client-to-server or server-to-server AP API.

daenney commented 6 months ago

Ah, right, yes. Ah well. Anyway. UTF-8 :rocket:.