matrix-org / dendrite

Dendrite is a second-generation Matrix homeserver written in Go!
https://matrix-org.github.io/dendrite/
Apache License 2.0

[DRAFT] Postgres ERROR: invalid byte sequence for encoding "UTF8": 0x00 #2595

Open mibmo opened 2 years ago

mibmo commented 2 years ago

Background information

I used the following image: docker.io/matrixdotorg/dendrite-polylith@sha256:43816ad9050e9cb18b8e5c0cd41f5a779cec295fb122f164b4461f998ec5d344
Dendrite was split over 6 nodes.

Description

What

Databases started spewing errors about invalid byte sequences.

Who

Entire homeserver.

How

Databases emitting invalid byte sequence for encoding "UTF8": 0x00 (and similar) errors, and components writing ERROR: relation "?" does not exist at character ?

When

I'm not 100% sure. I think it had already begun happening earlier, but it became a major issue after I tried to set my avatar to an animated one (webp) and the server crashed (a user of the homeserver wrote to me "the server bricked after you changed your profile picture", and I experienced the same). After the crashed components came back up, I checked the logs, and the Postgres instances gradually began throwing the errors described above. I'm not sure exactly which components crashed, but since we were seeing "network lost" messages in our clients, probably the client API (and possibly more).

Steps to reproduce

I am unsure how to reproduce this at the moment. I am filing this issue as I was requested to, so it does not currently have reproduction steps. I understand if the issue is closed; this is just for reference :)

I am, however, suspicious of the database locale. With the Postgres locale set to en_US, the errors occurred, but after changing to en_GB (same as dendrite.matrix.org) the errors stopped. It is entirely possible that this is unrelated, but I don't have enough familiarity with the internals of Postgres or Dendrite to say for certain. :)

I first noticed everything with the keyserver database: [image] [image]

Link to where I ask in #dendrite:matrix.org for help: https://matrix.to/#/!RcWPWcZrMeBxOGaalX:matrix.org/$mWcIimXJ33UNl2spsYu_-tBqMKjoPXQ4ro3_op3rP7s?via=matrix.org&via=envs.net&via=kanp.ai

(Apologies for it being images. I didn't copy the relevant lines to a text document before troubleshooting the issue, and I no longer have access to the raw logs.)

kegsay commented 2 years ago

Right, this column cannot be TEXT as it doesn't allow null bytes. It needs to be BYTEA. Same applies for the event_json table and anywhere else where we dump user input.