jjj333-p / dendrite-admin-interface

[Early Development] A bot interface for administrating a Dendrite server using the administration API and some database interfacing
https://matrix.to/#/#admin-interface-support:pain.agency
GNU Affero General Public License v3.0

Monitoring error log [-> federation and for state desync.] #2

Open anton-molyboha opened 9 months ago

anton-molyboha commented 9 months ago

Well, these are two requests in one, but they have a similar motivation: in both cases the symptom is that you don't see some of the messages in a room/rooms. Unless the affected counterparty is matrix.org, it is easy to not even notice the problem. The bot could monitor dendrite logs, and warn the admin if something looks off.

In my experience, state desync manifests itself as a large number of "Error authing soft-failed event" messages in the dendrite log. The bot could monitor the log, and warn the admin if it sees such an error repeatedly for the same room_id.
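A minimal sketch of that check, just to make the idea concrete. It assumes the room ID appears on the same log line as the error message, in the usual `!opaque:server.name` form; the function name and the threshold are made up for illustration and are not part of the bot.

```python
import re
from collections import Counter

# The message text is the one quoted above from the dendrite log.
SOFT_FAIL_MARKER = "Error authing soft-failed event"

# Assumption: the room ID is printed on the same line as the error,
# e.g. room_id="!abc:example.org". The exact log layout may differ.
ROOM_ID_RE = re.compile(r'![^\s"\':]+:[^\s"\',]+')


def rooms_with_repeated_soft_failures(log_lines, threshold=5):
    """Return {room_id: count} for rooms with at least `threshold` soft-fail errors."""
    counts = Counter()
    for line in log_lines:
        if SOFT_FAIL_MARKER not in line:
            continue  # not the error we are looking for
        match = ROOM_ID_RE.search(line)
        if match:
            counts[match.group(0)] += 1
    return {room: n for room, n in counts.items() if n >= threshold}
```

The bot would feed this a recent window of log lines (say, the last hour) and warn the admin about every room it returns.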

I'm not sure how to catch broken federation. Outgoing federation problems will probably show up in the log as well. Incoming federation -- in the worst case -- can be caught statistically: if we are used to seeing X messages per day from a remote HS, and suddenly that number drops to zero, we probably have a problem.
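The statistical check could look roughly like the sketch below. Where the per-day counts come from (log parsing or a database query) is left open here, and the function name and the `min_daily_average` cutoff are hypothetical.

```python
from statistics import mean


def silent_servers(history, today, min_daily_average=10):
    """Flag remote homeservers that normally send events but sent none today.

    history: dict mapping server name -> list of past daily event counts
    today:   dict mapping server name -> event count so far today
    """
    flagged = []
    for server, daily_counts in history.items():
        if not daily_counts:
            continue  # no baseline yet, nothing to compare against
        baseline = mean(daily_counts)
        # Only servers that are usually active are worth an alert;
        # a quiet server dropping to zero is not surprising.
        if baseline >= min_daily_average and today.get(server, 0) == 0:
            flagged.append(server)
    return sorted(flagged)
```

The cutoff keeps low-traffic servers from triggering false alarms; a real version would probably also want to wait until late in the day before comparing.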

jjj333-p commented 9 months ago

changed the name of this because i think it's actually a really cool idea to try to catch and report on some of the errors, but i don't think it should be just federation errors (albeit those are the most common)

i honestly wonder if it could (should) even automatically try to run a download state when it spots a state reset (something that actually shows up in logs before those auth errors if you search for it, but it's easy to miss), but i think definitely whenever it sees something like that it should report on it

as for your last bit, honestly incoming federation is easier to catch because it's usually accompanied by something similar to dendrite issue #3202, with a "bad signature" error and usually some back-filling. I'm not entirely sure how to go about detecting outgoing federation issues as i haven't experienced them myself. the thing i've experienced more is the server not receiving events from any clients until things are rebooted, and that doesn't produce errors: the log just has the m.typing EDU events and never shows an event being sent to the roomserver. of course my idea is that for unexpected errors the bot just posts that it saw the error, so that would i suppose take care of anything infrequent that does produce an error.
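a rough sketch of how that sorting could work: known messages get a specific label and any other error line is reported as-is. it assumes dendrite writes logrus-style text lines containing `level=error`, which may not match every logging setup, and the function name and labels are made up for illustration.

```python
# Known messages mentioned in this thread, stored lowercase for matching.
KNOWN_PATTERNS = {
    "bad signature": "possible incoming federation problem",
    "error authing soft-failed event": "possible state desync",
}


def classify_log_line(line):
    """Return a short label for a log line worth reporting, or None to ignore it."""
    lowered = line.lower()
    for needle, label in KNOWN_PATTERNS.items():
        if needle in lowered:
            return label
    # Assumption: error lines carry a logrus-style "level=error" field.
    if "level=error" in lowered:
        return "unexpected error"
    return None
```

anything labelled "unexpected error" would just be posted to the admin room verbatim, so infrequent errors still get seen.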

anton-molyboha commented 9 months ago

I overall agree.

Regarding "download state", I would not run it automatically, but instead print a message telling the user that it might help and how to run it. In my experience, it was not as reliable at fixing state issues as I hoped. But also: "download state" requires the user to choose which homeserver to download state from, and we don't know what they would choose. After people have used the bot for a while and given feedback, it will make sense to change the behavior to automatically download state, if we see that that's what people do every time anyway.

> incoming federation is easier to catch because its usually accompanied with something similar to https://github.com/matrix-org/dendrite/issues/3202 with a "bad signature" error and usually some back-filling

sounds like a plan!

Thank you for suggesting this admin-bot idea, I think it is great!

jjj333-p commented 9 months ago

Sounds good! 👍

(In reply to Anton Molyboha's comment of Mon, Dec 11, 2023 at 4:57 AM.)
