pedroteosousa / HollowKnight.BingoSync

A Hollow Knight mod that automatically marks squares on bingosync.com
0 stars 4 forks source link

Please avoid sending erroneous requests when the bingosync board is in lockout mode #8

Open kbuzsaki opened 2 months ago

kbuzsaki commented 2 months ago

Hi there! I'm the maintainer of https://bingosync.com and https://github.com/kbuzsaki/bingosync

I've recently been adding various monitoring to the site so that I can track the site's health, volume of traffic, and investigate any errors. After doing this I've found that there are semi-regular spikes in invalid requests for the PUT /api/select endpoint from users who seem to be connecting to Hollow Knight rooms via a third party client. These bad requests are attempts to select goals that have already been marked by another color when the board is in lockout mode. There's no user agent, but after a bit of googling I'm guessing that origin might be this mod, or maybe another mod that you can help point me to :)

The bad requests don't have a functional impact on the board state (Bingosync recognizes that it's an error, denies the request, and returns HTTP 400 with the response b'Blocked by Lockout'), but they do put unnecessary load on the server and pollute my logs and metrics. They also seem to be retrying quite rapidly, sometimes multiple times per second from the same IP (though for different goals), to the point that they can be >6x the traffic volume of the good traffic.

I'm filing this issue to ask for you to update the mod to not send these PUT /api/select requests for goals that have already been marked when the board is in lockout state. It's okay if there's an occasional race that causes them to be sent after a goal is first marked or when a new card is generated, but the rapid retrying needs to stop -- if it becomes more frequent it might start having a meaningful impact on server costs.

Bingosync's own frontend javascript has a simple check for this here: https://github.com/kbuzsaki/bingosync/blob/main/bingosync-app/static/bingosync/room/board.js#L207-L213 ROOM_SETTINGS is updated here from the GET /room/<uuid>/room-settings endpoint whenever a new card event is detected: https://github.com/kbuzsaki/bingosync/blob/main/bingosync-app/static/bingosync/room/room_settings_panel.js#L30

Please let me know if you need any more information :)

pedroteosousa commented 2 months ago

Hi @kbuzsaki, sorry I took this long to respond. The culprit is definetly this mod. I'll take a look into it now and fix it as soon as possible

pedroteosousa commented 2 months ago

@kbuzsaki, I implemented a fix. The mod should be updated soon and I think people generally keep their mods updated.

I added a user agent to the requests ("HollowKnight.BingoSync/{version}") so you can keep track if the mod is doing something weird. Let me know if there is anything more I can do and sorry for the inconvenience.

kbuzsaki commented 2 months ago

Thanks for the quick fix! I can confirm that the volume of 400-ing PUT /api/select requests has dramatically decreased in the past week :)

Thanks for adding the user agent as well -- spot checking the traffic with that user agent things look mostly good, except for requests to PUT /api/revealed that are 404-ing because they have an empty room id. It's meaningful amount of traffic as well so I'd appreciate a fix there when you can get to it, but it's much lower volume than the PUT /api/select traffic was so it's not as urgent.

Thanks again!

TheRealAlph4 commented 4 weeks ago

Hey @kbuzsaki, I'm currently working on finishing up a major update to this mod, and I took the liberty to (seemingly successfully try to) fix this issue with requests with empty room id. The update is hopefully going to go live sometime in the next week or so, if everything goes according to plan. I can ping you here so you can check your logs if you would like.

I do want to ask you, how you want clients to handle exceptions in UpdateBoard requests, as those requests sometimes take several seconds (in my experience even up to ~14 seconds) to get a successful reply to. Is there an expected/recommended way to delay retries?

The current version of this mod takes an exponential backoff approach, 6 tries across a total of roughly 6 seconds, which tends to time out completely without a success response way too often. In my update I modify this approach, capping the retry delay to 2 seconds, but allowing significantly more retries. Would e.g. a flat delay be better?

Thanks in advance!

kbuzsaki commented 4 weeks ago

What HTTP endpoint does UpdateBoard correspond to?

Is it /api/select to update the color of a particular square, /room/$id/board to get the current board state, or something else?

I haven't looked much into endpoint latencies yet beyond a general health check of the home page, so please let me know any details you're seeing of strange behavior :)

In particular, if there's a specific timestamp and room id / player that you can mention, that would help me pinpoint in the logs. I'm guessing there might be some database contention that I'm not aware of.

TheRealAlph4 commented 4 weeks ago

What HTTP endpoint does UpdateBoard correspond to? Is it /api/select to update the color of a particular square, /room/$id/board to get the current board state, or something else?

The specific endpoint would be /room/$id/board to get the current board state.

I haven't looked much into endpoint latencies yet beyond a general health check of the home page, so please let me know any details you're seeing of strange behavior :)

It's pretty common upon changing the card (meaning either generating a new card, or setting a new custom card with a custom json). You should be able to reproduce it really easily as it's quite common.

  1. Join any room on https://bingosync.com
  2. Generate a new card
  3. If it worked as expected, go back to 2.
  4. At some point, the displayed board won't change at all, when you reload the page, it will either be fine and showing the new board, still showing the old board, or showing a completely blank board. After a few page reloads, it's generally going to behave as expected, showing the new board.

In particular, if there's a specific timestamp and room id / player that you can mention, that would help me pinpoint in the logs. I'm guessing there might be some database contention that I'm not aware of.

I just reproduced it, in case that makes it more convenient for you:

kbuzsaki commented 4 weeks ago

I see, that's probably due to a rate limit that I put in place due to performance problems I was having with people scraping /room/$id/board too often. I believe I've optimized that enough that it should be safe to relax the rate limit.

Previously the limit was 5 requests per minute, I've now bumped it up to 60 requests per minute. If it is the rate limiter you should be seeing the HTTP status code 429. With the relaxed rate limit I expect that waiting for 2 seconds should be sufficient.

Please let me know if you're still seeing unexpected errors here.

TheRealAlph4 commented 4 weeks ago

I can't even check the HTTP status code, because I can't replicate the issue anymore :D Seems like that solved it, thank you very much!

TheRealAlph4 commented 1 week ago

@kbuzsaki The update is live, you should see the number of bad requests from these clients drop to basically zero as people update their mods. If you see any regular bad requests from the new clients (user agent HollowKnight.BingoSync/1.3.x.x), please let me know!