online-go / online-go.com

Source code for the Online-Go.com web interface
https://online-go.com/
GNU Affero General Public License v3.0
1.23k stars 345 forks source link

Sorting of paused clocks stymied by corrupt `last_clock` field #2451

Closed dexonsmith closed 9 months ago

dexonsmith commented 9 months ago

Describe the bug

2443 added sorting of not-running clocks (including paused clocks). It works by computing the remaining time on the AdHocPlayerClock.

Since it's the weekend, I had a look at some other players' sorted games and noticed they were out of order. Using the developer console, I found:

It appears that we can't currently depend on the AdHocPlayerClock to be correct. We need to either:

The main problem with sorting based on last_emitted_clock is that the initial sort doesn't have one available. If we go that route, need to either:

dexonsmith commented 9 months ago

@anoek , curious if you have thoughts on the right way forward:

dexonsmith commented 9 months ago

(Note that the clocks I examined in detail were Fischer. The thinking_time number was too large.)

anoek commented 9 months ago

tl;dr - probably wait until game.goban is loaded. Possibly consider removing the large gamedata field in the endpoint where we load the game list to reduce loading times and likely make it so the time until goban list appears in the correctly sorted order just as fast if not faster than what it is now, while removing the complexity of sorting based off of the json sometimes and the jgof clock others.


Alrighty so here's the mess that is the clock system.

Once upon a time the I wrote six little clock systems over the course of N months. I didn't know what I was doing, didn't know what was staying or leaving, and writing a ton of code fast and willy-nilly like to get stuff up and running. What resulted was the inconsistent and terrible "Ad hoc clock system". Then some awesome folks came along and started writing mobile apps and desktop apps that made use of this horrible system. Years later, in an effort to move to something a little more sane on the front end, I started transitioning to a more consistent representation of things, the "JGOF" (json go format) structure which had at least a little bit of thought put into it, but still is somewhat constrained by the decisions made by the noob. I have hopes to switch the backend over to it too, but it'd require serious migration work by the three binary applications that I don't maintain, so maybe someday we'll walk over to the new system by supporting both on the backend then letting all the clients switch over and then we can finally be done with the AdHoc stuff, but we're not there yet.

Anyways, all that to say, we can't easily change the backend and what it's returning. Also, this is probably why you're seeing the client not updating the ad hoc field, I think we might stash the ad hoc field for some reason(s) still but mostly we just orient ourselves around that "clock" event with the JGOF clock.

Onto what to do. The complication of what you're having to work around stems from the fact that when we list the games we provide a json blob that represents a snapshot of the game, and since it's coming from the server it has the adhoc clock stuff only. After that the client doesn't really do much with this field because we immediately translate it to the JGOF clock system and don't pay any attention to the old AdHoc stuff if we don't need to.

Fix the AdHocPlayerClocks to be correct? Make a JGOFClock available for the initial sort?

There is this method: https://github.com/online-go/goban/blob/main/src/GobanCore.ts#L3670 which might help but I'm not confident in the state of what data you have during different stages of loading, it sounds like you're seeing some funky stuff.

Delay the initial sort until after game.goban is loaded?

This is probably the way to keep the most sanity points intact.

Also, and this may very well be beyond the scope of what you're interested in doing which is totally fine, but it occurs to me as I look through all of this that we are sending this gamedata field with our list of games (loaded as game.json at some point I guess). This is the snapshot of the game from the server. We used to pull this directly from the cassandra database to help with apparent load times so folks could navigate faster - however the backend architecture has changed a lot since that decision was made and it looks like we're now pulling this directly from the server because it's now faster and always up to date. However I think we might also be pretty much just throw this away on the client now and just waiting to connect to the server for the official game connection, as that process is also a lot faster than it used to be and we really don't need to be preloading data for the purposes of display anymore.

If that's true (or close enough that we can make it true), I'm thinking maybe we go with the "Sort after the game is loaded" plan, and also flag that endpoint to not bother sending the gamedata anymore and instead just wait for the game to connect directly via it's websocket connection. The overall effect of this I think should be reduced execution time of the endpoint (as we don't have to first do a GET to the game server for that data), and halving the total data transferred to the client since we won't be sending the initial game state twice (as it pulls a full snapshot when it "connects" to a game anyways).

dexonsmith commented 9 months ago

However I think we might also be pretty much just throw this away on the client now and just waiting to connect to the server for the official game connection, as that process is also a lot faster than it used to be and we really don't need to be preloading data for the purposes of display anymore.

I think this is correct, and the game.json data is pretty much discarded before being ignored. Evidence being that rendering the Clock view (part of GobanLineSummary) depends on having a last_emitted_clock (JGOFClock) and so requires having an official game connection.

which might help but I'm not confident in the state of what data you have during different stages of loading, it sounds like you're seeing some funky stuff.

Okay, so that method updates the JGOFPlayerClocks but not the AdHocPlayerClocks. Probably part of the puzzle (which may not be worth solving directly, as you say). (If we were to solve that puzzle, I think it involves situations where clocks are paused for multiple reasons (e.g., something like start vacation, start weekend, move, stop vacation)...)

then letting all the clients switch over and then we can finally be done with the AdHoc stuff, but we're not there yet.

FWIW, I think clock sorting is the very last use of the AdHoc*Clock stuff in the web client (based on doing a search a couple of weeks ago... search could have been faulty though). So if we clean this up, that's some kind of milestone (even if there are three other clients still out there...).

(Back to the top...)

tl;dr - probably wait until game.goban is loaded.

SGTM.

Possibly consider removing the large gamedata field in the endpoint where we load the game list

SGTM too. First stop using it, then remove it.

IIUC, that means removing the json field from rest_api.players.full.Game, which would trim down the rest_api.FullPlayerDetail.active_games field.

declare namespace rest_api {
    namespace players.full {
        interface Game {
            black: game.Player;
            white: game.Player;
            width: number;
            height: number;
            json: games.GameData;
            id: number;
            name: string;
        }
    }
}

declare namespace rest_api {
    interface FullPlayerDetail {
        user: {...}
        active_games: players.full.Game[];
        ...
    }
}

Any chance the other clients also depend on the Game.json field being there? IOW, should this be a new endpoint, or can we just modify to strip out the json field?

anoek commented 9 months ago

Any chance the other clients also depend on the Game.json field being there?

I don't think anyone else is using the type definition files so I think it's safe to remove that and kinda shake things out on the client to make sure we're not using it in some meaningful way.

should this be a new endpoint, or can we just modify to strip out the json field?

It's unknown to me if the mobile apps or the Conquest of Go game is using the data. I figure we should remove our usage on the client side to see if it is in fact safe for us to remove according to our client, then if so we can poll them to see if they're using it or not. If not then yeah let's just strip it out, otherwise we can maybe add a ?no_gamedata to the request or something that skips that field for those that don't want it.

dexonsmith commented 9 months ago

I thought that delaying the sort was just waiting until render() has run, which I did in f9ff194a92a365a03e3e27736f74c9f0813967ee by sorting in componentDidMount(). On its own, that seemed to work...

... but if I switch over to using game.goban.last_emitted_clock (and never using game.json) -- 6de6a13ca19f396e76aed8d3aa76b58d923c254f -- then the sort gets glitchy. By glitchy, I mean: sometimes there's a short delay before the sort fixes itself; other times it just hangs out in the wrong order indefinitely (or until you click headers to manually trigger a re-sort).

The problem, I think, is that game.goban doesn't actually get filled in during render()... GobanLineSummary fills it in "much" later, calling requestAnimationFrame() inside its own componentDidMount() call.

I tried delaying the initial sort using the same requestAnimationFrame() hack inside GameList.componentDidMount(), but that didn't fix the glitch. I'm not sure why not. According to React docs, the componentDidMount() calls should happen bottom up, so the requestAnimationFrame() calls should probably be enqueued bottom-up as well, and I'd expect the this.goban fields to be initialized by the time GameList's sort call gets run. Maybe this.goban is there, but last_emitted_clock isn't initialized yet...

Ultimately, it seems the initial sort isn't being delayed long enough. I'm not sure what triggers we could/should use to do so though.

dexonsmith commented 9 months ago

Maybe this.goban is there, but last_emitted_clock isn't initialized yet...

I'm pretty sure this is correct. In GobanCore.ts, the only assignment to last_emitted_clock is in this callback registered in GobanCore.constructor():

        this.on("clock", (clock) => {
            if (clock) {
                this.last_emitted_clock = clock;
            }
        });

which I imagine triggers lazily/periodically, but not immediately.

anoek commented 9 months ago

Alrighty, it looks like how it's working is we send a gamedata blob to get loaded and then immediately, but as a separate message, send a clock message to get things synchronized:

https://github.com/online-go/ogs-node/blob/devel/src/termination-server/GameProxy.ts#L783-L788

So that's probably what you're seeing, it'll come in pretty quick but frames could possibly come and go before it happens. Wouldn't want this to be too easy or anything :sweat_smile:

dexonsmith commented 9 months ago

Alrighty, it looks like how it's working is we send a gamedata blob to get loaded and then immediately, but as a separate message, send a clock message to get things synchronized:

https://github.com/online-go/ogs-node/blob/devel/src/termination-server/GameProxy.ts#L783-L788

So that's probably what you're seeing, it'll come in pretty quick but frames could possibly come and go before it happens. Wouldn't want this to be too easy or anything 😅

Yup, that explains it :). I'll have a think about what to do. One way to trigger a sort would be to have a callback that sets a "time remaining" field up enough levels that GameList's state will change and it'll re-render. But it'd have to be stable as the clocks count down (easy for not-running clocks, equivalent to AdHocClock.expiration for running clocks), since you don't want to re-render on every event, and you don't want to do it as clock events roll in slowly... you want to do them more en masse. I'll have a think about it.

anoek commented 9 months ago

Since you have a goban instance you could do goban.on("clock", (clock) => { doSortStuff(clock) } if you wanted. Further, I think React will be smart enough moving around elements so moving them around shouldn't be that heavy, but for 100+ it might be a bit notable. One possibility if it is too heavy would be debounce it a little, something like set a timeout to redraw in 50ms when you get a clock update, any other clock updates that come in before that redraw gets called get bundled up in the one redraw kind of thing. Just a thought.

dexonsmith commented 9 months ago

Thanks; that advice was really helpful.

One possibility if it is too heavy would be debounce it a little

The solution in #2453 does some debouncing implicitly, since setState() updates don't immediately trigger render() and get merged together if they are close to each other. When testing with ~15 games, I printed the value of force_render; it was usually 1 or 2, and the biggest I saw was 4.