Open michaeljb opened 5 months ago
Firefox profiling shows the slowness coming from computing the graph during rendering, starting from the `connected_hexes` call in `tracker_available_hex`:
https://github.com/tobymao/18xx/blob/3794ce0dd/lib/engine/step/tracker.rb#L494
From there it's a pretty deep stack of `walk`ing the graph.
Adding a bunch of temporary `puts` statements to debug this, including some I've added dozens of times before, inspired me to finally make them permanent and set up a global logger for the game engine and views - #10350
If that PR had already been merged when this problem was encountered, identifying it as a graph problem would have been as simple as adding `?l=0` to the game URL and watching the console.
The problem has something to do with the deletion in the converging check from #7788; removing these lines lets my browser compute the graph (probably not correctly) in about 0.02 seconds instead of 55 seconds:
Still have some failing tests, but I think I've got some progress on my `graph-converging` branch - https://github.com/michaeljb/18xx.games/commits/graph-converging?since=2024-02-19
I have some real progress on my `bfs-graph` branch. https://github.com/michaeljb/18xx.games/commits/bfs-graph/
There's still a lot of work to make it support the custom args on `Engine::Graph` and make it a drop-in replacement.
The key part that will fix the algorithm for these bad cases isn't switching from DFS to BFS. Instead of the `converging_path` logic, it tracks the direction from which paths and nodes are added to the graph, i.e., once a path has been reached from both sides it can be skipped.
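A minimal sketch of the "reached from both sides" idea, on a toy graph rather than real tiles. `Edge` and `reachable_nodes` are hypothetical names for illustration and this is not the code on the `bfs-graph` branch; it only shows why recording the entry side bounds the work at two expansions per path.

```ruby
require 'set'

# Hypothetical toy model: an undirected edge stands in for a track path.
Edge = Struct.new(:a, :b) do
  def other(node)
    node == a ? b : a
  end
end

# Walks outward from `start`, recording for each edge the ends it has been
# entered from. An edge already entered from a given end is skipped, so no
# edge is ever expanded more than twice.
def reachable_nodes(start, edges)
  by_node = Hash.new { |h, k| h[k] = [] }
  edges.each do |e|
    by_node[e.a] << e
    by_node[e.b] << e
  end

  # Keyed by object identity so parallel edges are tracked separately.
  entered_from = Hash.new { |h, k| h[k] = Set.new }.compare_by_identity
  seen = Set[start]
  queue = [start]
  expansions = 0

  until queue.empty?
    node = queue.shift
    by_node[node].each do |edge|
      # Set#add? returns nil when the element is already present.
      next unless entered_from[edge].add?(node)

      expansions += 1
      far = edge.other(node)
      queue << far if seen.add?(far)
    end
  end

  [seen, expansions]
end
```

However many converging exits the map has, `expansions` can never exceed twice the number of edges, which is the property the current `walk` counts lack.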
My graph also advances one function call at a time, instead of recursively traversing the whole graph. This enables visualizing the graph as it grows, and would be required for theoretical future features like persistent lazy graphs that don't get cleared, or bidirectional search for more efficient home-destination connection checking.
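To illustrate advancing one call at a time, here is a small sketch that keeps the frontier in an object instead of on the call stack. `SteppedSearch` and the adjacency hash are assumptions made up for this example, not the actual class on the branch.

```ruby
require 'set'

# Hypothetical incremental traversal: the frontier lives in the object, so a
# caller can advance one node at a time and inspect the graph in between.
class SteppedSearch
  attr_reader :visited

  # adjacency: Hash mapping each node to an array of neighbouring nodes
  def initialize(adjacency, start)
    @adjacency = adjacency
    @visited = Set[start]
    @frontier = [start]
  end

  def done?
    @frontier.empty?
  end

  # Expands exactly one node and returns it (nil once the search is done).
  def step
    return nil if done?

    node = @frontier.shift
    @adjacency.fetch(node, []).each do |neighbour|
      @frontier << neighbour if @visited.add?(neighbour)
    end
    node
  end
end

adjacency = { 'A' => %w[B C], 'B' => %w[A D], 'C' => %w[A D], 'D' => %w[B C] }
search = SteppedSearch.new(adjacency, 'A')
order = []
order << search.step until search.done?
```

Because all state is held in `@visited` and `@frontier`, a renderer could draw the graph after each `step`, and a second search could be advanced in lockstep from the other end for a bidirectional check.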
In game 151565 the tile lays at actions 536 and 537 on F19 and G20 raise the total number of `node.walk` and `path.walk` calls needed to compute LNWR's graph from 445 to 2711.
While not adding any actual new connectivity for LNWR, these tile lays add a lot more converging exits, and they're the first really big jump in the number of `walk` calls made.
As the game progresses the number of calls gets huge.
Action 650 upgrades Derby, adding another token slot to the city and unblocking LNWR. The `walk` calls here rise from 196,238 to 1,659,446. By the end of the game, it's 5,862,969, plus 1,117,663 calls that return quickly via guard statements.
I was hoping that finding the first big unnecessary jump in `walk` calls would make it easier to describe what exactly the problem in the current algorithm is, but the track network at that point in the game is already fairly complex. I can't be more specific than saying the `converging_paths` logic is buggy and leads to lengthy loops.
When a check is made to see if a corporation can run routes, `route_info` calls `compute(routes_only: true)`. This still does a full DFS search, but only starting from one token. It could probably be optimized to return as soon as multiple nodes are found.
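A rough sketch of that early return, on a plain adjacency hash. `connects_to_another_node?` and its `needed:` parameter are hypothetical, and it assumes that reaching a second node is enough to answer the question, which is simpler than the real route rules.

```ruby
require 'set'

# Hypothetical helper: returns true as soon as `needed` distinct nodes
# (including `start`) have been found, instead of walking the whole graph.
def connects_to_another_node?(adjacency, start, needed: 2)
  visited = Set[start]
  stack = [start]

  until stack.empty?
    node = stack.pop
    adjacency.fetch(node, []).each do |neighbour|
      next unless visited.add?(neighbour)
      # Early exit: no need to explore the rest of the network.
      return true if visited.size >= needed

      stack << neighbour
    end
  end

  visited.size >= needed
end
```

On a well-connected map this returns after looking at a single neighbour; the full traversal only happens when the token really is isolated.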
> Action 650 upgrades Derby, adding another token slot to the city and unblocking LNWR. The walk calls here rise from 196,238 to 1,659,446. By the end of the game, it's 5,862,969, plus 1,117,663 calls that return quickly via guard statements.
It looks like these numbers are wrong, but the right numbers are still in the 100Ks if not millions. Looks like I was messing up the default argument when plumbing `walk_calls` through.
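For illustration only, here is one way a default argument can skew such a counter in Ruby; it is not necessarily what happened here. Default argument expressions are evaluated on every call that omits the argument, so a recursive call that forgets to pass the counter writes into a throwaway hash. `walk_buggy`, `walk_fixed`, and the nested-array tree are hypothetical.

```ruby
# Hypothetical recursive walk over a nested-array tree, counting calls.
def walk_buggy(tree, walk_calls: Hash.new(0))
  walk_calls[:walk] += 1
  # Bug: the counter is not passed on, so each child call builds a fresh
  # default Hash and its count is lost.
  tree.each { |child| walk_buggy(child) }
  walk_calls
end

def walk_fixed(tree, walk_calls: Hash.new(0))
  walk_calls[:walk] += 1
  # The same Hash is threaded through every recursive call.
  tree.each { |child| walk_fixed(child, walk_calls: walk_calls) }
  walk_calls
end

tree = [[[], []], []] # five nodes in total

walk_buggy(tree)[:walk] # => 1
walk_fixed(tree)[:walk] # => 5
```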
With #10951 I am actually getting the same endgame numbers at action 759 (LNWR's last turn of the game, in OR 8.1). Not sure why some of the earlier counts are different than what I was seeing before 🤷
Some good progress this weekend: I was able to configure 18GB to use my new adapter class instead of `Engine::Graph`. Here's a screenshot of the side-by-side logs for loading up the problematic game to action 759:
I can't step through the game yet to see how much performance has improved, since I'm getting some errors I haven't had time to fix, but it's certainly coming along nicely!
The real source of trouble appears to be in eliminating overlapping paths from `connected_paths`. If a path is overlapping (e.g., the pink path in the screenshot here), to see if it's legal, it needs to search for every possible route back to a token that avoids the path(s) it overlaps with. If there is no such route, confirming that can take a very long time.
Discussed in the site chat this morning:
I repro'd the slowness but haven't had time to profile or do a cursory investigation of whether the slowness is in the game processing or the rendering. (Also need to see whether the issue is specific to 18GB.)
Copied the game data to this gist
Maybe similar? #10123