sektorate closed this issue 3 years ago
Some discussion of this in #340 - it's a bit strange to suggest adding categories when they already exist though :)
You're right, of course!
Some experiments on this in the 1.9 branch, starting with a 2d table of point loss vs policy. Could have an option for policy <-> ai order as well, but this is simpler for now. Space for lots more, maybe some summary stats.
@pdeblanc thoughts?
Random dan game:
Random ddk game:
1d layout instead of 2d
idea by marcel
On first impression, 1d layout seems more easily readable than 2d. Some questions to aid iteration:
- Is it possible to create an overall performance score? E.g. if every move lost no points/was the AI top move, this gives 100% accuracy; if every move lost more than 12 points/was the AI worst move, this gives 0%. This would provide extra insights: "did I win because I played well, or because my opponent played terribly?", "I lost even though my performance was good, so this loss isn't so bad".
- Is it possible/would it be useful to combine the points lost/AI rank into one overall metric? An overall score for each move would be more concise.
Overall it would be great to give the user control over the level of granularity: whether 2d or 1d, which stats are shown, whether the moves are graded separately by points lost/AI top move or combined into one metric, etc.
I hope these thoughts help, I love this feature already and am excited to use it.
The Chess.com site really did a nice job with their game report. However, they probably have 100+ developers working for them. Here's a simple version of their accuracy report. It would allow users to customize the category names in the Teaching/Analysis settings. Instead of a separate Game Report, this could also be just another tab, unless you're planning to add additional information in the future.
The accuracy stat would be a weighted average of the categories. Ideally, this accuracy information would update as users moved through the game tree, not just at the end of the game.
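As a sketch of that weighted-average idea: given per-category scores and the move counts from the report, the accuracy could be computed as below. The category names and scores are made up for illustration; Chess.com's actual weights are proprietary.

```python
# Hypothetical per-category scores (0-100); purely illustrative.
CATEGORY_SCORE = {
    "best": 100, "excellent": 90, "okay": 70,
    "inaccuracy": 40, "mistake": 15, "blunder": 0,
}

def category_accuracy(counts):
    """Weighted average of category scores.

    counts: mapping of category name -> number of moves in that category.
    """
    total = sum(counts.values())
    return sum(CATEGORY_SCORE[cat] * n for cat, n in counts.items()) / total
```

Since this only depends on the running counts, it could be updated incrementally as the user moves through the game tree, as suggested above.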
Note that all chess apps and sites (that I've seen) use strictly board evaluations for computing mistakes, i.e. top move - actual move. There's no reporting done on how much a move improves the prior position, i.e. actual move - prior move. We've discussed this before.
P.S. I don't know how useful the 2D performance table would be. It seems more like a curiosity rather than helpful information. But, I guess it might show how well your intuition (policy) is working versus your calculation (tree search). The accuracy information seems more helpful.
movecomplexity = sum(policy over candidates) - sum(policy over candidates with point loss <= 0.5), i.e. what policy % is bad moves the AI thought were worth considering.
complexity = average movecomplexity
weighted_loss = point loss weighted by min(movecomplexity, 0.25) for dark green moves, or 0.25 for mistakes, i.e. trying to downweigh obvious moves.
accuracy = 100 * 0.75**weighted_loss
The formulas aren't great yet, but I like the layout and fields. 'Midgame' is just moves 50-150, which is also not perfect...
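For concreteness, here's a minimal Python sketch of those formulas. The candidate field names (`"prior"`, `"pointsLost"`) and the data shapes are assumptions for illustration, not KaTrain's actual internal API.

```python
def move_complexity(candidates):
    # Policy mass on moves the AI considered but which lose points:
    # sum(policy over candidates) - sum(policy over candidates with loss <= 0.5)
    total = sum(c["prior"] for c in candidates)
    good = sum(c["prior"] for c in candidates if c["pointsLost"] <= 0.5)
    return total - good

def game_accuracy(moves):
    # moves: list of (points_lost, candidates) for one player's moves.
    weights, losses = [], []
    for points_lost, candidates in moves:
        if points_lost <= 0.5:  # 'dark green' move: downweigh if obvious
            w = min(move_complexity(candidates), 0.25)
        else:  # mistakes always get the full weight
            w = 0.25
        weights.append(w)
        losses.append(points_lost)
    total_w = sum(weights) or 1.0  # guard against all-zero weights
    weighted_loss = sum(w * l for w, l in zip(weights, losses)) / total_w
    return 100 * 0.75 ** weighted_loss
```

With this shape, a game of only zero-loss moves scores exactly 100, and each extra weighted point lost multiplies the accuracy by 0.75.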
^ this looks great! being able to focus on one stage of the game is an excellent idea.
Sander, I like it! Much improved over my version. :-)
I'm trying to understand your formulas. What is the cutoff point for the AI candidates, or is this determined by max_visits? Wouldn't higher visits skew the complexity rate upwards (more poor moves searched)?
The accuracy formula seems reasonable. I want to research to see how some of the chess apps do it.
I think the colors on the Teaching/Analysis settings should be re-ordered to match this for consistency.
Good stuff!
It's using all candidate moves returned by the AI; this is of course influenced by visits, root noise, etc. The idea is that the sum of policy priors over low-point-loss candidates represents 'obvious good moves' and the remainder 'nice-looking but not good moves', and your move should be considered more important if the latter is big.
I am not convinced this is the best approach, but it's the first thing that kind of did something reasonable.
> I think the colors on the Teaching/Analysis settings should be re-ordered to match this for consistency.
c2ca285
I did some quick research on Go and Chess apps, and the only one that seems to calculate a game accuracy is Chess.com. They call theirs "Computer Aggregated Precision Score" or CAPS. It's a proprietary model that incorporates game mistakes and other "pattern of strength" algorithms. In other words, it is a black box.
There's some controversy in the forums about how well it works. Apparently the statistic can vary widely between games, and it does not have great predictive power for a player's rank.
As for complexity, I think your idea has merit. I spent some time studying L&D problems earlier, trying to understand why some were more complex than others. The number of reasonable-looking branches in the search tree mostly determines it. Whether you can tease this information out of the differences between policy priors and search results will be interesting.
This feature may take lots of thought and testing. I vote to roll out something simple and get feedback on it. Maybe create a beta version that we can do some testing on.
> This feature may take lots of thought and testing. I vote to roll out something simple and get feedback on it. Maybe create a beta version that we can do some testing on.
I generally don't hide things, it's in branch and anyone can test it. Releasing is a lot of work though, and the last time I released for feedback I got zero comments, soooo
Testing another weighting in 3adde54, this time based on the policy-weighted point loss.
This part could be replaced with a chart like this; more intuitive.
Want to test this a bit more properly. If someone could help collect a nice test set, that would be appreciated:
A variety of around 50-100 SGF games from 15k to 7d, 19x19, with at least 200-250 moves played. They should have the BR and WR fields set (as on e.g. OGS).
I'll commit to scraping 50 games from OGS spread across 15k to 7d. Does it matter if they're even or handicap?
Shouldn't matter. The idea is to see the numbers by player rank more systematically
Here are 30 OGS games ranging from 9k to 4d.
10 OGS games 1d-4d.zip
10 OGS games 1k-4k.zip
10 OGS games 5k-9k.zip
Code as in dde545b (weighted by complexity ~ expected point loss if playing candidates with p=policy), 40b 7.9G network @ 500 visits.
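A minimal sketch of that complexity measure, assuming candidate dicts with hypothetical `"prior"` and `"pointsLost"` fields: the point loss you'd expect if you sampled your move from the policy distribution over the candidates.

```python
def expected_point_loss(candidates):
    # Policy-weighted average of each candidate's point loss,
    # normalized by the total policy mass on the candidates.
    total_prior = sum(c["prior"] for c in candidates)
    return sum(c["prior"] * c["pointsLost"] for c in candidates) / total_prior
```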
data: https://pastebin.com/k44TYjY9
Accuracy seems OK, a bit weird on 2 outliers. Complexity is a bit all over the place; may just remove it.
20 more OGS games
With the new games included. Added r^2, and added an 'ai approved' stat (move in top 5 and pt loss <0.5; could have a better name).
Flat weights (i.e. not trying to downweigh obvious moves to reduce effect of opening/endgame)
suggests the weighting does something useful at least!
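For reference, the r^2 quoted here is presumably the squared correlation between the accuracy stat and player rank; a self-contained sketch:

```python
def r_squared(xs, ys):
    # Squared Pearson correlation between two equal-length series,
    # e.g. per-game accuracy vs. player rank.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)
```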
Old idea:
`good_move_policy = sum(d["prior"] for d in filtered_cands if d["pointsLost"] < 0.5)`
etc.
Definitely worse.
`adj_weight = max(0.025, min(1.0, max(weight, points_lost / 5)))`
Worse?
Best result for now, as of 6a71266. Will remove complexity as a stat, as it's more about early/mid/endgame.
data: https://pastebin.com/raw/GsURUGSa
Keeping this unless there are any bright ideas.
The accuracy stat r^2 is looking pretty good. The complexity stat will need some more thinking.
> 'ai approved' stat (move in top 5 and pt loss <0.5, could have a better name)
Agree that 'ai approved' is a bit awkward. If you had category labels for pt loss, you could use the label. Not sure you need to limit it to top 5, but just the pt loss range seems good enough.
Let me know if you need more games for testing.
> The accuracy stat r^2 is looking pretty good. The complexity stat will need some more thinking.
Will probably just kill the complexity stat.
> 'ai approved' stat (move in top 5 and pt loss <0.5, could have a better name)
> Agree that 'ai approved' is a bit awkward. If you had category labels for pt loss, you could use the label. Not sure you need to limit it to top 5, but just the pt loss range seems good enough.
The idea is that top 1 is very network dependent, and no limit is very visits dependent; this should be less so (as seen by KataGo selfplay games ending up at near 100%).
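A sketch of the 'ai approved' check as described (top 5 and point loss < 0.5), assuming a best-first candidate list with hypothetical `"move"`/`"pointsLost"` fields:

```python
def is_ai_approved(candidates, played, top_n=5, max_loss=0.5):
    # candidates: AI candidate moves, assumed sorted best-first.
    top_moves = [c["move"] for c in candidates[:top_n]]
    loss = next((c["pointsLost"] for c in candidates if c["move"] == played), None)
    # Approved only if the played move is among the top N candidates
    # AND its point loss is below the threshold.
    return played in top_moves and loss is not None and loss < max_loss
```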
> Let me know if you need more games for testing.
I think this is a very nice data set. If you can figure out what's up with the 1-2 outliers that might help though!
In the first outlier game (sunny25 vs lyq), both players missed the killing/saving of a group for many moves (163 - 202). This resulted in 20+ point loss swings for a significant portion of the game.
In the second outlier game (silent1 vs sunny25), both players missed a severe cut for many moves (49 - 126). Then, they missed a double sente endgame sequence for many moves (131 - 187).
Thoughts on comparing the blunder classes against the other player versus against all moves? It's already proving confusing; may just remove the bars there.
Also, if someone has or can make a texture that helps make the bars look like bars, that could be nice! (as in a transparency mask-only texture)
Here's a mockup I made of what I think would be the most useful for reviewing my games. It shows everything from each player's point of view: no matter how well or badly White played, if Black played 104 green moves out of his own total of 152 moves, then his green-moves percentage bar should be filled to about 68%. Now, if instead we want to share stats between players for some reason (meaning 'Black played 80% of the red mistakes in the game, so White played only 20% of them'), that could be an option toggled in the Teaching/Analysis settings. That way we could get the best of both worlds. What do you think?
Here is a suggestion for a more natural graphic for the Points Lost area:
Even though a pyramid looks more natural from a graphical standpoint, the base may not always be the widest (for a double-digit kyu, maybe?). And I don't know, I feel we usually see best moves as "top" moves, visually speaking, maybe? In any case, I came up with another mockup that combines Points Lost seen from one player's perspective as well as from both players'. How does it look?
This version of the panel seems a little cluttered. I'm not sure how useful seeing the proportion of points lost between each player is either, I personally would be interested mostly in the amount/% of my moves that fall into each category. If a user is that interested in the proportion between each player, they can compare the numbers. Having best moves shown at the top makes sense to me also.
> This version of the panel seems a little cluttered. I'm not sure how useful seeing the proportion of points lost between each player is either. Having best moves shown at the top makes sense to me also.
^^^ Agree.
I think Sander has the cleanest layout. You don’t want to make it more complicated than this. The X-axis scale for each item can usually be inferred from the label, which is good.
I think showing the bars for all move classes is fine, to achieve consistency. Except, I don't understand the X-axis scale used in Sander's version (what are the blunder lengths supposed to represent?). I would think this should be the % of the time you played that move class within the game.
I'm also only interested in the % of moves that fall into each category (separately for each player). I only added the % sharing both players' data as a whole because that was what Sander went for initially, or so it seemed to me. Anyway, you'll see below my last mockup. This one is the closest to what I would go for if it was up to me. Your thoughts?
^ This looks great to me.
Are you arguing mostly about content or format? If content, then I agree that showing the % of moves in each category is best. The format could either be as a pie chart, a stacked chart (like yours), or a bar chart (like Sander's). (Although, I'm not sure what Sander was trying to show in his mockup :-)
As for format, a simple bar chart like Sander's is good enough for me, and it matches the style of the top section. But, I'd be Ok with either.
I like the stacked chart, but it's a bit complicated to make, particularly hiding text dynamically. Here's back to % by category and colourful.
As you see, 1 becomes basically 0 and many games just end up being greeeeeeeeeeeen.
lower level game
I like the latest iteration and certainly would be very happy with it. Although, I don't know why you wouldn't match the same style of bars in both sections:
I prefer Sander's latest iteration; a microsecond glance - "I got mostly green - yippee!" (Coloring the bars makes the data leap out at you.) In the same vein, I would also color the bars in the Key Statistics section; a nice blue would look good.
In my opinion it is simpler and easier to understand if the central "key" of the Points Lost section is presented in this form:
I think a better value could have a distinctive background color, e.g. Mean Points Lost (the less the better) or AI Top 5 (the more the better). At the same time, avoid using too much color.
In Sander's version, I think the middle part should be less prominent and somehow detached from the bars left and right, because to me it kind of seemed like every color was played a lot no matter what. I tried two mockups. They're not great, but they'll show what I mean. The first one shows the most data; the second one shows the same amount of data as Sander's. In any case, I don't think the Points Lost middle part should be as wide as the Key Statistics one; in Sander's, the colors really seemed to have been played a lot even when, say, no red or purple mistakes were played.
maybe better blue.
@Dontbtme sure that looks better, but keep in mind the whole thing is a single grid layout of labels, and the line is the bottom of the header cell. Give it a try and you'll see how difficult simple things can be in kivy ;)
Still, as is, any bar looks big, is what I meant. Can't you limit the colors in the middle to around >0.5 etc. without changing the grid? Since colors in the left and right columns only fill them depending on the %, why do the colors in the middle column have to fill it entirely? Colors in the middle are what pops out the most in your picture, when we should be focusing on the colors in the players' bars. I would even rather have no colors in the middle column if that's too complicated; that way the mistake-data colors would appear clearly and brightly in each player's column. Anyway, that's only my two cents (although I'm not sure about the dark blue you switched to in the Key Statistics either, since the surrounding UI is already some kind of dark blue, but I digress). But anyway, if the above isn't convincing, then maybe it's just a matter of taste, in which case just ignore it and let's move on ^_^
This seems wrong: the value is >100%.
Hello, many thanks for this amazing trainer! Feature suggestion: I imagine each move being placed in categories (e.g. blunder, mistake, inaccuracy, okay, excellent, best move) based on the percentage change in winrate it effects. The percentage ranges for these categories could be user definable. An overall "accuracy" score out of 100 could then be generated for each player based on the percentage of their moves the engine rates as best. These ideas are inspired by the analysis features of chess.com, that give an overall insight into the players' performance in a game; this would supplement analysis of each individual move. Thanks again for your work, I'd love to hear your thoughts.
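As a sketch of that categorization idea (the thresholds below are placeholders, since the suggestion leaves the ranges user-definable):

```python
# Hypothetical winrate-drop thresholds (as fractions); purely illustrative,
# and intended to be user-configurable as suggested above.
THRESHOLDS = [
    ("best", 0.0), ("excellent", 0.01), ("okay", 0.03),
    ("inaccuracy", 0.06), ("mistake", 0.12),
]

def categorize(winrate_drop):
    # Return the first category whose limit covers this winrate drop;
    # anything beyond the last limit is a blunder.
    for label, limit in THRESHOLDS:
        if winrate_drop <= limit:
            return label
    return "blunder"
```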