A weighting system to promote polyvalence

ppy / osu-performance

Calculates user performance aggregates from scores

GNU Affero General Public License v3.0

A weighting system to promote polyvalence #28

Open Sakisan opened 6 years ago

Sakisan commented 6 years ago

Many players, including me, have many similar scores in their top 50. As a result a player's total pp is no more than a partial representation of the player's potential. My goal is to make a better representation by nerfing redundant performances, hopefully encouraging players to diversify their skillset and ultimately make them enjoy more facets of the game.

Currently the weight given to a play is purely based on how many better plays have come before. We can improve this by taking the same system but basing the weights on the similarities between the plays.
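
For reference, the position-based weighting described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `weighted_total` is made up here, the 0.95 decay factor is an assumption (the thread only hints at it later with "100%, next is 95%"), and bonus pp is ignored.

```python
def weighted_total(pp_values, decay=0.95):
    """Sum pp values, weighting the i-th best play by decay**i (i starts at 0).

    The decay value of 0.95 is an assumption for illustration.
    """
    # Sort best-first so the weight depends only on rank, not on input order.
    ordered = sorted(pp_values, reverse=True)
    return sum(pp * decay ** i for i, pp in enumerate(ordered))


if __name__ == "__main__":
    # Two equally good plays: the second one only counts for 95%.
    print(weighted_total([100.0, 100.0]))
```

The proposal keeps this ranking but would replace the fixed `decay ** i` factor with a weight derived from how similar a play is to the plays ranked above it.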

If a player's 8 best performances are very similar then their contributions should be diminished one by one. But if the 9th and 10th plays are of a different kind then they should count for more, maybe even in full again if they are completely different. Going down the line of all played scores, every play will eventually be similar to enough earlier scores to stop being significant to the total.

The key new element here is defining when maps are similar. Maps can be different in many ways, so they can also be similar in many ways. There are the basic map metrics: AR, BPM, Combo/Length, OD, etc. There are emergent aspects like aim, speed, accuracy, object density or how technical a map is to play. Though for that last one we don't have an algorithm yet. The frequency of difficulty spikes... I'm sure the list of things can be expanded even more once we start thinking about it. The similarity between two maps is a combination of all these things.

The weights for a given play could be based on its overall similarity to each earlier play, e.g.

play #3 is
- 80% similar to play #1 : weight 1
- 90% similar to play #2 : weight 2

total weight : weight 1 * weight 2

or the weights could be composed independently per aspect.

play #3 compared to play #1:
- 100% similar in aspect a (same AR for example) : weight 1a
- 95% similar in aspect b (200bpm vs 210bpm) : weight 1b
- not close enough in aspect c : no weight
play #3 compared to play #2:
- 100% similar in aspect a : weight 2a
- not close enough in aspect b : no weight
- 80% similar in aspect c (300 hit objects vs 375 hit objects) : weight 2c

total weight: 1a * 1b * 2a * 2c
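
The per-aspect composition above could be sketched like this. It is only an illustration: the helper name `play_weight` and the numeric weights in the example are hypothetical, since the proposal does not yet say how a similarity percentage maps to a weight.

```python
from math import prod


def play_weight(aspect_weights):
    """Multiply every per-aspect weight contributed by earlier plays.

    aspect_weights has one dict per earlier play, mapping an aspect name to a
    weight in (0, 1]. Aspects that are "not close enough" are simply left out,
    so they do not reduce the result.
    """
    return prod(w for per_play in aspect_weights for w in per_play.values())


if __name__ == "__main__":
    # Play #3 from the example; the numbers are made up for illustration.
    versus_play_1 = {"a": 0.97, "b": 0.98}   # weights 1a and 1b, no weight for c
    versus_play_2 = {"a": 0.97, "c": 0.96}   # weights 2a and 2c, no weight for b
    print(play_weight([versus_play_1, versus_play_2]))
```

The "overall similarity" variant is the same computation with a single weight per earlier play instead of one per aspect.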

Note this doesn't prevent making individual rankings for individual aspects, but it combines the plays into the "final metric" differently than simply adding multiple rankings on top of each other.

I'd be interested to make a concrete implementation and test it against the data that ppy provided. As this is an idea that I haven't seen before (maybe I've missed it?) I would like to hear other people's thoughts on it first.

Ryukerg commented 6 years ago

I feel like this would actually work better as a concept with separate skill scores than one number. Say right now there's aim, speed, and acc skill. Instead of weighting on total skill, you can weight each skill individually, so your top aim score is 100%, next is 95% so on, and you do that for each skill.

If all your scores have really high aim scores but low acc, then they will all be weighted against each other on the aim scores. If you get one score with really high acc, then it will take the 100% for acc and you will gain a lot more of the pp from that skill. I think this idea only improves as more skills are accounted for.
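
A rough sketch of this per-skill weighting, under stated assumptions: each score is represented as a dict with hypothetical `aim`, `speed` and `acc` values, the per-skill totals are simply added, and the 0.95 decay comes from the "100%, next is 95%" wording above.

```python
SKILLS = ("aim", "speed", "acc")


def per_skill_total(scores, decay=0.95):
    """Rank and weight every skill separately, then add the skill totals.

    scores is a list of dicts such as {"aim": 120.0, "speed": 80.0, "acc": 40.0}.
    """
    total = 0.0
    for skill in SKILLS:
        # Each skill gets its own best-first ranking and its own decay chain.
        ranked = sorted((score[skill] for score in scores), reverse=True)
        total += sum(value * decay ** i for i, value in enumerate(ranked))
    return total


if __name__ == "__main__":
    aim_play = {"aim": 100.0, "speed": 0.0, "acc": 0.0}
    acc_play = {"aim": 0.0, "speed": 0.0, "acc": 100.0}
    print(per_skill_total([aim_play, aim_play]))  # second aim play is discounted
    print(per_skill_total([aim_play, acc_play]))  # the acc play takes the full acc slot
```

With these made-up numbers, two identical aim plays compete for the same ranking, while an aim play and an acc play each take the top spot in their own skill.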

I like the concept from the perspective of encouraging players to play all types of maps and encouraging mappers to map different maps, but my only worry is that, without seeing an implementation and how it compares, pp may become less of an indicator of skill than it currently is. It rewards varied plays, but not necessarily good varied plays unless it was implemented and people played it for a while (forcing people who play for pp to get good and varied plays to compete). Initially though, there would be very good one tricks who are now ranked much worse than people who play many different styles at an average skill level. Who should be higher is a hard question, but I feel like the best DT player shouldn't be worse than a player who has average scores with every mod/playstyle.

I like it from a concept of varied skills, but I don't know if some of the indicators like AR are an indication of a skill. Low AR could be hard to read and high object density, or it could be a low density and easy read.

Let me know if you disagree on something and we can discuss it.

Sakisan commented 6 years ago

Do you mean: 1) make a weighted total per skill where every score contributes to the total of every skill? 2) make weights per skill and choose whichever skill provides the best weight for that play, then contribute that to a single total?

The first is the same idea as making separate rankings, so I assume you mean the second. The second is an interesting option. I find it a little strange that a score would be weighted on just one skill, but it might work. My feeling is that it would come back down to making separate rankings per skill, except when adding everything together you'd remove the scores that are already featured higher in another skill's total.

I haven't tried making an implementation yet. I expect that it's going to be very hard to find a good implementation that fits everyone. It might even be impossible for all I know, but I'd be surprised. There's a lot that we can try, lots of combinations to test... At least (I hope) it will be clear quickly when a particular implementation doesn't give nice results.

There are way more questions to this concept than this though...

It would change the rankings a lot initially. As you said players will be able to adapt, but do you think the majority of players would like it? How would this system feel for someone who plays daily and tries to improve as much as he can? Or for someone who just likes to play what he likes to play, which may or may not be a broad range of maps?

The score of a play is already the result of the evaluation of aim, speed and accuracy. Is this weighting system maybe overdoing it? Which aspects could be used for this kind of weighting and which could not?

What are the drawbacks and the other advantages of this system? There must be many. I'll try to make a list of things I can think of...

So if anyone has thoughts on any of this, feel free to share them :) (It doesn't have to be here on github)

wherewedroppinboys commented 6 years ago

My sole concern with this is that it limits players who enjoy a specific niche more than another - say, an HR player who enjoys HR far more than they do DT - solely due to the fact that's how they enjoy the game the most. As Sakisan mentioned above, a player who plays what he enjoys playing - whether it's DT, HR, HD, or any combination of any mods - would see diminishing returns on their performance, even if their plays are groundbreaking in their own respective mods.

If the end goal is to encourage players to play the game as they personally enjoy it, while still letting gradual improvement in their own skill carry them up the leaderboards, then I feel that neutering a player who is talented in one mod, compared to someone who is merely decent at each mod, would cripple that player's ability to perform and compete with others, save possibly for mod-specific rankings. As Ryukerg mentioned, the best DT player shouldn't be ranked lower than a player who's decent at every mod simply because they primarily focus on DT.

Ultimately, I feel that, even though many of us have top plays all with a specific mod, that's more of an issue tied to the weighting of specific mods and the mapping system rewarding those mods with pp; many players find their niche and excel within it, and forcing them to branch out to other mods they may not enjoy nearly as much would likely be a detriment to their overall enjoyment of the game.

Weighting each specific talent, such as aim/accuracy/speed, is one thing - that being said, each mod has its own specific aspects to it regarding each of those talents, and I feel as though it'd be a detriment to players who focus on a specific mod because they enjoy it, hindering their ability to progress.

Just my two cents!

lemonmelonlime commented 6 years ago

While this somewhat makes sense from a competitive standpoint, it doesn't make sense from a casual one. From my experience it's actually the casual playerbase that tends to play more of the same song; they're not trying to improve their skill by playing different maps, they just "play the game because they like playing games to music." They tend to pick songs they like and slowly get better at them by passively playing them.

From how I understand this, you're trying to discourage players from playing maps that they like. For instance Azer and FunOrange are quite famous for being technical players. Regardless of whether the performance system is fair or not, if they get less for making similar technical plays that's a turnoff to them (since doing well on those maps is not as rewarding) and to their viewers on Twitch or YouTube, who will not only complain more about unfairness "on maps that clearly deserve them" but will also drop in number because those players aren't as noticed.

While I agree with the sentiment that well-roundedness should be rewarded, I feel this method presents too strong a negative incentive against specialization.

Sakisan commented 6 years ago

So the concern is that a specialization in a specific style would reduce a player's pp too much compared to other players. One question I have is how much is too much and how much would be fine? But let me try to explain why it might not be as bad as you think.

If this system is designed with quite a few cross-cutting aspects, scores in any style will have at least some effect on all following scores. When all scores are partially weighted down by all scores before them, the number of relevant scores for the total will probably be low, like it is with the current system. In these circumstances it's still a valid strategy for a player to specialize in a style to improve their total.

If a player can get x more pp in one particular style than in any other, he'll get y scores in that style before the unspecialized scores come in. The unspecialized scores will also be weighed down more as y grows, albeit less than the specialized scores around them. If a player gets equally good plays in all kinds of styles, all styles will be weighed down much more quickly, not leaving any scores down the line to stand out again.

It's a funny thought that casual players would be more specialized than the competitive players. I thought it'd be the other way around. But anyhow I'm sure all kinds of players exist at any level. I don't want to exclude any of them.

Tom94 commented 6 years ago

Separating pp into aim, speed, acc, and then adding up the individual totals is one specific way of promoting variation that I personally like. I am also open to other possibilities, but that would need something more concrete than what is described in your proposal. In particular, I would be interested in an approach that promotes variation in aim, speed, and acc without completely separating them.

Note that it's important to make sure that pp never goes down when adding additional scores and that the order of scores does not matter for whatever algorithm this may be.
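
These two requirements can be checked mechanically against any candidate algorithm. The helpers below are a sketch with made-up names (`is_order_independent`, `never_decreases`); the brute-force permutation check is only practical for small score lists, and `position_weighted` stands in for the current scheme with an assumed 0.95 decay.

```python
from itertools import permutations


def is_order_independent(aggregate, scores, tol=1e-9):
    """True if every ordering of the input scores gives the same total."""
    baseline = aggregate(list(scores))
    return all(abs(aggregate(list(p)) - baseline) <= tol
               for p in permutations(scores))


def never_decreases(aggregate, scores, extra, tol=1e-9):
    """True if adding one more score does not lower the total."""
    return aggregate(list(scores) + [extra]) >= aggregate(list(scores)) - tol


def position_weighted(pp_values, decay=0.95):
    """Stand-in for the current scheme: rank best-first, weight by decay**i."""
    ordered = sorted(pp_values, reverse=True)
    return sum(pp * decay ** i for i, pp in enumerate(ordered))


if __name__ == "__main__":
    sample = [300.0, 100.0, 250.0]
    print(is_order_independent(position_weighted, sample))
    print(never_decreases(position_weighted, sample, 180.0))
```

A similarity-based aggregate would need to pass the same two checks, for example by always sorting scores by pp before comparing each one to the scores ranked above it.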

Sakisan commented 6 years ago

I made an implementation of such a system and I would like to show an example of how it could work.

In this example I'm comparing the maps by how much aim, speed and acc they are worth. If a map is 45% aim, 20% speed and 35% acc, then it's weighted down more by a map with for example 40% aim, 23% speed and 38% acc than it is by a map with 25% aim, 37% speed and 38% acc. This is an oversimplification of course, but it seems to be working quite well already.
Please keep in mind: the order of the scores doesn't change at all compared to the way it currently is. And scores are only compared to scores that come before them in that order.

Since we like to compare them I've made a page for cookiezi and mathi: http://sakisan.be/osu/polyvalence_1/cookiezi.html http://sakisan.be/osu/polyvalence_1/mathi.html

I could make pages for some other players too if requested.

hostilew commented 6 years ago

In this implementation the number of scores relevant to your total pp count will increase by quite a bit.

Cookiezi gains > 3pp from roughly 100 of his scores in the current system, and gains > 3pp from roughly 150 scores in this proposed system, probably more that aren't listed on your page. I believe the current system is closer to ideal, as it needs fewer scores to be competitive. In order to adjust for this, rather than using a lower weighting to cause a faster drop off in pp (which would further nerf unbalanced players), you could be more lenient on what maps are considered similar, so that it's a bit harder to get the polyvalence bonus.

Sakisan commented 6 years ago

I noticed that as well. I added some JavaScript to adjust the parameters. By default it starts considering similarities when the aspects (aim, speed, acc) are within 10 units of each other. The weights scale linearly with the difference, with a maximum penalty of 3% when the values are equal. Both the range and the max penalty are now adjustable and the table updates instantly.
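
One possible reading of these defaults, written in Python rather than the page's JavaScript. This is a sketch, not the code behind the linked pages: the function names are hypothetical, and multiplying the three per-aspect factors into one weight is an assumption, since the comment does not say how the aspects are combined.

```python
ASPECTS = ("aim", "speed", "acc")


def aspect_penalty(a, b, similarity_range=10.0, max_penalty=0.03):
    """Linear penalty: max_penalty when equal, 0 once the gap reaches the range."""
    diff = abs(a - b)
    if diff >= similarity_range:
        return 0.0
    return max_penalty * (1.0 - diff / similarity_range)


def pair_weight(play, earlier, similarity_range=10.0, max_penalty=0.03):
    """Weight applied to `play` because of one earlier play.

    Assumption: the per-aspect factors are multiplied together.
    """
    weight = 1.0
    for aspect in ASPECTS:
        weight *= 1.0 - aspect_penalty(play[aspect], earlier[aspect],
                                       similarity_range, max_penalty)
    return weight


if __name__ == "__main__":
    # The three maps from the earlier example (aim / speed / acc shares).
    play = {"aim": 45, "speed": 20, "acc": 35}
    near = {"aim": 40, "speed": 23, "acc": 38}
    far = {"aim": 25, "speed": 37, "acc": 38}
    print(pair_weight(play, near), pair_weight(play, far))
```

Under these assumptions the similar map reduces the play's weight in all three aspects, while the dissimilar one only does so through acc, so the first weight comes out lower than the second.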