Automatic float floating assigned grades / route 'onsightableness' / User ascent grade feedback

scd commented 12 years ago

Very broadly, we want lots of climbers to be able to give an opinion on the grade of a route, and for the system to combine those opinions in some way to get a consensus grade which could automatically become the 'assigned grade'.

One challenge is mitigating the unintended side effects which may arise and result in constant grade creep.

The key bits extracted from the various discussion below to date:

[x] ideally we want grade feedback when ticking, not editing. Under the hood it might be the same thing
[x] we need a way for people to 'absolutely' grade a route vs 'relatively' grade it, ie 'I thought this was a soft 24' vs 'the route is a 24 and I thought it was soft'
[x] this feedback needs to be finer grained than whole grades
[x] there must be a default of 'no opinion' so there is not a strong weighting of whatever happens to be the current assigned grade
[x] if you ticked a route with 'no opinion', and later the assigned grade is adjusted, your ascent's grade and cpr should also change. (see https://github.com/theCrag/website/issues/1500)
[x] the grade opinion should be '1 person 1 vote' the same as quality rating. So your ascent stars and grade might change with each ascent (it was hot and it was harder) but only one counts - probably the latest one.
[ ] question: should only grade opinions on successful ticks count? ie tick, onsight, flash? On the flip side, if I regularly onsight 24s, and I jump on a 20 and it shuts me down and I dog it, and vote for it to be harder, that seems valid. But in this case I think you can only say 'it's harder' and not 'it was this hard'
[ ] we might need to exclude some edge case ascent grades, ie if you did a link up, or only did certain pitches then it muddies the water so easier to just exclude these
[ ] finally we combine all the various opinions and come up with a consensus grade. lots of open questions as to whether we weight certain votes more or less
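A minimal sketch of the 'no opinion' default and the '1 person 1 vote' rule from the list above. The `Ascent` fields and the function name are hypothetical, and it assumes "latest" means the most recent ascent that actually carries an opinion:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ascent:
    climber: str                    # hypothetical climber identifier
    logged_at: int                  # any sortable timestamp (assumption)
    grade_opinion: Optional[float]  # None means 'no opinion'


def effective_votes(ascents):
    """Return {climber: grade}, keeping one vote per climber.

    Ascents logged with 'no opinion' never count as a vote, so they
    add no weight to whatever the assigned grade currently is.
    """
    latest = {}
    for ascent in sorted(ascents, key=lambda a: a.logged_at):
        if ascent.grade_opinion is None:
            continue
        # Later ascents overwrite earlier ones for the same climber.
        latest[ascent.climber] = ascent.grade_opinion
    return latest
```

Whether a later 'no opinion' ascent should wipe an earlier opinion is an open choice; this sketch keeps the earlier opinion.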

https://docs.google.com/document/d/1YIwM3rPALoqGapx_VEUoh7mdk-_dvtby0OkB-eC2ZrA/edit

Other signals to take into account:

a route would have its own estimated real grade (ie same as cpr)
if a climber of CPR X climbs a route which has much higher CPR Y, then that is a signal it is overgraded and it should have some downward pressure.
The reverse is also true but harder, as people always fall off easier stuff than their peak for lots of reasons (temperature, off day, etc), so we shouldn't pull grades up as hard as we pull them down. Also more people log positive ascents (ie a red point) vs negative ascents (I tried 5 times but it shut me down) so we cannot trust negative data as much (and it's a central tenet that CPR ignores negative signals because of this bias)
Do we want to track some sort of confidence in each climber's grade estimation? eg a climber who has just cracked their first 22 will have basically zero understanding of a grade 27. Likewise someone who regularly onsights 27 is probably going to have a harder time distinguishing a 20 from a 22 than that first climber who has just worked hard to tick that 22.
Do we also want to track and take into account each climber's 'sandbag' score? ie some people might always have a bias to downgrading or upgrading. Or perhaps based on style.
If we do weight each climber's vote based on their bias, then probably this should not be a weighting but a limit. For example, if we said that their vote was only worth 50%, and they thought a 20 was a 22, then they might say it's a 24 so their vote 'counts' as a 22. So perhaps instead we treat it more like a limit, so their 'vote' is more like +0.5 and capped at that regardless of how much harder they said it was. But this could get tricky because as the floating grade changes that delta could change.
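To make the weighting-vs-limit distinction concrete, here is a small sketch. Both function names and the numeric grade scale are assumptions for illustration only:

```python
def weighted_vote(current_grade, voted_grade, weight=0.5):
    """Scale the climber's delta: the vote can be gamed by overstating."""
    return current_grade + weight * (voted_grade - current_grade)


def capped_vote(current_grade, voted_grade, max_delta=0.5):
    """Clamp the climber's delta: overstating beyond the cap gains nothing."""
    delta = voted_grade - current_grade
    delta = max(-max_delta, min(max_delta, delta))
    return current_grade + delta
```

With a current grade of 20 and a vote of 24, the weighted version still lands on 22, while the capped version lands on 20.5. Note that both depend on `current_grade`, which is the tricky part mentioned above: if the floating grade moves, the effective vote moves with it.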

Some ideas for an algorithm:

1) Each climber gives a vote for the grade they think it is, with some fractional grades (eg easy 23, 23, hard 23) rather than just up or down a grade
2) Each climber's vote has an estimated confidence. Maybe a simple rule is that the more routes they have climbed at a similar grade, the stronger it gets, up to some limit. If you haven't logged any ticks then we shouldn't trust you. Karma should factor in somewhere too as an alternative, but karma is more about editing and not direct climbing experience at that particular grade.
3) Create a histogram with the adjusted grade votes; we probably want the median to be the 'floating' grade
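The three ideas above could be sketched roughly as follows. This is only an illustration under stated assumptions: grades are plain numbers (fractional values stand in for 'easy 23' / 'hard 23'), the `full_trust_at` limit of 10 ticks is made up, and a weighted median is one possible reading of "median of the adjusted votes":

```python
def confidence(ticks_near_grade, full_trust_at=10):
    """Confidence grows with ticks logged near the grade, up to a limit."""
    return min(1.0, ticks_near_grade / full_trust_at)


def weighted_median(votes):
    """votes: list of (grade, weight) pairs with positive weights.

    Returns the lowest grade at which the running weight reaches
    half of the total weight.
    """
    ordered = sorted(votes)
    total = sum(weight for _, weight in ordered)
    running = 0.0
    for grade, weight in ordered:
        running += weight
        if running >= total / 2:
            return grade
    raise ValueError("no votes")
```

A climber with no logged ticks gets confidence 0 and should be filtered out before calling `weighted_median`, which matches "if you haven't logged any ticks then we shouldn't trust you".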

scd commented 12 years ago

From Nicky:

From my point of view (putting up some routes) it is a pity that you plan to skip/hide the relative difficulty, because grading is never easy and this would be a nice feedback. You could present the data in the same way like the route quality.

brendanheywood commented 12 years ago

I see this more as a UI thing than the underlying model. If a user tells us that a route is "hard" for the grade it doesn't tell us that much. If someone tells us they reckon it's a 23 instead of a 24 that's more useful. The problem was that the scale of relative difficulty we had was vague and misunderstood, so a 'hard' climb, say a 25, was given a relative difficulty of 'hard' when the grade was probably about right.

So what I think we need is basically combine the two concepts and have something like this:

You are 'ticking/redpoint/...' 'this route' (graded at 25)

It felt like a:

23 (very soft touch)
24 (soft touch)
25
26 (hard for grade)
27 (sand bag)

The normal grade would be the default in the dropdown. I think two grades either side is probably good enough, but we could have more.
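As a rough sketch, the dropdown above could be generated from the assigned grade like this. It assumes whole-number grades (as in the example) and reuses the labels from the mock-up; the function name is hypothetical:

```python
def grade_options(assigned, span=2):
    """Build dropdown labels for `span` grades either side of the assigned grade."""
    labels = {
        -2: "very soft touch",
        -1: "soft touch",
        1: "hard for grade",
        2: "sand bag",
    }
    options = []
    for offset in range(-span, span + 1):
        grade = assigned + offset
        label = labels.get(offset)
        # Offsets without a label (including 0) show the bare grade.
        options.append(f"{grade} ({label})" if label else str(grade))
    return options
```

Calling `grade_options(25)` reproduces the five lines of the mock-up; a larger `span` gives more grades, with the extra ones unlabelled.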

We don't have a calculated consensus grade, and we can't unless we get objective rather than relative feedback. I don't even know how much value having a calculated consensus grade is, and whether we would want to use it as the primary grade, similar to the stars after lots of ticks.

If we do want to calculate this properly we'd need to have a second drop down:

because: (or something to this effect)
1) I went off route
2) The route has changed, rock fall etc
3) I reckon the original grade is sloppy
4) I am an old school trad master, trust me
5) I am 7 foot tall

so we can distinguish between the multitude of different reasons an ascent has a different grade. The only ones we care about for the consensus grade are 2) and 3) and if it is 2) then that should have more weight in the consensus algorithm. This same logic would apply to changing the grade in the route edit process. This might be overkill, and these types of reasons for excluding a grade vote might just in practice be faint noise in the system and not skew it enough to warrant worrying about.
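A sketch of how the reason drop down might feed the consensus calculation. The enum names and the weight values are assumptions; the only thing taken from the comment above is that reasons 2) and 3) count and that 2) should weigh more:

```python
from enum import Enum


class Reason(Enum):
    OFF_ROUTE = 1
    ROUTE_CHANGED = 2
    ORIGINAL_GRADE_SLOPPY = 3
    OLD_SCHOOL = 4
    HEIGHT = 5


# Only reasons 2) and 3) feed the consensus grade; the numbers are made up.
REASON_WEIGHT = {
    Reason.ROUTE_CHANGED: 2.0,
    Reason.ORIGINAL_GRADE_SLOPPY: 1.0,
}


def consensus_votes(votes):
    """votes: iterable of (grade, Reason). Return (grade, weight) pairs that count."""
    return [(grade, REASON_WEIGHT[reason])
            for grade, reason in votes
            if reason in REASON_WEIGHT]
```

The other reasons are still worth storing with the ascent; they are just left out of the consensus input.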

cgome commented 12 years ago

I think something like what Brendan suggests has merit- meets a user need while maintaining/improving data quality/integrity.

Some thoughts

1) I went off route
2) The route has changed, rock fall etc (this should trigger some sort of reminder, "Please consider updating the description to reflect the changed nature of the route")
3) I think the current grade is wrong
4) I am an old school trad master, trust me
5) I am very tall
6) I am not very tall

brendanheywood commented 7 years ago

This seems as good as any place to link to this discussion:

https://www.thecrag.com/event/1397846409

The key thing was 'how onsightable is this climb' by climbers who can climb X, which we can answer by comparing the tick type purity of all the ascents to the point-in-time cpr for the climber. I'm not sure how expensive it would be to calculate a point-in-time cpr for a large number of climbers; some routes have hundreds of ascents. But this could easily inform both the consensus grade and the onsight-ability, ie we know that at low grades the difference between redpoint and onsight is roughly N grades but this changes as you go up the grades. We could potentially calculate sorta like an internal redpoint grade and an onsight grade, and the closer they are the more "onsightable" and the further apart the "trickier the beta".
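One very rough way to sketch the redpoint-vs-onsight gap. It assumes each ascent comes with the climber's point-in-time CPR on the same numeric scale as grades, that the tick type strings are `"onsight"` and `"redpoint"`, and that the median climber CPR per tick type is a fair stand-in for the "internal" grade; none of that is confirmed:

```python
from statistics import median


def onsight_gap(ascents):
    """ascents: list of (tick_type, climber_cpr_at_ascent).

    Returns median onsighter CPR minus median redpointer CPR, or None
    when either group is empty. A small gap suggests an 'onsightable'
    route; a large gap suggests tricky beta.
    """
    onsight = [cpr for tick, cpr in ascents if tick == "onsight"]
    redpoint = [cpr for tick, cpr in ascents if tick == "redpoint"]
    if not onsight or not redpoint:
        return None
    return median(onsight) - median(redpoint)
```

The gap would still need comparing against the typical redpoint/onsight difference at that grade, since that difference changes as you go up the grades.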

scd commented 7 years ago

There is already a stat for this called Purity Score. My notes say this stat is {Number Pure Ascents} / {Number Distinct Climber Ascents}, but I would have to look at the code.
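A sketch of that ratio as written in the notes, not as implemented in the code. It assumes "pure" means onsight or flash and that both counts are per distinct climber; both are guesses:

```python
def purity_score(ascents, pure_types=frozenset({"onsight", "flash"})):
    """ascents: list of (climber, tick_type).

    Returns (distinct climbers with a pure ascent) / (distinct climbers),
    or None when there are no ascents.
    """
    climbers = {climber for climber, _ in ascents}
    if not climbers:
        return None
    pure = {climber for climber, tick in ascents if tick in pure_types}
    return len(pure) / len(climbers)
```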

I like the idea of using CPR.

If we assign a comparison CPR to each route then this opens up a whole treasure trove of goodies.

Quite a bit of analysis to be undertaken here, but it is all analysis we need to do to verify and enhance the CPR anyway. Lots of fun

brendanheywood commented 7 years ago

An idea I have in mind is that on the route detail page we have a graph which shows cpr on one axis, and then a band for each tick type on the other axis. Something sorta like this:

DaneEvans commented 7 years ago

Personal grades need to be easier to add in some way or another, preferably through the logbook entry so data can also be collected by the people that agree with a grade, rather than just those that disagree strongly.

I like the idea of 'onsightableness' too - if only so I can choose climbs that I'll only need to do once

scd commented 7 years ago

Even though we have rejected issue #323 (relative difficulty), there is not total agreement internally and it relates to people disagreeing with grades.

Nothing is clear in my mind, but I think we are starting to get a picture and just need to spend more time analyzing.

brendanheywood commented 6 years ago

Just moving a comment from @DaneEvans to here from https://github.com/theCrag/website/issues/2525

a few things that I think such a system needs to allow:

Easy changes over the first few repeats
relative robustness once a number of ascents have been logged
relative robustness for older routes
get a vote from people that don't disagree with the grade (will also increase robustness)
preferably shame tagging of people that consistently try to upgrade things. I'm thinking the supporter purple surround, but in red or something ...
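The first two requirements fall out of even a plain running mean, sketched below. This is only an illustration of the step size, not a proposal over the median discussed earlier; it assumes the original grade counts as one vote, and it does not model route age:

```python
def updated_grade(current, vote_count, new_vote):
    """Running mean: `current` is the mean of `vote_count` earlier votes.

    Each new vote moves the grade by 1 / (vote_count + 1) of its delta,
    so early repeats shift it easily and later ones barely move it.
    """
    return current + (new_vote - current) / (vote_count + 1)
```

Starting from a 24 with one vote, a single vote of 22 pulls the grade to 23. After 49 votes the same vote only moves it by 0.04 of a grade. Votes from people who agree with the grade raise `vote_count` too, which is how they add robustness.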

brendanheywood commented 6 years ago

Also cross linking to Will's idea about upgrade and downgrade opinions being balanced in some way:

https://github.com/theCrag/website/issues/2640

brendanheywood commented 6 years ago

+1 from @birgander2 via https://github.com/theCrag/website/issues/3114#issuecomment-433633551

brendanheywood commented 5 years ago

+1 from support email

brendanheywood commented 5 years ago

+1 from support email

brendanheywood commented 5 years ago

+1 via support for something much simpler like median or mode

Drazhar commented 5 years ago

+1 Something like this would be perfect!

One suggestion: I like the idea of simply having the option for "ultra soft, soft, normal, hard, ultra hard" (or something like this) but the default should be "no opinion". The reason is that most people don't care about this and will always take the default, which would then "corrupt" the data. Also, as already said by Brendan, if I do easier routes, I often can't tell if it's hard or soft for the grade and would prefer to have no opinion. Maybe the weighting with the CPR helps, but this is rather complex and hard to understand if someone doesn't dig into it.

Drazhar commented 5 years ago

+1 from Moxi

brendanheywood commented 5 years ago

+1 from support

lordyavin commented 5 years ago

+1 from @kk56k via #3420

brendanheywood commented 4 years ago

+1 again

rouletout commented 4 years ago

While there might be different reasons for changing a grade, I don't really know what we do with historic ascents when the grade is changed.

For example, I changed the grade of this route: https://www.thecrag.com/climbing/mexico/route/2869712769 from 12d to 12c but my stats still show it as a 12d (but CPR is updated).

See here: https://www.thecrag.com/ascents/by/rouletout/with-route-gear-style/sport/in-setting/natural/has/distinctroute/with-ascent-grade/FR:7c:7c?sortby=tick-type,desc

I consider this a bug as the ascent should be downgraded as well.

The only point where grade changes should not affect an ascent grade is if the grade is being changed based on e.g. a hold that broke or a route became so polished that it is harder (typically upgrading) from a certain timepoint on but in general this should be a retrospective correction.
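A sketch of the rule described above, combined with the earlier point that ascents ticked with 'no opinion' should follow the assigned grade. The `GradeChange` record and its `effective_from` field are hypothetical, not the current data model:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GradeChange:
    grade: str
    # None = retrospective correction that applies to all ascents.
    # Otherwise the change only applies from this date on (broken hold, polish).
    effective_from: Optional[int] = None


def grade_for_ascent(changes, ascent_date, own_opinion=None):
    """changes: grade history, oldest first. Returns the grade shown for an ascent."""
    if own_opinion is not None:
        return own_opinion  # an explicit opinion is never overwritten
    grade = None
    for change in changes:
        if change.effective_from is None or ascent_date >= change.effective_from:
            grade = change.grade
    return grade
```

With a history of 12d corrected to 12c, every 'no opinion' ascent shows 12c. One limitation: a retrospective correction made after a dated change overrides it for all ascents, so the ordering of mixed changes would need more thought.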

killakalle commented 3 years ago

+1 from me.

For a long time I have not been able to properly understand what User grade contributions are for and how they influence the grade. After now reading several Github issues, I think I understand the following:

user grade contributions are only an additional reference/citation that is displayed on the route detail page
ascent grades (ticking a route) have nothing to do with user grade contributions
user grade contributions can only be done when editing a route (weird!)

If a route has no grade - or it is not known - the user grade contributions could be used to calculate and display that route's grade. That's initially what I thought User grade contributions were for. But having an (official) "Assigned grade" and (calculated) "Community grade" makes perfect sense for me. https://www.thecrag.com/es/escalar/test-area/route/11752453

Is there even any need or benefit to having dedicated user grade contributions? Can't the user grade contributions be removed and replaced by a user's ascent grade contribution? To me, this sounds simpler and also sufficient. Or am I missing anything important about user grade contributions?

rouletout commented 2 years ago

Grade feedback, with most of the features initially discussed here, is implemented in the new ticking interface. New issues need to be created if something is missing. This thread is too wild to be useful, so closing.

theCrag / website

Automatic float floating assigned grades / route 'onsightableness' / User ascent grade feedback #689