theCrag / website

theCrag.com: Add your voice and help guide the development of the world's largest collaborative rock climbing & bouldering platform
https://www.thecrag.com/

Show a concept of popularity at the area level as well as route level #2829

Closed: brendanheywood closed this issue 7 years ago

brendanheywood commented 7 years ago

What happened?

In hindsight, this is one of those things that seems obvious: [image]

I feel like all the numbers in the area-level table just fade into the background, and all my eyes see are the charts.

What you expected: We should have this at the area level too. I'm thinking it should probably be as simple as total ascents over total routes, then scaled relative to the sibling crag with the highest value.

scd commented 7 years ago

Just to be clear on your suggested algorithm.

Does the most popular sibling crag have a completely full popularity graph?

Do we also want to talk about other edge cases, e.g. how this will be displayed in a PDF where we show all descendant areas? In that case relative sibling popularity makes no sense.

brendanheywood commented 7 years ago

For me the main question is: when I'm looking at a crag, and I've only got 1 day and can only visit 1 cliff, which cliff should I go to (or at least investigate in the guide)? I.e. I want to go and do the 5 most popular routes at the most popular cliff, as they will probably be side by side, rather than trying to do the 5 most popular routes across the whole crag, which might mean visiting 5 cliffs.

Relative popularity of a route only makes sense right now when comparing siblings (as far as I'm aware?), so I was thinking this would only be visible and meaningful in list view when comparing sibling cliffs. If we can make it work at the crag level, then that's a bonus.

I've thought about a couple of algorithms, and they would all be fairly simple.

a) It's purely based on ticks: the sibling with the most ticks is 100% and the rest are relative to that. This will bias towards larger cliffs, which is OK because in reality most people will go to the larger cliffs.
b) Same as above, except we weight it according to the number of routes. But then this can bias the other way.
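To make the difference between a) and b) concrete, here is a minimal Python sketch of both. The `Area` record and its field names are hypothetical, not theCrag's actual data model; each function maps a list of sibling areas to a popularity value from 0 to 100, with the top sibling at 100.

```python
from dataclasses import dataclass


@dataclass
class Area:
    """Hypothetical minimal area record; field names are illustrative only."""
    name: str
    ascent_count: int
    route_count: int


def popularity_by_ticks(siblings: list[Area]) -> dict[str, int]:
    """Option a): the sibling with the most ticks is 100, the rest scale linearly."""
    top = max((a.ascent_count for a in siblings), default=0)
    if top == 0:
        return {a.name: 0 for a in siblings}
    return {a.name: round(100 * a.ascent_count / top) for a in siblings}


def popularity_per_route(siblings: list[Area]) -> dict[str, int]:
    """Option b): same idea, but each sibling's ticks are first divided by its route count."""
    ratios = {
        a.name: (a.ascent_count / a.route_count if a.route_count else 0.0)
        for a in siblings
    }
    top = max(ratios.values(), default=0.0)
    if top == 0:
        return {name: 0 for name in ratios}
    return {name: round(100 * r / top) for name, r in ratios.items()}
```

With a large cliff (900 ascents over 300 routes) and a small one (300 ascents over 20 routes), a) puts the large cliff at 100 and the small one at 33, while b) flips it: the small cliff is at 100 and the large one at 20. That is the bias in each direction described above.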

Both are so simple I think we should just pick one and do it and see how it feels.

Either would work; the latter would perhaps scale more meaningfully to uncle/cousin nodes all under the same crag. Maybe some hybrid.

scd commented 7 years ago

Simple is good, and I agree with you that the main use case is in the standard index view mode. If we are going to experiment with something really simple, then let's go with the easiest to explain first, which is a) from your list above. Just using the absolute value for ascents is easier to explain and thus more likely to get acceptance.

As this is such a simple algorithm, I have implemented the data variables as follows:

areaPop: 66,   # new
refAscentCount: 10313, # new
ascentCount: 1733, # already there

Note that I run the list of children through a function which adds the areaPop and refAscentCount params. Because the function can be used for any list of areas, I have given the indexing variable a generic name: refAscentCount. I have only done this for the standard list view. In other words, these variables will not be available in other templates such as facets.
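A rough sketch of what such a helper could look like, written in Python rather than theCrag's own code. The name `add_area_pop` is hypothetical, and the exact areaPop formula isn't spelled out in this thread, so this assumes algorithm a): refAscentCount is the largest ascentCount in the list, and areaPop is each area's ascentCount as a percentage of it.

```python
def add_area_pop(areas: list[dict]) -> list[dict]:
    """Annotate each area dict in place with refAscentCount and areaPop.

    Hypothetical helper: assumes every dict already has an integer
    'ascentCount', and that areaPop is the linear share of the largest
    ascentCount in this particular list (algorithm a).
    """
    ref = max((a["ascentCount"] for a in areas), default=0)
    for a in areas:
        # Same reference value for every item, whatever list was passed in.
        a["refAscentCount"] = ref
        a["areaPop"] = round(100 * a["ascentCount"] / ref) if ref else 0
    return areas
```

Because refAscentCount is derived from whatever list is passed in, the same helper works for any list of areas, which matches the reason given for the generic name.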


brendanheywood commented 7 years ago

OK, the template and styles for all this are working, and overall I really like this.

https://dev.thecrag.com/climbing/australia/ebor-gorge [image]

https://dev.thecrag.com/climbing/australia/brooyar [image]

I find this so much faster to grok than the text. A good trick is to blur the content and see how well it is still understood, and here with Araps you can clearly see which areas are popular: [image]

There are some slightly odd things which suggest we should refine the algorithm, but none of them are showstoppers right now:

brendanheywood commented 7 years ago

@ulf / @scd this is such a tiny change, but I think it has a decent impact and is worth a shout-out at release time.

scd commented 7 years ago

+1 and agreed

As a matter of policy, can you add anything you want mentioned in the release notes to the milestone discussion so that I don't forget it? In this case I have added a sentence about this.

brendanheywood commented 7 years ago

I've just tried out the more nuanced algorithm which weights by routes, and I find it much better. Left side is just ascents, right side is weighted by routes. It gives a more balanced feel and we end up with fewer completely empty graphs.

[image] [image]

brendanheywood commented 7 years ago

Hmm, going to re-open this and tweak that algorithm, e.g. here:

https://dev.thecrag.com/climbing/australia/new-south-wales-and-act/northern-tablelands

[image]

Gara is by far the most popular crag in New England, well ahead of Ebor, Yarrowyck, etc. But it also has a lot of development going on, so it is penalized because it has more routes which haven't had many ascents yet (and may never). I need to mull over how to balance these together, or whether we just go back to the original algorithm.

brendanheywood commented 7 years ago

Ok after some mulling:

1) if we have a really popular sub-sub cliff surrounded by choss, it should still be the most popular and shouldn't be brought down by its surroundings. Its parent should correctly be considered the most popular amongst its sibling nodes. Put another way, whatever underlying metric we come up with should be associative, and as a nice side effect we should be able to directly compare any two nodes across the entire db. Therefore the number of routes cannot be a factor in this metric at all (or if it is, then we need 2 rollup stats that we combine just in time, like we have elsewhere).

2) the first metric was good at picking the most popular, but it wasn't good at representing the less popular nodes.
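Point 1) is easier to see with a small tree. In this hypothetical Python sketch (the `Node` type and its fields are illustrative, not theCrag's schema), ascent and route counts are rolled up as plain sums, and ascents per route is only computed at the end from those two sums, i.e. the "2 rollup stats that we combine just in time" option.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical area-tree node with counts for its own (non-descendant) routes."""
    name: str
    ascents: int = 0   # ascents logged directly on this node's own routes
    routes: int = 0    # routes directly in this node
    children: list["Node"] = field(default_factory=list)


def rollup(node: Node) -> tuple[int, int]:
    """Return (total ascents, total routes) for a node and all its descendants.

    Both stats are plain sums, so they combine the same way at every level.
    """
    ascents, routes = node.ascents, node.routes
    for child in node.children:
        child_ascents, child_routes = rollup(child)
        ascents += child_ascents
        routes += child_routes
    return ascents, routes


def ascents_per_route(node: Node) -> float:
    """Derived metric, computed just in time from the two rolled-up sums."""
    ascents, routes = rollup(node)
    return ascents / routes if routes else 0.0
```

For a sector containing a popular cliff (900 ascents, 10 routes) and a patch of choss (10 ascents, 90 routes), `rollup` gives (910, 100), so the sector averages 9.1 ascents per route against 90.0 for the popular cliff alone. Averaging the two children's ratios instead would give roughly 45, which is why a per-route ratio can't itself be rolled up. Raw ascent counts have no such problem: the sector's 910 is simply the sum of its children, so ascent totals can be compared between any two nodes.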

So the metric can only be ascent count, but I've applied a log to it. I suspect there is some underlying psychological reason for this, Weber–Fechner or something similar to loudness/decibels etc. Anyway, I've experimented with log and it works well, and I've tuned it to use a log in base 20: subtract the log20 of any 2 nodes' ascent counts and then multiply by 100.
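A minimal Python sketch of the log-based metric. `log_popularity_diff` is the difference described above; `bar_fill`, which turns it into a bar width against the most popular sibling, is a hypothetical mapping (top sibling drawn at 100, clamped at 0), since the thread doesn't say exactly how the difference is drawn.

```python
import math


def log_popularity_diff(ascents_a: int, ascents_b: int) -> float:
    """100 * (log20(a) - log20(b)) for two nodes' ascent counts (both must be > 0)."""
    return 100 * (math.log(ascents_a, 20) - math.log(ascents_b, 20))


def bar_fill(ascents: int, ref_ascents: int) -> float:
    """Hypothetical bar width in percent relative to the most popular sibling.

    Assumes the most popular sibling (ref_ascents) is drawn at 100 and every
    other node loses 100 points per factor of 20 fewer ascents, clamped at 0.
    Nodes with no ascents get an empty bar, since log(0) is undefined.
    """
    if ascents <= 0 or ref_ascents <= 0:
        return 0.0
    return max(0.0, 100 + log_popularity_diff(ascents, ref_ascents))
```

Under this assumed mapping the base sets the dynamic range: a node with one twentieth of the top sibling's ascents lands at 0, and one with a quarter of them still fills about 54% of the bar, whereas the linear ratio from algorithm a) would give it 25%.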

This looks spot on to me:

[image]

Araps:

[image]