Closed brendanheywood closed 7 years ago
Just to be clear on your suggested algorithm.
Does the most popular sibling crag have a completely full popularity graph?
Do we also want to talk about other edge cases, eg how this will be displayed in a PDF, where we list all the descendant areas? In that case relative sibling popularity makes no sense.
For me the main question is: when I'm looking at a crag, and I've only got 1 day and can only visit 1 cliff, which cliff should I go to (or at least investigate in the guide)? ie I want to go and do the 5 most popular routes at the most popular cliff, as they will probably be side by side, rather than trying to do the 5 most popular routes across the whole crag, which might mean visiting 5 cliffs.
Relative popularity of a route only makes sense right now when comparing siblings (as far as I'm aware?), so I was thinking this would only be visible and meaningful in list view when comparing sibling cliffs. If we can make it work at the crag level then that's a bonus.
I've thought about a couple algorithms and they would all be fairly simple.
a) it's purely based on ticks: the sibling with the most ticks is 100% and the rest are relative to that. This will bias towards larger cliffs, which is ok because in reality most people will go to the larger cliffs.
b) same as above, except we weight it according to # of routes. But then this can bias the other way.
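A minimal sketch of the two options, to make the opposite biases concrete. The function names, the dict keys (`ticks`, `routes`) and the numbers are made up for illustration; nothing here is taken from the actual codebase.

```python
def pop_by_ticks(siblings):
    """Option a): each sibling's ticks as a percentage of the busiest sibling."""
    top = max((s["ticks"] for s in siblings), default=0)
    return [round(100 * s["ticks"] / top) if top else 0 for s in siblings]


def pop_by_ticks_per_route(siblings):
    """Option b): same idea, but comparing ticks per route instead of raw ticks."""
    density = [s["ticks"] / s["routes"] if s["routes"] else 0.0 for s in siblings]
    top = max(density, default=0.0)
    return [round(100 * d / top) if top else 0 for d in density]


# Hypothetical siblings: one big cliff and one small but busy cliff.
cliffs = [
    {"name": "Big Wall", "ticks": 1000, "routes": 200},
    {"name": "Small Buttress", "ticks": 300, "routes": 20},
]

print(pop_by_ticks(cliffs))            # [100, 30]  -> favours the big cliff
print(pop_by_ticks_per_route(cliffs))  # [33, 100]  -> favours the small cliff
```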
Both are so simple I think we should just pick one and do it and see how it feels.
Either would work, the latter would perhaps more meaningfully scale to uncle / cousin nodes all under the same crag. Maybe some hybrid.
Simple is good and I agree with you that the main use case is in standard index view mode. If we are going to experiment with something really simple then let's go with the easiest to explain first, which is a) from your list above. Just using the absolute value for ascents is easier to explain and thus more likely to get acceptance.
As this is such a simple algorithm I have implemented the data variables as follows:
areaPop: 66, # new
refAscentCount: 10313, # new
ascentCount: 1733, # already there
Note that I run the list of children through a function which adds the areaPop and refAscentCount params. Because the function can be used for any list of areas I have called the indexing variable something generic: refAscentCount. I have only done this for standard list view. In other words, these variables will not be available in other templates such as facets.
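For illustration only, an annotation function of that shape might look like the sketch below. The field names `areaPop`, `refAscentCount` and `ascentCount` come from the comment above; the function name `add_area_pop`, the plain linear ratio and the sample data are assumptions, and the sketch does not try to reproduce the example values shown earlier.

```python
def add_area_pop(areas):
    """Annotate each area dict in place with refAscentCount and areaPop.

    Assumption: refAscentCount is the highest ascentCount in the list and
    areaPop is a simple 0-100 ratio against it (option a).
    """
    ref = max((a.get("ascentCount", 0) for a in areas), default=0)
    for a in areas:
        a["refAscentCount"] = ref
        a["areaPop"] = round(100 * a.get("ascentCount", 0) / ref) if ref else 0
    return areas


# Hypothetical list of child areas.
children = [
    {"name": "A", "ascentCount": 500},
    {"name": "B", "ascentCount": 125},
    {"name": "C", "ascentCount": 0},
]

for area in add_area_pop(children):
    print(area["name"], area["areaPop"], area["refAscentCount"])
# A 100 500
# B 25 500
# C 0 500
```

Because the reference count is derived from whatever list is passed in, the same function would work for any list of areas, not just direct children.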
In repo
Ok the template and styles for all this is working, and overall I really like this.
https://dev.thecrag.com/climbing/australia/ebor-gorge
https://dev.thecrag.com/climbing/australia/brooyar
I find this so much faster to grok than the text. A good trick is to blur the content and see how well it is still understood, and here with Araps you can clearly see which things are popular:
There are some slightly odd things which suggest we should refine the algorithm but none of these are show stoppers right now:
At top-level countries or regions it may not make as much sense, but I don't want to make a different view for them eg https://dev.thecrag.com/climbing/australia
At regions which contain gyms it may not make sense to directly compare indoor to outdoor eg https://dev.thecrag.com/climbing/australia/queensland/area/400951797
Inside a gym the competitions swamp the other walls https://dev.thecrag.com/climbing/australia/queensland/brisbane/rocksports-indoor-climbing
@ulf / @scd this is such a tiny change but I think it has a decent impact and is worth a shout out at release time
+1 and agreed
As a matter of policy, can you add anything you want mentioned in release notes to the milestone discussion so that I don't forget it? In this case I have added a sentence about this.
I've just tried out the more nuanced algorithm which weights by routes, and I find this much better. Left side just ascents, right side weighted by routes. It gives a more balanced feel and we end up with fewer completely empty graphs.
Hmm going to re-open and tweak that algorithm, eg here:
https://dev.thecrag.com/climbing/australia/new-south-wales-and-act/northern-tablelands
Gara is by far the most popular crag in New England, well ahead of Ebor and Yarrowyck etc. But it also has a lot of development going on, so it is penalized because it has more routes which haven't had many ascents yet (and may never). I need to mull on how to balance these together or whether we just go back to the original algorithm.
Ok after some mulling:
1) if we have a really popular sub-sub cliff surrounded by choss, it still should be the most popular and shouldn't be brought down by its surroundings. Its parent should correctly be considered the most popular amongst its sibling nodes. Rephrased another way, whatever underlying metric we come up with should be associative, and therefore as a nice side effect we should be able to directly compare any two nodes across the entire db. Therefore the number of routes cannot be a factor in this metric at all (or if it is then we need 2 rollup stats that we combine just in time like we have elsewhere)
2) the first metric was good at picking the most popular, but it wasn't good at representing the less popular nodes.
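A rough sketch of point 1), using a made-up tree and made-up numbers: summed ascent counts roll up cleanly, so a parent with one very popular cliff stays ahead, whereas ascents per route drags it below a quieter neighbour. The `rollup` helper and the dict layout are hypothetical.

```python
def rollup(node):
    """Return (ascents, routes) for a node, summed over all its descendants."""
    ascents = node.get("ascents", 0)
    routes = node.get("routes", 0)
    for child in node.get("children", []):
        child_ascents, child_routes = rollup(child)
        ascents += child_ascents
        routes += child_routes
    return ascents, routes


# Hypothetical crag: one classic cliff surrounded by rarely climbed choss.
crag = {"children": [
    {"name": "Classic Wall", "ascents": 900, "routes": 10},
    {"name": "Choss Gully", "ascents": 10, "routes": 90},
]}
# Hypothetical quieter sibling crag with no sub-areas.
quiet = {"ascents": 400, "routes": 20}

crag_ascents, crag_routes = rollup(crag)     # (910, 100)
quiet_ascents, quiet_routes = rollup(quiet)  # (400, 20)

print(crag_ascents > quiet_ascents)                               # True
print(crag_ascents / crag_routes > quiet_ascents / quiet_routes)  # False: 9.1 vs 20.0
```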
So the metric can only be ascent count, but I've applied a log to it. I suspect there is some underlying psychological reason for this, Weber–Fechner or something similar to loudness / decibels etc. Anyway I've experimented with log and it works well, and I've tuned it so it's using a log in base 20. Subtract the log20 of any 2 nodes' ascent counts and then multiply by 100.
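One way to read that as code is sketched below. The log base 20, the subtraction and the factor of 100 are from the description above; the function name `log_pop`, the `100 +` offset (so the reference node scores 100), the clamping to 0..100 and the handling of zero ascents are assumptions made for this sketch.

```python
import math


def log_pop(ascents, ref_ascents, base=20):
    """Popularity of a node relative to a reference node, on a log scale.

    Assumptions: the reference node scores 100, anything with 1/base of the
    reference's ascents (or fewer) scores 0, and zero ascents scores 0.
    """
    if ascents <= 0 or ref_ascents <= 0:
        return 0
    diff = math.log(ascents, base) - math.log(ref_ascents, base)
    return max(0, min(100, round(100 + 100 * diff)))


print(log_pop(8000, 8000))  # 100: the reference node itself
print(log_pop(2000, 8000))  # 54: a linear ratio would give 25
print(log_pop(400, 8000))   # 0: a twentieth of the reference
```

Under these assumptions a node with a quarter of the reference's ascents still gets a bar a little over half full, which is the effect described in point 2): less popular nodes stay visible instead of collapsing to near-empty graphs.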
This looks spot on to me:
Araps:
What happened?
In hindsight, one of those things that seems obvious:
I feel like all the numbers in the area level table just fade into the background, and all my eyes see are the charts.
What you expected: We should have this at the area level too. Thinking that it should probably be as simple as total ascents over total routes and then weighted against the sibling crag with the highest.