Refinements to maps api and node selection algorithm

brendanheywood commented 11 years ago

There is a growing number of common map use cases where the existing algorithm isn't working great. So collecting some test cases here to help inform improvements. I've got a new algorithm in mind which is only slightly different the current one which should fix all of these (after responsive, after topos, after....?):

[ ] Siblings or uncles that are structurally valid https://www.thecrag.com/maps/climbing/australia/gibraltar-peak-and-corin-road-crags#-35.480948,148.969688,13,,terrain
[x] cousins on either side of a border, but only one set is shown
[ ] lots of children but cluster too close so labels clash https://www.thecrag.com/maps/climbing/australia/nowra#-34.847681,150.537987,11,,hybrid

brendanheywood commented 11 years ago

Principles

[x] the nodes returned should be completely decoupled from the node hierarchy, ie if you pan across a border you should never see nodes appear or disappear before the pan out of the map view port
[x] at any zoom level, if a node is over a certain size then you should see it's contents
[x] any shape that crosses the view port, and is above a certain visible area threshold will be visible
[ ] the 'current' or 'here' node should work for very thin cliffs near the center of the view port
[ ] panning across a clean boundary between two mutually exclusive shapes (eg region boundaries) should transition directly from one to the other, and nothing else (eg not the shared ancestor)

new algorithm

Given:

the map viewport bbox
a second 'focus' size
a visible width threshold, in meters

1) Start at worlds children nodes 2) Find any node who's bbox overlaps the viewport, ignore the rest 3) If a nodes shape is over a visible size, then split it apart and recurse into it's children

The definition of 'visible' is if any dimension of the shape is wider than X pixels, where X is determined purely on the zoom factor. So a very thin long cliff with 10 children cliffs may never have enough area to ever be considered 'here' with the current algorithm, but will get picked up by this new one. Likewise a curved banana cliff is very difficult to zoom into currently for the same reason. Ultimately this will return a tree structure of the whole world, but filtered purely by geography and not hindered by the node structure.

So if I am say near my house and I'm zoomed out so both Blackman's bay and bruny island are on the map I should get back from the api:

{
    "visible": [
        {
            "11737699": [
                "Australia",
                [
                    {
                        "11738203": [
                            "Tasmania",
                            [
                                {
                                    "11744707": [
                                        "Blackmans Bay"
                                    ],
                                    "188180889": [
                                        "Brunyisland"
                                    ]
                                }
                            ]
                        ]
                    }
                ]
            ]
        }
    ],
    "here": "11738203"
}

'Here' / 'where am I' logic

The 'here' logic is very simple. It needs to:

[x] work with overlapping / bad location data
[x] work when a child entirely fills its parent (and detect this as the child)
[x] be reversible: an algortihm, given a node to zoom to that location, which should then return the same node

The algorithm:

draw 5 imaginary points, one in the center of the viewport, and one in each compass direction a third of the way from the centre to the edge of the view port (1/3 is a safe value which guarantees any zoomed and centered bbox rectangle will always contain 3 of the 5 points)
find the lowest node which is visible, under each point, and count each node, note by 'under' I only mean inside the bbox, not the actual boundary - this makes a big difference to what gets matches and fixes some bug with odds shapes like banana shapes.
If by looking at the bbox'es you end up 'in' two or nodes, then and only then check the boundaries to see which one you are closest to. Note closest to, not simply just in, but if you are inside the boundary then the distance will be 0 and so likely win. Which ever one is selected keep on recursing down as before. If you happen to still have a tie, recurse first, and then if you still have a tie (ie there was no child smaller nodes which resolved it), then pick the node with the closest centroid.
the node with 3 points wins
If no node wins, then look at the depth of all nodes, and truncate the longest / deepest nodes, and repeat until you have a single node with at least 3 points

eg given 5 points with these hierarchies:

 a > b > c > d > f
 a > b > c > d > f > g
 a > b > h > i > j > k > l
 a > b > h > i > j > k > m
 a > b > h > i > j > o

In this case 'j' wins

Why this works: Every bbox is a rectangle, (vs the boundary which is not gauranteed to have any spot 'inside') and regardless or it's size, when you focus on the box in gmaps, it will snap to an integer zoom level, and pan to center the box. In all situations any one node must have 3 points active, either west to east, or north to south, in the box.

When it fails: When a child area bbox fills the parent bbox ( or other nodes). There probably isn't a robust reversible algorithm for this case, I'm happy to not bother with this edge case.

Performance

By primarily using bbox we get the added bonus as saving a heap of computation when comparing shapes, and we will need to compare a lot of shape x5 points, so it all ads up. If any bbox side test fails we can exit early saving more comparisons and lookups. Additionally we know we need a quorum of 3 nodes out of the 5, so instead of doing a full scan for each of the 5 points we can process them together until they diverge and then stop saving more cycles.

Display logic

[ ] All leaf nodes, like blackmans and bruny will get shown with a label.
[ ] Any other nodes further up will get their shapes shown but not their label.
[ ] crumb trail is updated with 'here' node

Extra api data

Currently we call the existing end point, get back some id's, and then make a second call to selectively get the data we need. This was to eliminate lots of data over the network if the map was only panned a small distance. But on mobile it's mostly the latency that sucks, so I'd like to eliminate the second call and embed just the name of the node. The rest I'll get on hover or tap as needed.

other questions to answer

The big question is, given how memory fragile the maps is, should we not cache anything in memory, and if so, should we add more data to this call and completely remove the existing clustering. If we do this well need to add the 'remaining' route count to each node, and also include all descendants who aren't big enough to have a label, but are big enough to show a cluster of routes. I'm kinda leaning towards this but need to thrash and mull a bit more. Essentially this moves all clustering logic to the server.

brendanheywood commented 10 years ago

I've just expanded a bit more detail on the '5 points' algorithm above, I assume that if you do this part you'll only do it, but not all the other maps api stuff, just enough to close #1403 / #916

This new implementation replaces the login in /api/map/bbox but the interface should be exactly the same and the js maps shouldn't need to change at all.

brendanheywood commented 10 years ago

Also if we do move the majority of the clustering logic to the server, then this also subsumes #421

scd commented 6 years ago

@brendanheywood with this algorithm you are recursing if the dimensions are wider than X pixels. Do you also mean higher they Y pixels as well. Otherwise a vertically aligned thin area might not trigger but a horizontally alligned one will.

scd commented 6 years ago

I have just done a data migration to split bounded box from a single text field into separate number fields and I also added the long and lat angles for index optimisation.

I think the core query for this something like this:

SET @long1 = -180;
SET @lat1 = -90;
SET @long2 = 180;
SET @lat2 = 90;
SET @angle = 7;

SELECT SQL_NO_CACHE
  L.ID,
  L.NetworkNode AS NodeID

FROM
  NodeLocation AS L

WHERE
      L.ActiveFlag = 1
  AND L.BBoxPt2Long >= @long1 AND L.BBoxPt1Long <= @long2
  AND L.BBoxPt2Lat >= @lat1 AND L.BBoxPt1Lat <= @lat2
  AND L.LongAngle >= @angle

Which returns 227 location records, which sounds about right. Note that the LongAngle field is for index optimisation. Using a maths forula (long2 - long1) the query takes about 100ms, but using the indexed pre-calculated field this goes down to about 15ms.

Hopefully adding all the other info does not slow the query down to much.

brendanheywood commented 6 years ago

Yes the max width or max height, using either min or max depending on context. So assuming LongAngle is populated with the max of the width and height then all good

brendanheywood commented 6 years ago

Also need to test the international date line wrap around, we've had issues in the past with that

scd commented 6 years ago

Hmmm, international date line. The bounded box supplied to the api end point could be over the date line or the bounded box of nodes could be over the date line. I think the first is dealt with by detection and modifying the search query. The later could be done by detection on nodes at query time, but then this would lose a lot of efficiency. I am pretty sure we will need a field which says the bounded box goes over the datelline.

scd commented 6 years ago

Also there are a few other edge cases in the query, which I think I have mostly worked through and kept it relatively efficient

[x] exclude test node
[x] exclude hidden locations (excluded logic for user permissions as this is not really used yet and complex to implement, so if there is any level of hiding it is hidden)
[x] exclude merged and deleted nodes (implemented the other way around, include specific node types)
[x] if location is missing from parent node it should still be returned in the heirachy query

scd commented 6 years ago

I am going through the here algorithm logic, which I cannot fully follow. If the viewport is the whole world then the 5 points are [[0,0],[-60,-30],[-60,30],[60,-30],[60,30]] which leaves the here value pointing to Africa

                  "hereCount" : 3,
                  "bbox" : [
                     -27.6581187,
                     -41.1761476,
                     63.9678578,
                     40.3488
                  ],
                  "name" : "Africa",
                  "id" : "11737843",

@brendanheywood would this be what you expect?

brendanheywood commented 6 years ago

Yes that actually does make perfect sense following the algorithm. The choice of 1/3 the viewport in each direction is easier to explain with a diagram, but that value of 1/3 could be enlarged to as high as 1/2 and still be correct for the roundtrip node selection, so try tuning that first. Ideally I'd like to see that visually on the map and pan around and see how it actually feels.

scd commented 6 years ago

Tuning we can do later when we have the benefit of something visual, so I will leave it as it is. The main thing is that I wanted to confirm that I did not misunderstand.

I have gone with a pretty simple implementation which is to find the deepest nodes which pass the three point tests. If there are more than one then just use the closest centroid distance. If we need to enhance this then it will be clearer when we have that visual display.

Should the here variable be an id or a node with url etc.

brendanheywood commented 6 years ago

Just an id is fine. In the simplest case where the client does no caching, it will never pass in a set of nodeid's to start the search from, which should mean that the data for the here node is guaranteed to already be in the payload.

brendanheywood commented 6 years ago

I think it's worth tuning that 1/3 from .333 to .4, it's a safe change and will mean at the world view the world node will be 'here'

brendanheywood commented 6 years ago

Finally done :)

theCrag / website