Closed brendanheywood closed 6 years ago
Given:
1) Start at worlds children nodes 2) Find any node who's bbox overlaps the viewport, ignore the rest 3) If a nodes shape is over a visible size, then split it apart and recurse into it's children
The definition of 'visible' is if any dimension of the shape is wider than X pixels, where X is determined purely on the zoom factor. So a very thin long cliff with 10 children cliffs may never have enough area to ever be considered 'here' with the current algorithm, but will get picked up by this new one. Likewise a curved banana cliff is very difficult to zoom into currently for the same reason. Ultimately this will return a tree structure of the whole world, but filtered purely by geography and not hindered by the node structure.
So if I am say near my house and I'm zoomed out so both Blackman's bay and bruny island are on the map I should get back from the api:
{
"visible": [
{
"11737699": [
"Australia",
[
{
"11738203": [
"Tasmania",
[
{
"11744707": [
"Blackmans Bay"
],
"188180889": [
"Brunyisland"
]
}
]
]
}
]
]
}
],
"here": "11738203"
}
The 'here' logic is very simple. It needs to:
The algorithm:
eg given 5 points with these hierarchies:
a > b > c > d > f
a > b > c > d > f > g
a > b > h > i > j > k > l
a > b > h > i > j > k > m
a > b > h > i > j > o
In this case 'j' wins
Why this works: Every bbox is a rectangle, (vs the boundary which is not gauranteed to have any spot 'inside') and regardless or it's size, when you focus on the box in gmaps, it will snap to an integer zoom level, and pan to center the box. In all situations any one node must have 3 points active, either west to east, or north to south, in the box.
When it fails: When a child area bbox fills the parent bbox ( or other nodes). There probably isn't a robust reversible algorithm for this case, I'm happy to not bother with this edge case.
By primarily using bbox we get the added bonus as saving a heap of computation when comparing shapes, and we will need to compare a lot of shape x5 points, so it all ads up. If any bbox side test fails we can exit early saving more comparisons and lookups. Additionally we know we need a quorum of 3 nodes out of the 5, so instead of doing a full scan for each of the 5 points we can process them together until they diverge and then stop saving more cycles.
Currently we call the existing end point, get back some id's, and then make a second call to selectively get the data we need. This was to eliminate lots of data over the network if the map was only panned a small distance. But on mobile it's mostly the latency that sucks, so I'd like to eliminate the second call and embed just the name of the node. The rest I'll get on hover or tap as needed.
The big question is, given how memory fragile the maps is, should we not cache anything in memory, and if so, should we add more data to this call and completely remove the existing clustering. If we do this well need to add the 'remaining' route count to each node, and also include all descendants who aren't big enough to have a label, but are big enough to show a cluster of routes. I'm kinda leaning towards this but need to thrash and mull a bit more. Essentially this moves all clustering logic to the server.
I've just expanded a bit more detail on the '5 points' algorithm above, I assume that if you do this part you'll only do it, but not all the other maps api stuff, just enough to close #1403 / #916
This new implementation replaces the login in /api/map/bbox but the interface should be exactly the same and the js maps shouldn't need to change at all.
Also if we do move the majority of the clustering logic to the server, then this also subsumes #421
@brendanheywood with this algorithm you are recursing if the dimensions are wider than X pixels. Do you also mean higher they Y pixels as well. Otherwise a vertically aligned thin area might not trigger but a horizontally alligned one will.
I have just done a data migration to split bounded box from a single text field into separate number fields and I also added the long and lat angles for index optimisation.
I think the core query for this something like this:
SET @long1 = -180;
SET @lat1 = -90;
SET @long2 = 180;
SET @lat2 = 90;
SET @angle = 7;
SELECT SQL_NO_CACHE
L.ID,
L.NetworkNode AS NodeID
FROM
NodeLocation AS L
WHERE
L.ActiveFlag = 1
AND L.BBoxPt2Long >= @long1 AND L.BBoxPt1Long <= @long2
AND L.BBoxPt2Lat >= @lat1 AND L.BBoxPt1Lat <= @lat2
AND L.LongAngle >= @angle
Which returns 227 location records, which sounds about right. Note that the LongAngle
field is for index optimisation. Using a maths forula (long2 - long1) the query takes about 100ms, but using the indexed pre-calculated field this goes down to about 15ms.
Hopefully adding all the other info does not slow the query down to much.
Yes the max width or max height, using either min or max depending on context. So assuming LongAngle is populated with the max of the width and height then all good
Also need to test the international date line wrap around, we've had issues in the past with that
Hmmm, international date line. The bounded box supplied to the api end point could be over the date line or the bounded box of nodes could be over the date line. I think the first is dealt with by detection and modifying the search query. The later could be done by detection on nodes at query time, but then this would lose a lot of efficiency. I am pretty sure we will need a field which says the bounded box goes over the datelline.
Also there are a few other edge cases in the query, which I think I have mostly worked through and kept it relatively efficient
I am going through the here algorithm logic, which I cannot fully follow. If the viewport is the whole world then the 5 points are [[0,0],[-60,-30],[-60,30],[60,-30],[60,30]] which leaves the here value pointing to Africa
"hereCount" : 3,
"bbox" : [
-27.6581187,
-41.1761476,
63.9678578,
40.3488
],
"name" : "Africa",
"id" : "11737843",
@brendanheywood would this be what you expect?
Yes that actually does make perfect sense following the algorithm. The choice of 1/3 the viewport in each direction is easier to explain with a diagram, but that value of 1/3 could be enlarged to as high as 1/2 and still be correct for the roundtrip node selection, so try tuning that first. Ideally I'd like to see that visually on the map and pan around and see how it actually feels.
Tuning we can do later when we have the benefit of something visual, so I will leave it as it is. The main thing is that I wanted to confirm that I did not misunderstand.
I have gone with a pretty simple implementation which is to find the deepest nodes which pass the three point tests. If there are more than one then just use the closest centroid distance. If we need to enhance this then it will be clearer when we have that visual display.
Should the here
variable be an id or a node with url etc.
Just an id is fine. In the simplest case where the client does no caching, it will never pass in a set of nodeid's to start the search from, which should mean that the data for the here node is guaranteed to already be in the payload.
I think it's worth tuning that 1/3 from .333 to .4, it's a safe change and will mean at the world view the world node will be 'here'
Finally done :)
There is a growing number of common map use cases where the existing algorithm isn't working great. So collecting some test cases here to help inform improvements. I've got a new algorithm in mind which is only slightly different the current one which should fix all of these (after responsive, after topos, after....?):