New maps api endpoint for requesting all nodes within a bbox to a certain depth

brendanheywood commented 6 years ago

What happened?

This has been partially spec'ed in other cards but I've pulled it out as the api is needed by both the new web maps page as well as the app maps page.

https://github.com/theCrag/website/issues/985 https://github.com/theCrag/website/issues/1610

In particular there is a lot of cross over with this spec and I think they should be interelated, ie you could call them both at the same time in one call:

https://github.com/theCrag/website/issues/1145#issuecomment-21710432

Principals

We want to load as minimal data as possible
We want the api doing as little work as it can
We want to cache aggressively on the client, small zooms and pans should not need network and potentially even work offline

Use cases:

There is 3 use cases, but 2 & 3 are just specials cases of the first:

1) I go to the world maps page (on the web) and I want to load in a single api call all the data needed to show the gear style donuts 2) I have toggled to the maps page for a particular crag or region, and want to load just the data I need in order to render the gear style donuts for the area on the map 3) In either situation above, I have zoomed or panned, and now want to request just a delta to get any extra data I need to render any donuts I am missing.

As always we want to return the absolute minimum data and do the least processing server side and make this as snappy as possible. The api call broadly needs these arguments:

1) bbox: What is the bbox I am viewing that I am requesting data for. In the case of the world this is empty 2) min visual angle: How deep should the api go to look for nodes, this is related directly to the size of the viewport the map is being rendered into. This should be expressed in the same units as the bbox, ie an angle in degrees 3) nodes: What data do I already know or don't know? 4) fields: what node data do I need (eg name, grade styles, other stats) broadly the same as other endpoints

API data model format

The api will start at the world level, and drill down into each child and if that child has a bbox wider or higher than the visual angle it will recurse into it. It will build up a tree structure based data model which a subset of the whole index.

Note this data model is just a guide, it doens't need to be exactly this (and it's also different to the rough spec in https://github.com/theCrag/website/issues/1145#issuecomment-21710432)

{
    "visible": [
        {
            "11737699": [
                name: "Australia",
                geometry: ...
                children: [
                    {
                        "11738203": [
                            name: "Tasmania",
                            geometry: ...
                            children: [
                                {
                                    "11744707": [
                                        name: "Blackmans Bay"
                                        geometry: ...
                                    ],
                                    "188180889": [
                                        name: "Bruny Island"
                                        geometry: ...
                                    ]
                                }
                            ]
                        ]
                    }
                ]
            ]
        }
    ],
    "here": "11738203"  <-- spec for this in #1145
}

If we do not recurse into something then we DO NOT include a children field. If we do then we MUST include a children field. If a node has no children then it must say:

children: []

This is an important distinction when it comes to the delta nodes and caching later on.

Example 1: world page on desktop

We want to render a world map at a resolution of say 1000x600 pixels which will look vaguely like this:

Roughly speaking at this size and zoom, each country should be represented by a single marker, but some countries like the USA, australia, canada, russia and brazil are big enough that we will instead go down to the next level and we'll get a marker per state / region. In some cases we may go deeper but it's all based on the visual size of the node we are dealing with. At this resolution, australia is about 100px wide and is 40 degrees wide, and we'd want any node which is larger than about 20px to be recursed into. We are looking at a map which is 360 degrees wide at 1000px with a cutoff around 20px, so roughly the visual angle we are after is around 20px / 1000px * 360deg = 7 degrees.

This would call the api with no bbox (or a bbox of -180-+180, -90 +90), and a visual angle of 7 degrees, and the 'nodes' will be empty. When the api recurses down to australia, 40deg > 7 deg, so Aus will get split into states. The bigger states like WA, SA, QLD may be split again.

Example 2: world page on mobile

This is exactly the same as above, except we are on a device which is only 320 pixels wide, so the visual angle is 320 / 1000 * 360 = 115 degrees. When the api recurses to australia, 40 deg < 115 deg, so australia will be returned as a single unsplit node.

Example 3: Australia on desktop

Same again, only this time the bbox is zoomed in to just showing australia, which is around 50 degrees east / west. So this time our visual angle threshold is 20px / 1000px * 50deg = 2.5 degrees. At this zoom level all the states, even tasmania which is only ~4 degrees wide will be split into the sub regions.

Example 4: Delta Panning around, from Tasmania to Victoria

Lets say we started at Tasmania with a cold browser cache, and so now we have all the cluster data for tassie at that zoom level. Now we pan north. Whether we are in the app or the web, we can and should be aggressively caching this data. If we only panned a couple pixels in any direction the client should be smart enough to detect that it doesn't need to make any api call at all. It can do this because it has access to all the ancestor data, and importantly it has access to it's uncles / great aunts basic bbox data.

So specifically is we originally asked for tassie data, then we should get back data for:

all top level continents, with minimal bbox but no geometry (our great uncles)
australia (because we are in it) with fields we asked for
all children of australia with minimal bbox but no geometry (our uncles)
tasmania (because we are in it) with fields we asked for
all children of tasminia with as much as needed for the visual width cutoff

If we were 10 nodes deep we could get all 10 of our direct ancestors siblings minimal bbox data.

Now if we pan a couple pixels north, we already know that we haven't yet overlapped with any of our uncles or great uncles bboxes yet, so we don't need to make an api call at all.

Then if we keep panning north eventually we will hit the bbox for victoria. Now we make essential the same api call as we originally did for tassie, but we provide the 'nodes' field with the victoria node id. This tells the api to start recursing from the victoria node id instead of the world id, and to also only return data from that node in instead of rooted at the world node.

Example 5: Delta Pan north west to nsw / SA border

If we then did a long pan north west so that we now overlapped with both the bbox of both SA and NSW, we rinse and repeat, but this time we ask for data start at both the SA and the NSW nodes, and we get two data sub trees back.

Example 6: Delta zooming in

The same logic applies to not just panning but zooming. We can recalculate the visual width and re-render, but if the new enlarged bbox's are still smaller than our visual width we don't need to go back to the server. It's only when we zoom in far enough so they are big enough that we ask for them to be split.

scd commented 6 years ago

Guess what the biggest blocking factor of this is. The data structure stores the bounded box as a string not four separate integers. We need a data migration task, damn it.

scd commented 6 years ago

I am a little confused with the rules for the children. From a performance perspective I would like to return just the nodes which meet the visual angle condition and their ancestors.

If we set children field does this mean all children or just the ones that meet the criteria.

From a different perspective if an area meets the criteria do I get all it's siblings regardless of whether or not they meet the criteria.

If a node does not have location info then we don't need it in the map. So this partly answers my question.

If a node does not meet the criteria then we don't need it in the map at this zoom level. If we change zoom levels then the node may meet the criteria and be returned. Therefore I think we don't need it in the original query as long as the client does not make assumptions that children is the complete list from any api call.

BTW I am not recursing anymore, but rather a single sql query that meets the criteria and returns the full ancestor heirachy. This seems to me to be really efficient.

brendanheywood commented 6 years ago

I think asking for children as an extra field for this endpoint doesn't make sense. The fields I had in mind were grade band, style band, other stats, number of favs.

scd commented 6 years ago

If we are only returning unseen nodes then the here variable may point to something that is not returned in the api. Is this correct?

brendanheywood commented 6 years ago

Yes, the 'here' logic is completely independent and including it here is mostly for convenience so we don't need to make two api calls. But if you are in either the web or the app, and you have panned around for a bit, then everything on screen is now cached offline then here logic should be 100% client side. So that's why I split these into different issues. The app probably won't use the server side here logic at all. The web version might use it initially and use the api more than it needs to, but then after we implement the offline store it will implement it here logic client side as well.

scd commented 6 years ago

I have experimented with getting some extra fields to see if that slows things down much

geometry: the boundary fence for the area
gearStyles: gear styles for the area (this may require some more work because unknown is not included), but it is fine for demonstration purposes.
numberAscents: from stats records
numberRoutes: ditto
numberFavorites: ditto
userNumberAscents: from user node stats records, an accountID must be provided and it returns the user ascent count for that user.

Information for all this in a world node query was returning at less than 100ms on my dev server (warm cache). So as long as we get info from the stats records this is looking promising.

brendanheywood commented 6 years ago

We don't need it right now, but one thing I would expect we'd want is user ascents breakdown into grade bands and gear styles so we can show a heat map off where and what a person has climbed

scd commented 6 years ago

That is not currently in the user node stat records but we could add it. Once added it would fast.

scd commented 6 years ago

Is there an issue with standardising the way bounded boxes are defined, in particular bottom left corner followed by top right. The main thing here is that if we standardise the order then it becomes a lot simplier to detect cross date line bounded box (ie long2 < long1).

brendanheywood commented 6 years ago

Adopt this

https://tools.ietf.org/html/rfc7946#section-5

brendanheywood commented 6 years ago

specifically this: https://tools.ietf.org/html/rfc7946#section-5.2

scd commented 6 years ago

OK that is exactly what I though it should be.

scd commented 6 years ago

I think I had misunderstood something about visual angle. My initial assumption was that the visual angle includes only the nodes which have a bigger visual angle. I have just re-read what you wrote and it seems you want the next node in the heirachy which is smaller than the visual angle.

My assumption was largely driven by the efficiencies of the sql because I do a simple test on LongAngle. To get what you want we need an extra sql join with children, which is less snappy and produces far more nodes. The underlying query went from 50ms to 300ms (330 nodes to 2000 nodes) on cold cache. The warm cache api is still less then 200ms on my dev.

@brendanheywood I am just reporting this to give you some idea of the query speeds.

scd commented 6 years ago

We cannot really move much further forward with this until the next release. Here is a brain dump of some of the next things that need work

[ ] Different levels of data depending on the scope of the node (eg uncle, great uncle).
[ ] Handle cross dateline bounded boxes
[ ] Do we put limits on what the api accepts (eg world view port with a very small visual angle).
[ ] How do we specify known nodes and what should be returned.
[ ] Do we need to optimise the data across the wire
[ ] Review what is returned is what we want. This is hard to go through test cases without the UI to generate stuff.
[ ] Script to remove locations from test area as it creates additional processing load on queries
[ ] Script to remove locations from merged areas (old bug left some merged areas with active locations).

brendanheywood commented 6 years ago

That 330 -> 2000 nodes feels wrong to me, I have a feeling there is one more layer than we want. But yeah lets just pause this and get the visual debugging working first. I wasn't expecting this work to really kick of soon, I just didn't want it blocked by a lack of spec. There are another couple maps specs we need too which I'll do now.

I'll also been mulling on alternate strategies. I think we should ask for all the 'uncles' as a field, that way the calling page can decide, it only needs them if it is caching + deltas otherwise it's a waste.

scd commented 6 years ago

I just wanted to get something started here because I had no idea of how difficult this would be and felt unable to properly contribute to our app discussions. It also needed a release cycle.

brendanheywood commented 6 years ago

Just some more thoughts on this, which has some overlap with this one too:

https://github.com/theCrag/website/issues/2175

For the maps search with is in the process of being mocked up, we will want to be able to ask for a specific type of stat, and also potentially do some sort of filtering. Depending on what we are filtering on this might be done client side or server side.

Initially I think a safe set of starting assumptions would be:

Primary dimension:

routes
ascents
climber (ie show a map of where your follower are, or where the site admins are, or supporters)

Secondary dimension:

[ ] show gear style donuts on the maps, eg request 'gearStyles' field
[ ] show grade bubbles , eg request 'gradeBand' field
[ ] I can imagine others but two is enough to prove it for phase 1

Filters:

[ ] by gearstyle
[ ] by grade
[ ] by stars
[ ] by account

Eventually I would like to bring all the facets up to parity and also

Even though internally the maps code will call the api, I think the external maps urls should follow a pattern mirroring the facet pages.

Route / ascents facet: climbing/australia/routes/with-stars/2/with-grade/AU:10:30/with-gear-style/trad/ climbing/australia/ascents/with-route-grade/AU:10:30/with-route-stars/2/with-route-gear-style/trad/

Map facet url with filtering: climbing/australia/maps/with-stars/2/with-grade/AU:10:30/with-gear-style/trad?show=styles <- default climbing/australia/maps/with-stars/2/with-grade/AU:10:30/with-gear-style/trad?show=grades

The current map urls assume we are talking about routes but we also need to talk about ascents or in future other primary dimensions. So do we want something like this /ascent-maps/ vs /maps/ ?:

climbing/australia/ascent-maps/with-stars/2/with-grade/AU:10:30/with-gear-style/trad?show=styles climbing/australia/ascent-maps/with-stars/2/with-grade/AU:10:30/with-gear-style/trad?show=grades

The client side vs server side filtering is not as trivial as I first though. For instance if we start with the default data we have we could filter client side by gearband and show that, but if we are showing gradebands as the secondary dimension then we cannot filter by gearstyles client side. So I am leaning towards all filtering being done server side and cached client side.

scd commented 6 years ago

@brendanheywood it looked like an SQL change was not properly implemented and got lost. I have remade the change so you will have to pull from repo and do an ant build-db (about 10 minutes).

Example api calls can be found

/data/jobs/CIDS/CodeBase/Custom/Script/api-test/bbox-query-01.script /data/jobs/CIDS/CodeBase/Custom/Script/api-test/bbox-query-02.script

You will have to modify for brendan

brendanheywood commented 5 years ago

done

theCrag / website