theCrag / website

theCrag.com: Add your voice and help guide the development of the world's largest collaborative rock climbing & bouldering platform
https://www.thecrag.com/
112 stars 8 forks source link

bbox maps search api #297

Closed brendanheywood closed 11 years ago

brendanheywood commented 13 years ago

/api/map/search?bbox=top,left,bottom,right

returns a standard node data structure of the largest unique node which is fully contained in the bbox. If there is more than one node then find the common ancestors of all those nodes.

brendanheywood commented 13 years ago

It should also have the same 'show' param as the normal id for getting more data

/api/node/id/{nodeID}?show

scd commented 13 years ago

What a cool API function. But alas it highlights a data design flaw so it is not simple. I've coded the bounded box as a string rather than a separate field for each coordinate. I don't know why I did that, usually I am so particular about these things. Even If I fix this flaw in the database I still have to do some smarts because sometimes the location will be a point (eg route).

BTW the concepts of largest and common ancestor conflict. I can either do the largest or the common ancestor. I guess if the nodes are the same size (eg a route) then we could resolve by doing the common ancestor.

How urgent is this? I'm going to have to think about the best way of implementing it. The main issue is performance, as I would have to open every node with a location for each call.

Are there any issues with caching and having slightly old data. Maybe I could create a temporary cache which is refreshed every 24 hours? In the long run, location data should be fairly static, so caching may be ok.

brendanheywood commented 13 years ago

For this particular purpose it can make a few assumptions, it will only ever return an area, and if the area is only a point and doesn't have a bbox I am not interested in it anyway.

However I can envisage a bunch of new api calls in future will all take a bbox as input so we should get the infrastructure right to support this anyway:

300

301

This isn't urgent at all, it's just a nice to have for the maps release.

In terms of performance long term the best and easiest thing is just turning on all the built in geolocation stuff in mysql. I know there was issues with that, I think dependancies on debian upgrades?

Even if we just convert them to floats and roll our own geo code performance shouldn't be too bad. If we average 5 nodes deep and each node has 20 siblings it's only 100 comparisons. The comparisons will be lightning quick, the slowness would be in the latency between perl land and mysql. If it was a stored procedure this would disappear without really needing a cache. If it has a btree index it should be shit quick. If we do add caching, then just caching the top layer of nodes, say regions down to TLC's would give the most benefit as they change slowly without causing issues when stuff lower down. As a strategy I think we should focus more on DB optimisations and only cache as a last resort.

Also the largest and common ancestor shouldn't ever conflict in theory. If we set a boundary on an area the bbox up the structure inherits this. If it is set by a human, an area should contain it's children. Either way I'm not too fussed, it could be a quick heuristic to be right 95% of the time. (eg we're just using bbox anyway and not even considering the shapes themselves). The important thing is that it only returns a single node, and if in doubt it returns a node further up the chain.

scd commented 13 years ago

I didn't know MySQL had spatial support. There could be some issues using this, especially with indexes which is the whole point for us. I don't think I can put the MySQL side of things in the critical path for the Maps release - there is a bit of a learning curve, a database upgrade and some testing to do. I would not want to commit to any deliverable here until I have a demonstration that MySQL spatial indexes works for InnoDB tables as we require. I'm a little suspicious of the performance as well.

If we have to upgrade to a MySQL 5.1 then we need to get the newer debian. This is on my list of things to do, but it is a big job. Upgrading debian operating systems is fairly simple, but carries significant risk that server will be corrupted and just not work, so to do this risk free we need a new temporary server. My preference is to see if we can hold out until we really need a new server.

There is also a chance we may have to go to MySQL 5.5 which means overriding debian's package management and doing our own mysql installation management. I don't feel as though I have the expertise to tackle this at the moment.

I could temporarily solve this with a cache (btree style) if it becomes urgent.

brendanheywood commented 13 years ago

We'll I'll leave it up to you to have a play whenever you want. I'm in no rush. I recon implement as floats in new fields and test what the performance is like and go from there.

brendanheywood commented 13 years ago

I'm not sure if you've done anything on this but for now lets put it on hold. I'll implement the maps stuff a different way.

scd commented 13 years ago

Ah I didn't realise you were waiting for this. I think I have to do a bit of playing around and may require some database table updates. I can write something, but it just might not be efficient enough to let us scale the way we want to.

Did you see the api calls http://dev.thecrag.com/api/map/summary/11740915 http://dev.thecrag.com/api/map/summary/ids?id=26194194,11740915

Mostly as per your instructions.

brendanheywood commented 13 years ago

Yes I've seen the sub crag stuff which is prefect, I had a play with it yesterday.

But for this new bbox endpoint don't bother. I would like it further down the track for other things but for the maps page I need to pull a heap of data at different times and I'm pretty sure I can do it all with the existing API endpoints.

For the use case of starting at the world and zooming in the loads will be progressive and fine. For the use case of linking directly to a bbox from list view, it will pass in a node hint so it can quickly load up the stuff it needs. I think performance will be fine without a bbox search.

brendanheywood commented 11 years ago

I think this one is well and truly already done.