Reduce duplication of rendering effort

pnorman commented 8 years ago

From https://github.com/openstreetmap/chef/pull/78#issuecomment-239563299

The rendering machines are, currently, completely independent. This is great for redundancy and fail-over, as they are effectively the same. However, it means duplication of tiles stored on disk and tiles rendered. Duplication of tiles on disk is somewhat desirable in the case of fail-over, but duplicating the renders is entirely pointless.

Adding a 3rd server, therefore, is unlikely to reduce the rendering load on the existing servers by a third. However, a lot of the load comes from serving still-fresh tiles off disk to "back-stop" the CDN, and that load would be split (roughly evenly) among the servers.

What would be great, as @pnorman and I were discussing the other day, is a way to "broadcast" rendered tiles in a PUB-SUB fashion amongst the rendering servers so that they can opportunistically fill their own caches with work from other machines. At the moment, it's no more than an idea, but it seems like a feasible change to renderd.
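The broadcast idea can be sketched in a few lines. This is a minimal in-process sketch, assuming a hypothetical `TileBus` class as a stand-in for whatever transport would actually carry tiles between machines. `TileBus`, `RenderServer` and the `(zoom, mx, my)` keys are illustrative only and are not part of renderd, and the sketch ignores tile expiry and network failures.

```python
class TileBus:
    """Hypothetical PUB-SUB bus standing in for a real inter-server transport."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, server):
        self.subscribers.append(server)

    def publish(self, origin, key, data):
        # Deliver a freshly rendered metatile to every server except its origin.
        for server in self.subscribers:
            if server is not origin:
                server.receive(key, data)


class RenderServer:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.cache = {}    # local tile store: (zoom, mx, my) -> bytes
        self.renders = 0   # metatiles this server rendered itself
        bus.subscribe(self)

    def render(self, key):
        # Placeholder for the expensive render.
        self.renders += 1
        return f"metatile {key} rendered by {self.name}".encode()

    def get(self, key):
        if key not in self.cache:
            data = self.render(key)
            self.cache[key] = data
            self.bus.publish(self, key, data)  # share the work with the others
        return self.cache[key]

    def receive(self, key, data):
        # Opportunistically fill the local cache with another server's work.
        self.cache.setdefault(key, data)


bus = TileBus()
yevaud = RenderServer("yevaud", bus)
orm = RenderServer("orm", bus)
yevaud.get((12, 100, 200))
orm.get((12, 100, 200))  # served from the broadcast copy, no second render
```

Here orm serves the metatile without rendering it, which is the duplicated render the broadcast is meant to avoid. Every server still stores every tile, so the on-disk duplication that helps fail-over is kept.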

Currently the two servers are independent, and clients go to one based on geoip. This means that the rendering workload is not fully duplicated between the two servers, as users in the US tend to view tiles in the US and users in Germany tend to view tiles in Germany. This has been tested by swapping locations and seeing an increase in load.

Unfortunately, this doesn't scale well to higher numbers of servers.

pnorman commented 8 years ago

Prior to the setup of orm, a number of options were discussed, but they all had problems with limited bandwidth between sites. Is this still an issue?

With three servers we could have three rendering locations and would need to transfer data from any server to any other server. What are the weakest links in the connections?

I am defining a server as a machine running the tile store, renderd, and mod_tile (although these could actually be split up), and a site as a location where there are local connections between the servers.

I think we probably want the following characteristics in a setup:

Any one server or one site can fail without taking down tile.osm.org
Any one server or site can fail without adding rendering load. Reduced capacity is acceptable and expected, but we want to avoid the combination of reduced capacity and increased load at the same time
Tiles rendered on one server get replicated to another site

For simplicity, and to minimize inter-site bandwidth, I think it's best to have:

A mostly full copy of the tile store at each site
Tiles rendered on one server get replicated to all other sites

If we had multiple servers at one site we could look at more complicated tile replication strategies, but I think with only 3 servers it's not worth it.

iandees commented 8 years ago

Your description there sounds a lot like BitTorrent Sync.

pnorman commented 8 years ago

renderd supports tile stores on memcached and rados/ceph. I believe ceph supports what we need, but I'm not 100% sure. I see some potential issues with ceph:

Has anyone run renderd's rados-based storage in production?
Has anyone tested mod_tile and rados in production?
Do we have experience setting up ceph, or will this be new software we need to learn?

Given that renderd has support, this might be the best option.

Renderd also supports distributing rendering requests between multiple renderd instances, but I don't recommend this, for several reasons:

It is probably less tested than rados
It's another layer of load balancing to manage
It could introduce additional latency going site to site
We should be able to accomplish the same by balancing the CDN between the render servers.

pnorman commented 8 years ago

Going this direction is also consistent with vector tiles - most of those implementations use some service as a vector tile store and often have it running on different servers.

zerebubuth commented 8 years ago

Yesterday, yevaud rendered 963,135 distinct metatiles and orm rendered 859,036 of which 303,923 were the same. If only one copy of each of those were rendered, that would be an overall saving of 17%, which is nowhere near as large as I'd have hoped.
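The 17% figure follows from treating each metatile rendered by both machines as one avoidable render, out of all renders done by the two machines. The counts below are the ones quoted in the comment:

```python
# Daily metatile counts from the comment above.
yevaud_renders = 963_135
orm_renders = 859_036
rendered_by_both = 303_923

total_renders = yevaud_renders + orm_renders        # 1,822,171 renders in total
unique_metatiles = total_renders - rendered_by_both
saving = rendered_by_both / total_renders           # fraction of renders that were duplicates

print(f"unique metatiles: {unique_metatiles:,}")
print(f"saving if each were rendered once: {saving:.1%}")
```

This gives about 16.7%, which the comment rounds to 17%.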

Also surprising (at least to me) is that the duplication is relatively stable across zoom levels. I'd have expected far more duplication at low zoom levels than high. But perhaps people are as likely to look at details in a map of somewhere far away as they are nearby, or perhaps those are just less likely to be in cache.

Interesting data, and it means that a 3rd machine will likely be more useful than I'd previously thought.

pnorman commented 8 years ago

Yesterday, yevaud rendered 963,135 distinct metatiles and orm rendered 859,036 of which 303,923 were the same. If only one copy of each of those were rendered, that would be an overall saving of 17%, which is nowhere near as large as I'd have hoped.

If a single server has a capacity of 1, then two have a combined effective capacity of 1.66 (2 × 0.83, since 17% of their combined rendering is duplicated).

If you assume that the statistics remain the same, and that whenever a server renders a tile there is a 17% chance that a specific other server also has it, then with three servers there is a 31% chance (1 - 0.83²) that one of the two other servers has the tile. With each server spending 31% of its capacity duplicating work, the total capacity is 2.07, an increase of 25%. If everything were distributed ideally, it would be an increase of 50% (2× the actual gain).

My gut tells me that the statistics will not be the same and 3 servers will be slightly better than this model, but it gives us a place to start.

With four servers, there is a 43% chance of duplicating work and a total capacity of 2.29, an increase of 11% instead of the ideal 33% (3× the actual gain).

So we could set up three servers and not be too badly off for duplication, but beyond that it gets worse.
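The capacity model above can be written out directly. This is a sketch of that model only, assuming (as the comment does) that overlap with each other server is independent with probability p = 0.17, so a render is useful only if none of the other n - 1 servers also renders it:

```python
def capacity(n_servers: int, p: float = 0.17) -> float:
    """Effective capacity of n independent render servers, in units of one server.

    A render is useful only if none of the other n - 1 servers also renders
    the same tile; each overlap is assumed independent with probability p.
    """
    unique_fraction = (1 - p) ** (n_servers - 1)
    return n_servers * unique_fraction


previous = None
for n in range(1, 5):
    duplicated = 1 - (1 - 0.17) ** (n - 1)  # chance at least one other server has the tile
    total = capacity(n)
    line = f"{n} servers: {duplicated:.0%} duplicated, capacity {total:.2f}"
    if previous is not None:
        line += f", gain {total / previous - 1:.1%}"
    print(line)
    previous = total
```

Because capacity(n + 1) / capacity(n) = (n + 1) / n × (1 - p), each extra server adds less than the previous one. The unrounded three-server gain is 24.5%, which the comment rounds to 25%.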

pnorman commented 8 years ago

cc @apmon about renderd

If we cut our duplication in half, from 17% to 8.5%, we'd gain 10% capacity for two servers, 21% for three servers, and 34% for four servers.
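The gains from halving duplication can be checked with the same independent-overlap model, restated here so the snippet runs on its own. Evaluating the ratio without rounding intermediate values gives roughly 10%, 21.5% and 34%:

```python
def capacity(n_servers: int, p: float) -> float:
    # Independent-overlap model: n * (1 - p) ** (n - 1)
    return n_servers * (1 - p) ** (n_servers - 1)


current_p = 0.17           # measured duplication between two servers
halved_p = current_p / 2   # 8.5%

for n in (2, 3, 4):
    gain = capacity(n, halved_p) / capacity(n, current_p) - 1
    print(f"{n} servers: +{gain:.1%} capacity with duplication halved")
```

The gain simplifies to ((1 - p/2) / (1 - p)) ** (n - 1) - 1, so it compounds with every server added.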

Are there any tweaks to load balancing that make sense without losing the ability for a server to fail and have its load redistributed?

nicolas17 commented 8 years ago

This is probably a naive suggestion, but what about making the CDN use some sort of hash (even as simple as (x+y)%n) to decide which server will render a given tile? That would reduce duplication of rendering work and caching, since a tile would always be requested from the same server. The hashing would need to take metatile sizes into account; otherwise it would send requests for different tiles in the same metatile to different servers and make things worse than now.

It still has to support failover to another server if the hash-chosen server is down or taking too long, which would add some cache duplication, but it would be minimal compared to now.
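A metatile-aware version of the suggested hash can be sketched as follows. It assumes 8×8 metatiles (mod_tile's usual default, but an assumption here), applies the simple (x + y) % n hash from the comment to metatile coordinates at a single zoom level, and returns the remaining servers in a fixed rotation as failover targets. `pick_servers` and the server names are hypothetical:

```python
METATILE = 8  # assumed metatile edge length in tiles (8x8 metatiles)


def metatile_of(x: int, y: int) -> tuple[int, int]:
    """Coordinates of the metatile containing tile (x, y) at a given zoom."""
    return x // METATILE, y // METATILE


def pick_servers(x: int, y: int, servers: list[str]) -> list[str]:
    """Servers to try for tile (x, y), in preference order.

    The primary comes from (mx + my) % n over metatile coordinates, so every
    tile in one metatile goes to the same server; the others follow in
    rotation as failover targets.
    """
    mx, my = metatile_of(x, y)
    start = (mx + my) % len(servers)
    return servers[start:] + servers[:start]


servers = ["render-a", "render-b", "render-c"]  # hypothetical hostnames
print(pick_servers(3, 5, servers))
print(pick_servers(9, 2, servers))
```

A fixed rotation sends all of a failed server's traffic to a single neighbour, and that neighbour's tile store will not yet hold those tiles.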

tomhughes commented 8 years ago

Is this ticket in the right place? It's not clear to me that this is a "chef" issue at the moment because it's not something we can solve just by writing a chef recipe...

pnorman commented 8 years ago

making the CDN use some sort of hash (even as simple as (x+y)%n) to decide which server will render a given tile?

The problem is that if a server then goes down, your tile store hit rate drops, because the surviving servers have a 0% hit rate on the load redirected to them. You can use something like that to distribute work, but you still need a shared tile store.
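The effect on the tile store can be put in numbers with a back-of-the-envelope sketch. It assumes load is spread evenly over n servers, that each store only holds its own hash partition, and that the failed server's share lands on stores that have never seen those tiles. `hit_rate_after_failure` and the example hit rate are hypothetical:

```python
def hit_rate_after_failure(n_servers: int, base_hit_rate: float,
                           shared_store: bool = False) -> float:
    """Overall tile-store hit rate just after one of n servers fails."""
    if shared_store:
        # Survivors can read tiles the failed server stored, so nothing changes.
        return base_hit_rate
    # The survivors' own (n - 1)/n of the traffic still hits as before;
    # the redirected 1/n share hits 0% because it was never stored locally.
    return (n_servers - 1) / n_servers * base_hit_rate


example_hit_rate = 0.9  # hypothetical, for illustration only
print(hit_rate_after_failure(3, example_hit_rate))
print(hit_rate_after_failure(3, example_hit_rate, shared_store=True))
```

With three partitioned servers, a 90% hit rate would fall to 60% right after a failure, while a shared store keeps it at 90%. That is the argument for pairing hash-based work distribution with a shared tile store.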

Is this ticket in the right place? It's not clear to me that this is a "chef" issue at the moment because it's not something we can solve just by writing a chef recipe...

What would be a more appropriate place? It's eventually going to result in recipe changes.

gravitystorm commented 8 years ago

What would be a more appropriate place?

Usually the operations tracker is where high-level things like hardware resource allocations, budgeting, etc. are considered.

zerebubuth commented 8 years ago

Closing in favour of https://github.com/openstreetmap/operations/issues/101 - where it's more appropriate and we can track aspects of this which aren't just Chef-related. Apologies to anyone following the breadcrumbs :disappointed:

openstreetmap / chef #85