Many commercial providers require developers to sign up and get some sort of API key these days for their application/website. ... Regarding "legacy" users: maybe we could treat all those clients without an API key as some kind of "micro plan" user with very low usage limits, giving developers some incentive to sign up.
There are many techniques to limit or control access, and I'd like to have a discussion about them after we've reached some kind of consensus on what the policy for access should be. At the moment there's a very wide range of views about who we should be serving tiles to - just mappers, everyone, etc?
I think perhaps the question of "should we do activity X or not" is very different from the dilemma we're faced with when running the servers, as it suggests that the alternative to "action X" is inaction, rather than some other improvement to the OSM servers. Although not perfect, some better questions might be:
Of course, in real life neither is a binary decision. Underneath all the real-world complexity, however, it is true that time or money can only be spent once, and there could never be enough of either.
If you'd like to help out then:
Hi,
A lot of important considerations have been made in this thread already and I have little to add there.
One observation:
From the numbers @zerebubuth provided it seems the long tail is - as is to be expected - fairly thick at the beginning. In other words, the top 100-200 of the 24k sites using tiles probably account for at least 20 percent of the total traffic of the tile servers. It might be a good idea to have a closer look at these, maybe even make this list publicly available on a regular basis. Since we are only talking about tiles that are not served from cache here, I have to admit I have very little idea what to expect in such a list.
I quite like the idea of prioritising tile rendering for contributors and putting everybody else on the lazy-render queue (assuming that such prioritising would help performance). A few people here have suggested URL schemes to achieve this, but what about using the IP addresses of actual contributors instead?
When a changeset or trace is uploaded, remember the uploader's IP for 24 hours. When scheduling a render, look up the requester's IP in the contributor DB. There are a few key/value stores that support sets, TTLs, distribution, and HA, making the DB part easy enough.
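A minimal sketch of that lookup, assuming Redis as the store (the key names and the 24-hour TTL are illustrative, and plain expiring keys are used instead of sets for simplicity):

```typescript
import { createClient } from "redis";

const redis = createClient();
await redis.connect();

// On changeset/trace upload: remember the uploader's IP for 24 hours.
// The TTL makes entries expire on their own, so no cleanup job is needed.
async function recordContributor(ip: string): Promise<void> {
  await redis.set(`contributor:${ip}`, "1", { EX: 24 * 60 * 60 });
}

// When scheduling a render: recent contributors go to the priority queue,
// everyone else to the lazy-render queue.
async function renderQueue(ip: string): Promise<"priority" | "lazy"> {
  return (await redis.exists(`contributor:${ip}`)) ? "priority" : "lazy";
}
```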
@vincentdephily : Interesting idea to recognize actual mappers and give them priority in rendering!
IPs by themselves may not be such a sharp indicator, since there are mechanisms like NAT and Carrier-grade NAT that may give multiple users (even at different locations) the same IP address.
Editors such as JOSM know their users' credentials. How about using OAuth to enable this priority access?
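A hypothetical sketch of what that check could look like, assuming OAuth bearer tokens and validating them against the OSM API's user-details endpoint (in practice the result would need caching so tile requests don't hammer the API):

```typescript
// Treat a tile request as coming from a "known mapper" if it carries an
// OAuth token that the OSM API accepts for the user-details call.
async function isAuthenticatedMapper(authHeader: string | null): Promise<boolean> {
  if (!authHeader) return false;
  const resp = await fetch(
    "https://api.openstreetmap.org/api/0.6/user/details.json",
    { headers: { Authorization: authHeader } }
  );
  return resp.ok; // 200 only for a valid token
}
```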
A word of caution: might such a mechanism invite people to make dummy edits to gain priority access?
The idea of two tile services, or of linking edits to rate-limiting, doesn't solve the problem of resources; it makes it worse! With more moving components, even more time will be spent on this secondary service instead of primary ones like the API.
Linking edits with rate-limiting would require development work, and I'm pretty sure most of the developers involved do not consider it a priority.
I volunteer to help maintain and make software adjustments to the tile service, if overload on the current sysadmin team is really the issue. @zerebubuth @Firefishy is the Vagrant branch really still current? Should I try to revitalise it, or is it just in a playground state, in which case I'd better focus on improvements to the current serving scheme?
A better way to go would be to let people contribute caching proxies, without necessarily running their own renderers (which is much more demanding). As these proxies, installed by some web/app developer, would accumulate requests for many users of their websites/apps, their tile usage (from the same few source IPs) would be much higher than a normal user's; currently they cannot do that without being blocked for excessive use. However, if the proxies also allowed people around the world to use these caches (not just the users of the web app/site), we could turn them into a more powerful CDN than the existing one. It should be possible to secure the proxies so that they return reliable data and respect the caching expiration times (native to the HTTP protocol).
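As a toy sketch of such a contributed proxy (Node 18+ for the built-in fetch; the in-memory cache and upstream host are simplifications, and a real deployment would use squid, Varnish, or nginx):

```typescript
import http from "node:http";

const UPSTREAM = "https://tile.openstreetmap.org"; // origin being shielded
const cache = new Map<string, { body: Buffer; type: string; expires: number }>();

// Serve tiles from the local cache while fresh; otherwise fetch once from
// the origin, honouring the Cache-Control max-age it already sends.
http.createServer(async (req, res) => {
  const key = req.url ?? "/";
  const hit = cache.get(key);
  if (hit && hit.expires > Date.now()) {
    res.writeHead(200, { "Content-Type": hit.type });
    res.end(hit.body);
    return;
  }
  const upstream = await fetch(UPSTREAM + key);
  const body = Buffer.from(await upstream.arrayBuffer());
  const maxAge = /max-age=(\d+)/.exec(upstream.headers.get("cache-control") ?? "");
  const ttlMs = (maxAge ? Number(maxAge[1]) : 3600) * 1000; // 1 h fallback
  const type = upstream.headers.get("content-type") ?? "image/png";
  cache.set(key, { body, type, expires: Date.now() + ttlMs });
  res.writeHead(upstream.status, { "Content-Type": type });
  res.end(body);
}).listen(8080);
```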
But is usage exploding because of serving pre-rendered tiles, or because there are too many tiles to redraw in the renderers for OSM.org's Mapnik style? This is a second problem that a simple CDN cannot solve: in the past there was Tiles@Home for delegating the rendering to many contributed renderers, but that has stopped.
So how can we improve the efficiency of caches and help them coordinate better to distribute the workload/IO/storage/network usage? Isn't this a more general problem, not specific to OSM but common to any web application whose content is delivered to and used by many people (including, for example, Wikimedia wikis)? Is there any discussion in the development of well-known caching proxies? Is there a way to help them synchronise with each other faster and more efficiently (possibly using a sideband protocol)?
Some P2P protocol could help build and secure a large network of interconnected proxies, with signed contents (using some "blockchain"?) to avoid other kinds of abuse (such as relaying illegal or pirated content). There may also be research in this domain in the W3C or an HTTP working group. For software distribution, everyone now uses efficient protocols such as rsync and torrents. These protocols are not easy to implement on small mobile devices (not really P2P capable), but if the developer of a web app or website wants performance and reliability for their mobile users, they need to implement some server-side helpers: implementing a cache and synchronising it efficiently, while still being able to use the power of the rest of the network and contribute some share back to it, is something to think about.
For now, developers are reluctant to run their own servers. But we have to convince them that things won't get better if all the hard work is systematically offloaded, for free, onto another highly demanded source.
Before we have a better protocol, all we can do (and should do now) is extend our too-small CDN with more servers worldwide. Not all of these caches need to be directly managed by the foundation; we just need an agreement and some service-quality monitoring tools to evaluate the best-working mirrors.
Participating in this thread super-late but I only saw it today, so: I feel strongly in agreement with @Komzpa’s Nov 1 comment that OSM tiles offer both infrastructure and persuasion for the project, as well as the Oct 28 comment suggesting that tile usage offers an opening to seek support and new collaborators. I don't love a policy change which limits the usability of OSM tiles like this. It’s clear from OWG members participating here that this is a source of stress for the services, so I would prefer to see a strategy which seeks to expand technical and financial support for all tile usage.
It does make sense to offer two tiers of tile usage, though: one for editors who need to view the latest updates (10-15%?) and another for sites and apps which don't need anything close to that level of freshness. The boundary between the two will always be fluid and hard to define, but supplying a right way to use tiles for games and apps may help underscore the value of the tiles as well as elicit support for their hosting.
@migurski Really good to see you in this thread as I think you're probably better positioned than pretty much anyone here to help make things better.
It's clear that the demise of MQ Open's unmetered tile service caused additional stress to OSMF's tileservers, which since the early days of the project have principally been provided "for editors who need to view the latest updates".
It's undisputed, I think, that there are wider benefits if tiles made from OSM data are readily available to young/small-scale "sites and apps", though there is rightly much concern at heavy users (Pokemon Go trackers, etc.) freeloading off services which are funded principally through donations.
I would therefore suggest that this is an ideal opportunity for Mapzen, as the best-funded organisation with significant raster tile infrastructure expertise, to step into the role formerly occupied by MQ Open and provide an OSM-based free-access tile product of the sort that you and @Komzpa are advocating. If you were feeling generous you could eschew your own branding and arrange with OWG for it to be available at public-tile.openstreetmap.org, but actually most users seemed happy with the MQ Open arrangement and maybe Mapzen-provided free tiles with Mapzen branding would fulfil everyone's wishes here.
How about it?
I think it’d be really interesting, at least as a speculative conversation. @zerebubuth’s team are the caretakers of Mapzen’s vector tile infrastructure. I’m also very personally familiar with the role of MQ Open — for many years, my standard recommendation to Code for America fellowship projects depended on MQ because it was a reliable, free source of easy commodity tiles. These days, Stamen seems to fill that role with Toner and Terrain cartography. None of these options look quite like the OSM-Carto designs, though, which strikes me as important to deliver on the promotional idea. Just anecdotally, I've received mailers from local real estate agents at my house who’ve screen-capped OSM tiles and provided correct attribution. It was surprising in a good way.
It also sounds, from Evgen’s suggestion and further comments from Grant, like there is a technical opportunity here to introduce shared caches? I respect that Matt asked for policy-only opinions here, but this does feel like a situation where the right policy will be strongly influenced by feedback from operations, fundraising, and technology.
I have a couple of points against going to an external vendor for this kind of service:
MQ Open existed for a long time but was closed because of business priorities. There's no reason to expect any other vendor to last longer. OSMF-hosted infrastructure is more sustainable, at least because there is control from the OSMF side, if only in the form of "we do our best to just prevent outages".
Forking existing "core" functionality into another service leads to stagnation of the "core" service. This happened to the API (overpass/*xapi look a lot better for reading, and the improvements from their internal data organization were never ported back to the main DB, see #135), and it happened to the Mapnik stylesheet in pre-carto times (many thanks to everyone involved in the XML-to-carto migration).
There are purely technical tasks that could make this happen without forking the project at any level, some of which are listed at https://github.com/openstreetmap/operations/issues/113#issuecomment-258606170 - this can be done under the OSMF flag instead of under a commercial entity's flag. Handing it to a commercial entity does not build a community of OSM developers.
Regarding building more automated analysis tools, mentioned in @zerebubuth's comment: I sent a request to open the raw Munin data to operations@osmfoundation.org shortly before New Year, hoping to dig into it during the New Year holidays; I'm still waiting for a response.
I think this issue has stagnated. There is clearly a lot of opposition to changing the tile usage policy in the way that we hoped to, but no clear way forward that resolves the problems the OWG continues to face is apparent either.
I will close this issue since there is nothing more to be done here now, and there has been no progress in the last six months. I encourage everyone to read through the list of ways you can help that @zerebubuth posted earlier.
Blocking users of apps that are not identifiable would be a bad thing. Yes, we should think about creating an alternate tile service with long cache times, less frequent updates, and its own CDN: instead of blocking users, requests could immediately be redirected there when they come from unidentifiable sources, or from sources known to be imprecise (e.g. just "Android" for any mobile app that doesn't use an API key, tune its referrer, or identify its user via OAuth), or in any case as a fallback (when the live renderer cannot support the current load, including for editing users, or when an authenticated user exceeds some threshold).
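A hypothetical sketch of that redirect rule, assuming an Express front end; the header checks, the notion of "identifiable", and the tiles-cdn.example.org host are all placeholders:

```typescript
import express from "express";

const app = express();
const STALE_CDN = "https://tiles-cdn.example.org"; // long-cache alternate service

// A source counts as "identifiable" here if it sends an API key, an OAuth
// Authorization header, or a real referrer (a bare "Android" does not).
function isIdentifiable(req: express.Request): boolean {
  if (req.query.apikey || req.get("Authorization")) return true;
  const referer = req.get("Referer");
  return Boolean(referer && referer !== "Android");
}

app.get("/:z/:x/:y.png", (req, res, next) => {
  if (!isIdentifiable(req)) {
    // Redirect instead of blocking: the user still gets a (staler) map.
    const { z, x, y } = req.params;
    res.redirect(302, `${STALE_CDN}/${z}/${x}/${y}.png`);
    return;
  }
  next(); // fall through to the live renderer
});

app.listen(8080);
```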
The libraries used in common frameworks should be able to detect and manage the switch to another cache, without failing and without mixing the two caches, but I don't know whether existing versions of Leaflet or similar frameworks can be configured to support alternate sources with distinct cache delays. The HTTP(S) protocols allow setting cache properties, but only per resource, not globally for a source or domain, nor for more specific paths (such as zoom levels); that would let a client avoid re-querying a "fast source" once it has replied that another source is globally preferable for the same client or for a specific subpath.
This could be defined by sending documented cookies, or by adding a custom "service info" API to tile servers, queried before requesting actual tiles at a specific zoom level. Maybe that API could also support the selection of tile servers per region (a bounding box of x/y tile positions: the API could return a list of bounding boxes, each associated with the preferred source to query and its delay). But existing frameworks (Leaflet, and so on) would need to support it (ideally it should be proposed and documented in the TMS protocol so that common application frameworks support it by default).
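To make the idea concrete, here is a sketch of what a client could do with such a service-info API, using Leaflet; the /service-info endpoint, its response shape, and the URLs are assumptions, not an existing protocol:

```typescript
import L from "leaflet";

// Assumed response shape of the hypothetical "service info" endpoint:
// each region names a preferred tile source and its cache delay.
interface ServiceInfo {
  regions: {
    bbox: [west: number, south: number, east: number, north: number];
    urlTemplate: string;    // e.g. "https://stale.example.org/{z}/{x}/{y}.png"
    maxAgeSeconds: number;  // how stale tiles from this source may be
  }[];
}

async function addBestTileLayer(map: L.Map): Promise<void> {
  const info: ServiceInfo = await (
    await fetch("https://tile.example.org/service-info")
  ).json();

  const b = map.getBounds();
  // Pick the first advertised region covering the current view,
  // falling back to the first region otherwise.
  const region =
    info.regions.find(
      (r) =>
        b.getWest() >= r.bbox[0] && b.getSouth() >= r.bbox[1] &&
        b.getEast() <= r.bbox[2] && b.getNorth() <= r.bbox[3]
    ) ?? info.regions[0];

  L.tileLayer(region.urlTemplate, { maxZoom: 19 }).addTo(map);
}
```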
Another footnote: opencyclemap (a.k.a. Thunderforest) started to watermark tiles requested without an API key, actually informing users about what they are being asked to do. It would be possible to notify the undefined user agents about the problem instead of blocking them, and check the stats to see whether anything changes. (Watermarking seems to be a pretty good incentive to fix the problem, since the map remains usable but is highly annoying.)
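For illustration only, a minimal sketch of that kind of watermarking, assuming the sharp image library and a pre-rendered watermark.png overlay (this does not reflect how Thunderforest actually implements it):

```typescript
import sharp from "sharp";
import { readFileSync } from "fs";

// Overlay saying e.g. "Get an API key at ..." (hypothetical asset).
const watermark = readFileSync("watermark.png");

// Composite a nagging-but-legible notice onto a rendered tile so the map
// stays usable while telling the developer what needs fixing.
async function watermarkTile(tile: Buffer): Promise<Buffer> {
  return sharp(tile)
    .composite([{ input: watermark, gravity: "southeast" }])
    .png()
    .toBuffer();
}
```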
The usage of the OSM tile infrastructure by the OpenStreetMap website was recently measured at approximately 11% of tile server output, so the vast majority of OSM's tiles are rendered to support 3rd-party sites and apps. Although we want to support use of OpenStreetMap data, there are some use cases from which OSM and the surrounding ecosystem receive little or no benefit.
It has been suggested that OWG introduce two new rules to the tiles acceptable usage policy:
The technical implementation of this is a detail, so please keep discussion to the policy of whether or not we want to begin to restrict usage in this way.