qgis / QGIS-Enhancement-Proposals

QEPs (QGIS Enhancement Proposals) are used in the process of creating and discussing new enhancements for QGIS

QGIS Enhancement Proposal: Mitigate Abusive Tile Fetching on OpenStreetMap (OSM) Servers #291

Open nirvn opened 3 months ago

nirvn commented 3 months ago

QGIS Enhancement: Mitigate Abusive Tile Fetching on OpenStreetMap (OSM) Servers

Date 2024/03/14

Author Mathieu Pellerin (@nirvn)

Contact mathieu@opengis.ch

Version QGIS 3.X

Current Situation

The OpenStreetMap (OSM) Foundation has expressed concerns with QGIS regarding the escalating tile usage, primarily attributable to a minority of users engaging in mass downloading of tiles. Detailed statistics on this issue are available at https://github.com/openstreetmap/operations/issues/1019. It is imperative for QGIS to address this matter to alleviate strain on OSM's limited resources, ensuring the continued availability of OSM layers as default options within QGIS.

Summary

This proposal aims to improve how QGIS fetches tiles from the OpenStreetMap (OSM) servers in order to mitigate abusive tile fetching. The key objective is to prevent excessive strain on OSM servers by reducing the number of tiles fetched during ‘normal’ usage of the OSM tile server, and by discouraging mass downloading of tiles through QGIS desktop.

Implementation Details

  1. Research and Analysis: Conduct thorough research into current tile fetching practices within QGIS and analyze the impact on OSM servers. Collaborate with the OSM Foundation to understand their concerns and gather insights for implementing effective mitigation strategies.
  2. Documentation and Communication: Provide clear documentation of the updated default network cache size logic and of the processing algorithm changes that prevent abusive tile fetching. Communicate these changes to QGIS users through release notes and updates to the algorithm documentation.

Proposed Solution

  1. Optimized Default Network Cache Size: Develop an improved logic to define the default network cache size in QGIS, considering available space on users' systems. This optimization will help avoid small cache sizes when ample space is available, thereby reducing the frequency of tile requests to OSM servers.
  2. Processing Algorithm Changes: Modify relevant processing algorithms within QGIS to incorporate safeguards against mass downloading of tiles from the official OSM server (i.e. https://tile.openstreetmap.org/). These changes will include rate-limiting mechanisms and detection of large-scale downloading patterns to prevent abuse of OSM server resources.
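To illustrate the first item, here is a minimal Python sketch of one way a default cache size could be derived from free disk space. It is not the QGIS implementation: the function names, the 5% share, and the floor and ceiling values are all assumptions chosen for illustration only.

```python
import shutil

MIB = 1024 * 1024


def default_cache_size(free_bytes, percent=5, floor=256 * MIB, ceiling=4096 * MIB):
    """Return a cache size in bytes: a share of free space, clamped to a range.

    All numeric defaults here are hypothetical, not values used by QGIS.
    Integer arithmetic is used so the result is exact and reproducible.
    """
    share = free_bytes * percent // 100
    return max(floor, min(ceiling, share))


def default_cache_size_for_path(path="."):
    """Apply the same rule to the free space of the disk holding `path`."""
    return default_cache_size(shutil.disk_usage(path).free)
```

With these illustrative numbers, a machine with 20 GiB free would get a 1 GiB cache, while a nearly full disk falls back to the floor and a very large disk is capped at the ceiling.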

Risks

Low

haubourg commented 3 months ago

+1 as discussed in PSC, we need to lower our impact on OSM servers.

wonder-sk commented 3 months ago

Maybe we could also add the {usage} term in the default OSM connection, so it is easier to track when someone is downloading tiles rather than using tiles for viewing - see https://github.com/qgis/QGIS/pull/46731

Agreed we should increase the cache size - probably to at least 1 GB, given that more and more data sources are being streamed from remote servers...

pathmapper commented 3 months ago

:+1:

Maybe https://github.com/qgis/QGIS/issues/56197 is of interest: currently the cache can grow to double the size defined in the network settings.

rouault commented 3 months ago

This proposal goes in the right direction, but while QGIS may help the OSM team to better identify the origin of requests, it seems to me that the ultimate solution is on the server side. Especially against abusive mass download of tiles (which seems the main issue OSM admins face). A sufficiently determined QGIS user may just remove any client-side rate-limiting we might add... Or they could just use a trivial shell script to mass download OSM tiles outside of QGIS.

I can imagine that when a tool starts downloading tiles, some unique identifier of the "session" could help the server. But isn't the IP address of the client already sufficient information for OSM servers to rate-limit an abuser?
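For readers unfamiliar with the idea, per-client rate limiting keyed on the IP address can be sketched with a token bucket. This is a generic illustration, not a description of how the OSM servers actually work; the class names, the rate, and the capacity are assumptions.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill according to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerClientLimiter:
    """Keep one bucket per client key (here, hypothetically, the IP address)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, client_ip):
        if client_ip not in self.buckets:
            self.buckets[client_ip] = TokenBucket(self.rate, self.capacity, self.clock)
        return self.buckets[client_ip].allow()
```

A limiter like this can only see an address, so it cannot tell interactive viewing apart from a bulk export coming from the same client, which is where an extra identifier would help.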

nyalldawson commented 3 months ago

One area I feel we can definitely improve is handling xyz tiles when the project is not in web mercator. In this scenario we fetch tiles at too high a zoom level, and end up requesting many more tiles than we need. I suspect this is one major contributor to our tile usage.

And in this scenario, the layer rendering will always be degraded anyway, so fetching lower res tiles shouldn't be a noticeable regression...
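The effect of the zoom choice can be sketched as follows. This is not the QGIS renderer logic: the function name, the `reprojected` flag, and the rounding policy are assumptions. It only uses the standard Web Mercator relation between zoom level and ground resolution for 256 px tiles. Each zoom level step down covers the same area with roughly a quarter of the tiles.

```python
import math

# Web Mercator resolution at zoom 0 for 256 px tiles, in metres per pixel.
INITIAL_RESOLUTION = 2 * math.pi * 6378137 / 256


def zoom_for_resolution(map_resolution, reprojected=False, max_zoom=19):
    """Pick a tile zoom level for a map resolution given in metres per pixel.

    Hypothetical policy: use the nearest zoom when tiles are drawn in their
    native CRS, but round down when they will be reprojected anyway, since
    resampling degrades the output regardless of the source resolution.
    """
    exact = math.log2(INITIAL_RESOLUTION / map_resolution)
    # The small epsilon guards against floating-point error just below an integer.
    zoom = math.floor(exact + 1e-9) if reprojected else round(exact)
    return int(min(max(zoom, 0), max_zoom))
```

For a map resolution that falls between zoom 10 and zoom 11, the native case picks the nearer level while the reprojected case stays at 10.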

nyalldawson commented 3 months ago

But anyway, big +1 to this, and taking steps to improve the relationship with OSM. That's something we can't afford to harm!

grischard commented 3 months ago

> This proposal goes in the right direction, but while QGIS may help the OSM team to better identify the origin of requests, it seems to me that the ultimate solution is on the server side.

Hi from OpenStreetMap! We do limit abusive downloads on the server side, including from people masquerading as QGIS. The problem here is many legitimate users inadvertently hitting our servers a bit too hard, and we do not want to block all of QGIS.

anitagraser commented 3 months ago

Thank you for submitting your proposal to the 2024 QGIS Grant Programme. The 2 week discussion period starts today. At the end of the discussion, the proposal author has to provide a 3-line pitch of their proposal for the voter information material. (For an example from last year check https://github.com/qgis/PSC/issues/58#issuecomment-1567892412)

rduivenvoorde commented 2 months ago

Don't want to be a PITA, but on normal use of an OSM layer, we are still requesting all tiles of all 8 mapcanvas-extents around the current mapcanvas. I think (given the giant screens I see QGIS running on nowadays), NOT requesting all these (for OSM) would probably already help a lot. I have always found it a little opportunistic to do all these requests (given we do not run our own tile servers)...

I understand the user-experience will not be better, but I really want to keep OSM in QGIS.

Or else: what about giving the 'normal' requests (from QGIS/MapCanvas) a different User-Agent than the requests from processing/tile-downloader? Then at least OSM can determine the bottleneck better?

See: https://github.com/qgis/QGIS/pull/41832 and https://github.com/qgis/QGIS/pull/41953

Proposed/implemented it there earlier, but got veto'd/overthrown apparently :-)

I think client software should be more polite towards servers run by others.
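The User-Agent idea above can be sketched in a few lines of Python. The agent strings and the `build_tile_request` helper are hypothetical and are not what QGIS sends; the sketch only builds request objects and never contacts the server.

```python
import urllib.request

TILE_URL = "https://tile.openstreetmap.org/{z}/{x}/{y}.png"

# Hypothetical agent strings, one per kind of usage.
USER_AGENTS = {
    "canvas": "ExampleGIS/0.1 (interactive map canvas)",
    "processing": "ExampleGIS/0.1 (processing tile download)",
}


def build_tile_request(z, x, y, purpose):
    """Build (but do not send) a tile request tagged with its purpose."""
    if purpose not in USER_AGENTS:
        raise ValueError(f"unknown purpose: {purpose!r}")
    url = TILE_URL.format(z=z, x=x, y=y)
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENTS[purpose]})
```

With distinct headers, server logs could separate interactive viewing from bulk downloads without any change to the tile URL.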

Marwe commented 2 months ago

Print output (e.g. for atlas with many pages) and rendering/storage for offline usage come to my mind. Such operations may trigger mass downloads, heavily depending on the resolution settings. For WMS you usually get a notification, maybe there are places for hooks?
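One possible hook is to estimate the number of tiles before an export starts and to warn or refuse above a budget. The sketch below uses the standard slippy-map tile numbering formula. The function names and the budget are assumptions, and bounding boxes that cross the antimeridian are not handled.

```python
import math


def tile_xy(lon, lat, zoom):
    """Return the XYZ tile column and row containing a lon/lat point (degrees)."""
    n = 2 ** zoom
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the east or south edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def count_tiles(bbox, zoom):
    """Count tiles covering bbox = (west, south, east, north) at one zoom level."""
    west, south, east, north = bbox
    x_min, y_min = tile_xy(west, north, zoom)
    x_max, y_max = tile_xy(east, south, zoom)
    return (x_max - x_min + 1) * (y_max - y_min + 1)


def exceeds_budget(bbox, zooms, max_tiles=5000):
    """True if the request would need more than max_tiles (a hypothetical limit)."""
    return sum(count_tiles(bbox, z) for z in zooms) > max_tiles
```

Because each extra zoom level roughly quadruples the count, a modest increase in output resolution can push an atlas export from a few hundred tiles to many thousands.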

ianthetechie commented 2 months ago

I saw a comment about this on Mastodon and was encouraged to drop some thoughts over here 😀

@rduivenvoorde hit one of the things I'm pretty sure I've hit before. I haven't gotten to the level of digging into the detail he has, but I notice that it loads a lot more tiles than are necessary in many cases.

Another thing I think is going on is not cancelling queued or in-flight requests that aren't necessary anymore. This could actually be the same issue as the above.

Both of these would actually improve the overall responsiveness of QGIS (which is always slower than web-based maps, presumably due to some mix of poor queuing logic and over-fetching), in addition to being lighter on tile servers like OSM.
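A minimal sketch of the cancellation idea: whenever the view changes, drop every queued tile request that the new view no longer needs. The `TileQueue` class is hypothetical and models only the queue; aborting requests that are already in flight would need support from the network layer and is not shown.

```python
from collections import OrderedDict


class TileQueue:
    """FIFO queue of pending tile keys (z, x, y) that can drop stale entries."""

    def __init__(self):
        self._pending = OrderedDict()

    def set_view(self, needed):
        """Replace the wanted set; return the queued keys that were cancelled."""
        needed = list(needed)
        wanted = set(needed)
        cancelled = [key for key in self._pending if key not in wanted]
        for key in cancelled:
            del self._pending[key]
        # Keep already queued tiles in place and append the new ones.
        for key in needed:
            self._pending.setdefault(key, None)
        return cancelled

    def next_request(self):
        """Pop the oldest pending key, or return None when the queue is empty."""
        if not self._pending:
            return None
        key, _ = self._pending.popitem(last=False)
        return key
```

After a pan, only tiles still in view remain queued, so a quick series of pans does not leave a backlog of requests for areas the user has already left.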