rinigus / osmscout-server

Maps server providing tiles, geocoder, and router
https://rinigus.github.io/osmscout-server
GNU General Public License v3.0

[Feature request] Implement ability to globally subscribe to segmented countries #296

Open zwieberl opened 5 years ago

zwieberl commented 5 years ago

Some countries, like Germany and France, can't be subscribed to as a whole, but only as a lot of smaller sub-maps (> 20 per country). The UK seems to have both: "Great Britain" as well as smaller sub-regions like Scotland and Wales. It would be good to be able to subscribe to the complete country, for all countries, or, where available, to only some sub-regions (this would also make it quick to unsubscribe from countries that are currently not needed).

rinigus commented 5 years ago

It's mainly due to the file size limitations of SD cards with FAT-type file systems. The Mapnik databases for Germany and France were larger than 4 GB, and I was constantly getting complaints about it. However, maybe I should figure out how to make it possible without a large Mapnik database.

zwieberl commented 5 years ago

Maybe some sort of wrapper script would be possible: keep all those smaller databases, but have an automated step that simply subscribes to or unsubscribes from all of them. Although "Use only database X for searching" should also work.
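
The wrapper idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the catalogue, the map ids, and the function name are hypothetical and are not taken from OSM Scout Server or its maps distribution.

```python
# Hypothetical catalogue: segmented country -> ids of its sub-region maps.
# The ids below are made up for illustration only.
SUBREGIONS = {
    "europe/germany": [
        "europe/germany/bayern",
        "europe/germany/berlin",
        "europe/germany/hessen",
    ],
}


def set_country_subscription(subscribed, country, subscribe, catalogue=SUBREGIONS):
    """Return a new set of subscribed map ids.

    Every sub-region of `country` is added (subscribe=True) or removed
    (subscribe=False). A country without sub-regions is treated as a
    single map with its own id.
    """
    parts = set(catalogue.get(country, [country]))
    if subscribe:
        return set(subscribed) | parts
    return set(subscribed) - parts
```

Calling `set_country_subscription(current, "europe/germany", True)` would add all listed sub-maps in one step, and the same call with `False` would drop them again while leaving other subscriptions untouched.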

rinigus commented 5 years ago

It will have to be implemented together with the maps distribution, maybe as a script on that side...

zwieberl commented 5 years ago

I have a follow-up question to this, just to show my complete ignorance with respect to maps-data: For those countries that do offer both (smaller sub-maps and a big combined one), like England, there are:

England 2.318 MB
Berkshire 682 MB
Buckinghamshire 735 MB
Cambridgeshire 975 MB
Cheshire 768 MB
Cornwall 494 MB
....

Meaning: The sub-maps exceed the combined map with respect to size by far. How is that? Is the overhead per sub-map so large? Or does the combined map contain less details?

Additionally: If one downloads both the combined map and the sub-regions, is this actually going to take up disk space for both, or does OSM Scout Server filter out the data that is already there?

rinigus commented 5 years ago

The displayed sizes are a "worst case" scenario. What ends up on your storage depends on several factors, such as the selected regions and backends.

For Mapbox GL and Valhalla: the whole world is divided into tiles at a given resolution, and the tiles are sometimes grouped together into packages. Each region has a list of such packages, containing every package that overlaps the region even a little. When downloading, the server checks whether a package has been downloaded already and fetches it only if needed. So England and all its sub-regions would, in total, use the same amount of space, and you can even download overlapping ones (like England + Berkshire) and end up with the same datasets. To display accurate storage requirements, I would have to implement a more sophisticated approach, and it might still end up confusing for users.
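
A minimal sketch of that package bookkeeping, assuming made-up package names and sizes (none of these numbers come from the real distribution). It shows why the listed size of each sub-region counts whole packages, while the space actually used counts each package once.

```python
# Hypothetical data: region -> packages (groups of tiles) overlapping it,
# and the size of each package in MB. All values are invented.
REGION_PACKAGES = {
    "England": {"p1", "p2", "p3"},
    "Berkshire": {"p2"},
    "Buckinghamshire": {"p2", "p3"},
}
PACKAGE_SIZE_MB = {"p1": 900, "p2": 600, "p3": 400}


def packages_to_fetch(regions, already_downloaded=frozenset()):
    """Packages needed for `regions` that are not on disk yet."""
    needed = set()
    for region in regions:
        needed |= REGION_PACKAGES[region]
    return needed - set(already_downloaded)


def listed_size_mb(region):
    """'Worst case' size shown for one region: all of its packages."""
    return sum(PACKAGE_SIZE_MB[p] for p in REGION_PACKAGES[region])


def storage_mb(regions):
    """Space actually used: every distinct package is stored once."""
    return sum(PACKAGE_SIZE_MB[p] for p in packages_to_fetch(regions))
```

With these invented numbers, Berkshire and Buckinghamshire are listed at 600 MB and 1000 MB, yet selecting both stores only 1000 MB, and adding Berkshire to an existing England selection fetches nothing new.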

For libpostal, distributed as a part of Geocoder-NLP: datasets are per country (UK, USA, France). These datasets are used for parsing addresses in a given country and require extensive computation to build. Since they are relatively small, I kept them at this granularity.

For Geocoder-NLP, Mapnik, and libosmscout: each dataset corresponds to the selected region, as accurately as possible. If you have England and Berkshire, you will end up with two copies of the overlapping data. In search, for example, you will get duplicate hits.

So, to be safe, choose one granularity and stick to it. If you go for sub-regions, then choose them and don't mix them with overlapping regions. However, selecting Berkshire and Scotland together should be fine.
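
The rule of thumb above could be checked mechanically. The sketch below assumes region ids are slash-separated paths in which a parent region is a prefix of its sub-regions; that naming scheme and the function name are assumptions made for illustration.

```python
def overlapping_pairs(selected):
    """Return (parent, child) pairs where one selected region contains another.

    Assumes hypothetical ids such as 'uk/england/berkshire', where the
    parent 'uk/england' is a path prefix of the child.
    """
    pairs = []
    ordered = sorted(selected)
    for parent in ordered:
        for child in ordered:
            if child != parent and child.startswith(parent + "/"):
                pairs.append((parent, child))
    return pairs
```

An empty result means the selection does not mix a region with one of its own sub-regions, so region-bound backends should not hold the same data twice.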

Olf0 commented 5 years ago

Just my 2 cents, after reading this. I see two easy approaches:

  1. To have a little internal script subscribing to all the sub-regions when a segmented country is being selected.
  2. To additionally offer unsegmented versions of these country maps with a textual warning (e.g., "( > 4 GB!)") automatically added to all files larger than 4 GBytes.

Technically and personally I would prefer option 2, but option 1 may result in fewer complaints from users who do not understand the technical details.
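
Option 2 could look roughly like this. It is a sketch only: the function name and the idea of passing in the largest file of a map are assumptions, while the limit itself is the FAT32 maximum file size of 4 GiB minus one byte.

```python
# Largest single file that FAT32 can store: 4 GiB minus one byte.
FAT32_MAX_FILE_BYTES = 2**32 - 1


def label_with_warning(name, largest_file_bytes):
    """Append a textual warning to a map name if its largest file
    would not fit on a FAT32-formatted SD card."""
    if largest_file_bytes > FAT32_MAX_FILE_BYTES:
        return f"{name} ( > 4 GB!)"
    return name
```

The check is applied per file rather than per map, since the FAT limit concerns the size of a single file, not the total download.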