organicmaps / organicmaps

🍃 Organic Maps is a free Android & iOS offline maps app for travelers, tourists, hikers, and cyclists. It uses crowd-sourced OpenStreetMap data and is developed with love by MapsWithMe (MapsMe) founders and our community. No ads, no tracking, no data collection, no crapware. Please donate to support the development!
https://organicmaps.app
Apache License 2.0

Download Wikipedia articles' summaries only / download separately from map files #5912

Open pastk opened 12 months ago

pastk commented 12 months ago

"Summary" is an article's part before any sections like "History", "See also", etc.

Some examples of map size inflation:

Paris - from 18MB to 65MB
Moscow - from 51MB to 92MB
Buenos-Aires - from 39MB to 70MB
Mexico City - 148MB to 180MB

(these numbers are outdated, but you get the picture)

The big point is: users download and keep map files for the sake of the map data, not for the auxiliary function of reading Wikipedia offline! Especially when they can't understand many of the languages the wiki articles are stored in, it just becomes a waste of space and bandwidth. IMHO compact map files are one of the notable advantages of OM.

So any significant map size increase should be considered very carefully, weighing in the value most users will get from it.

I think some questions were not answered before rolling out the Wikipedia feature:

It might happen that sizes will be significantly inflated even if we limit articles to summaries only but include all OM languages. I don't think it would be fair to keep the feature limited to only a few languages, as it is now. In that case users who don't understand these languages are at a big disadvantage (no value added for them, but they need to cope with bigger files anyway), and these are mostly users from third-world countries who don't possess modern devices with lots of storage and cheap traffic.

Originally posted by @pastk in https://github.com/organicmaps/organicmaps/issues/2410#issuecomment-1100863115

pastk commented 12 months ago

An even better solution in the long term would be to make wiki articles downloadable as optional separate files.

biodranik commented 12 months ago

Exactly. Downloading wiki articles separately, in only a needed language, will save us a lot of traffic on map updates, while keeping wiki articles as detailed as possible.

biodranik commented 10 months ago
  1. What's the wiki section format in mwm?
  2. Are articles embedded into features directly or are they stored separately?
  3. How are features and articles connected?
  4. How to disconnect them to download separately, but make it easier to highlight features with wiki articles e.g. using search?
  5. How to use an article size to rank features?
  6. What would be the best and most convenient UX to download wiki articles separately and only in the needed language(s)?

jumelles commented 10 months ago

6. What would be the best and most convenient UX to download wiki articles separately and only in the needed language(s)?

There will absolutely be places in non-English-speaking countries that have a Wikipedia article in that language but none in English. Ideally the OSM data will link to Wikipedia via Wikidata, which should be vastly more comprehensive than just the English Wikipedia article pagespace.

Here's a quick example I found: Église Saint-Jacques-le-Majeur d'Asquins

biodranik commented 10 months ago

@jumelles how is that relevant to our discussion, or to a user who doesn't read that local language? Each feature still has a wiki link to check it online if necessary.

jumelles commented 10 months ago

I'm just pointing out that we should be sure to query Wikidata instead of Wikipedia.

biodranik commented 10 months ago

@jumelles Wikidata should be already queried. Can you please check if it already works in Organic Maps? And let's focus on the issue's topic.

newsch commented 5 months ago
  1. What's the wiki section format in mwm?
  2. Are articles embedded into features directly or are they stored separately?
  3. How are features and articles connected?

It's the descriptions section, briefly described in descriptions/serdes.hpp.

The section stores mappings of feature_index -> (lang, description).

The contents are:

So to get the description for a feature in a given language:

  1. Binary search for the feature index in FeaturesIndices.
  2. Get the LangMeta offset from LangMetaIndex at the same position as the feature index.
  3. Get the LangMeta entry at that offset.
  4. Find the description index for the language in the entry.
  5. Get the description from Strings using that index.
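
The lookup steps above can be sketched roughly as follows. This is a simplified in-memory model, not the actual serialized format from descriptions/serdes.hpp: the struct, its field types, the numeric language codes, and the GetDescription name are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory model of the descriptions section. Field names follow
// the thread; the real on-disk encoding is different.
struct DescriptionsSection
{
  std::vector<uint32_t> featuresIndices;  // sorted indices of features that have descriptions
  std::vector<uint32_t> langMetaIndex;    // langMetaIndex[i]: where entry i starts in langMeta
  std::vector<std::pair<uint8_t, uint32_t>> langMeta;  // (assumed language code, index into strings)
  std::vector<std::string> strings;       // description texts
};

std::optional<std::string> GetDescription(DescriptionsSection const & s,
                                          uint32_t featureIndex, uint8_t lang)
{
  // Step 1: binary search for the feature index.
  auto const it = std::lower_bound(s.featuresIndices.begin(), s.featuresIndices.end(), featureIndex);
  if (it == s.featuresIndices.end() || *it != featureIndex)
    return std::nullopt;
  std::size_t const pos = static_cast<std::size_t>(it - s.featuresIndices.begin());

  // Steps 2-3: the offset at the same position bounds this feature's LangMeta entry.
  std::size_t const begin = s.langMetaIndex[pos];
  std::size_t const end =
      pos + 1 < s.langMetaIndex.size() ? s.langMetaIndex[pos + 1] : s.langMeta.size();

  // Step 4: find the requested language. Step 5: fetch the text from strings.
  for (std::size_t i = begin; i < end; ++i)
  {
    if (s.langMeta[i].first == lang)
      return s.strings[s.langMeta[i].second];
  }
  return std::nullopt;
}
```

A caller would get std::nullopt both when the feature has no description at all and when it has none in the requested language.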

  4. How to disconnect them to download separately

In the current form, disconnecting is straightforward: move the section to a separate file. However, it is tied to the layout of features in the main file, so the two must be updated together.
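
One way to enforce that coupling, purely as a sketch: have the separate descriptions file record which map data version it was generated from, and refuse to use it when the versions differ. MapFileInfo, DescriptionsFileInfo, and their fields are hypothetical names for illustration, not existing Organic Maps types.

```cpp
#include <cstdint>

// Hypothetical metadata for the main map file.
struct MapFileInfo
{
  uint64_t dataVersion;  // assumed: identifies one generation of the map data
};

// Hypothetical metadata for a separately downloaded descriptions file.
struct DescriptionsFileInfo
{
  uint64_t builtForDataVersion;  // assumed: map data version its feature indices refer to
};

// Feature indices are only meaningful for the map generation they were built
// against, so a mismatch means the descriptions file must not be used.
bool CanUseTogether(MapFileInfo const & map, DescriptionsFileInfo const & desc)
{
  return map.dataVersion == desc.builtForDataVersion;
}
```

With a check like this, a stale descriptions file would simply be ignored until it is re-downloaded, rather than attaching articles to the wrong features.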

  4. ...make it easier to highlight features with wiki articles e.g. using search?
  5. How to use an article size to rank features?

With the current format you'd need to do all of the same lookups to get article size. Detecting presence means searching only FeaturesIndices if you don't care about language, or LangMetaIndex and LangMeta if you do.
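
To make the difference between these checks concrete, here is a rough sketch over a simplified in-memory model. The struct, the numeric language codes, and the function names are assumptions for illustration; the real section is serialized, and its reader may be able to obtain sizes without loading the text.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory model of the descriptions section.
struct DescriptionsSection
{
  std::vector<uint32_t> featuresIndices;  // sorted indices of features that have descriptions
  std::vector<uint32_t> langMetaIndex;    // langMetaIndex[i]: where entry i starts in langMeta
  std::vector<std::pair<uint8_t, uint32_t>> langMeta;  // (assumed language code, index into strings)
  std::vector<std::string> strings;       // description texts
};

// Language-agnostic presence: only FeaturesIndices is consulted.
bool HasAnyDescription(DescriptionsSection const & s, uint32_t featureIndex)
{
  return std::binary_search(s.featuresIndices.begin(), s.featuresIndices.end(), featureIndex);
}

// Language-aware lookup: additionally walks LangMetaIndex and LangMeta.
std::optional<uint32_t> FindStringIndex(DescriptionsSection const & s,
                                        uint32_t featureIndex, uint8_t lang)
{
  auto const it = std::lower_bound(s.featuresIndices.begin(), s.featuresIndices.end(), featureIndex);
  if (it == s.featuresIndices.end() || *it != featureIndex)
    return std::nullopt;
  std::size_t const pos = static_cast<std::size_t>(it - s.featuresIndices.begin());
  std::size_t const begin = s.langMetaIndex[pos];
  std::size_t const end =
      pos + 1 < s.langMetaIndex.size() ? s.langMetaIndex[pos + 1] : s.langMeta.size();
  for (std::size_t i = begin; i < end; ++i)
  {
    if (s.langMeta[i].first == lang)
      return s.langMeta[i].second;
  }
  return std::nullopt;
}

bool HasDescription(DescriptionsSection const & s, uint32_t featureIndex, uint8_t lang)
{
  return FindStringIndex(s, featureIndex, lang).has_value();
}

// Article size in bytes: the full chain of lookups, ending in Strings.
std::optional<std::size_t> DescriptionSize(DescriptionsSection const & s,
                                           uint32_t featureIndex, uint8_t lang)
{
  auto const idx = FindStringIndex(s, featureIndex, lang);
  if (!idx)
    return std::nullopt;
  return s.strings[*idx].size();
}
```

A ranking step could call DescriptionSize once per candidate feature, which is why a precomputed index might be preferable if search touches many features.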

Does search build its own index for ranking, or does it use the files directly?

biodranik commented 5 months ago

There is a separate index for everything, including search )

biodranik commented 1 month ago

CC @newsch