osmlab / editor-layer-index

A unified layer index for OSM editors.
https://osmlab.github.io/editor-layer-index/

How long should we keep outdated imagery in ELI #1446

Open rbuffat opened 2 years ago

rbuffat commented 2 years ago

Problem:

For some regions, new imagery becomes available at regular intervals (e.g. each year). The question is: how long should we keep old imagery in ELI?

Keeping old imagery in ELI has some negative aspects:

But having older imagery has also some positive aspects:

Possible actions:

cicku commented 2 years ago

In my opinion, no more than 2 imagery layers should be allowed for the same region in ELI (composites are an exception).

It would be better for iD to support reading from ESRI metadata or Bing or...to tell users more. If they cannot, then it is their "fault" to trick users using obsolete things.

If people wish to get something from 10 years ago here, they should dump to OpenHistoricalMap.

simonpoole commented 2 years ago

From an ELI consumer's point of view, such layers should be moved to the historicphoto category to make it clear that they have been superseded and shouldn't be shown except when the user is explicitly looking for them.

coolultra1 commented 2 years ago

I personally think all imagery should be kept in ELI. This is an index of what exists, not of what is useful. As long as old imagery is classified as historicphoto, there is no reason not to keep it.

rbuffat commented 2 years ago

there is no reason not to keep it.

Not true:

  1. The size of the imagery.geojson is unnecessary increased if we keep not relevant imagery
  2. The maintenance work is increased
  3. Mappers could not be aware that they are using old imagery

simonpoole commented 2 years ago

The size of the imagery.geojson is unnecessary increased if we keep not relevant imagery

This is naturally a general issue.

We could de-duplicate the coverage geometries quite substantially if we are not too picky about them being exactly correct for the layer in question.
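A minimal Python sketch of that de-duplication idea, assuming features follow the GeoJSON Feature layout used by imagery.geojson: coordinates are rounded to 4 decimal places (roughly 11 m, the "not too picky" part), hashed, and each distinct geometry is stored once in a shared table. The `geometry_ref` key and the separate geometry table are hypothetical and not part of the ELI schema.

```python
import hashlib
import json


def round_coords(coords, ndigits):
    """Recursively round nested GeoJSON coordinate arrays.

    Converting to float first keeps 7 and 7.0 from serializing differently.
    """
    if isinstance(coords, (int, float)):
        return round(float(coords), ndigits)
    return [round_coords(c, ndigits) for c in coords]


def dedupe_geometries(features, ndigits=4):
    """Store each distinct (rounded) geometry once, keyed by a content hash.

    Returns (features_with_refs, geometry_table). 'geometry_ref' is a
    hypothetical key, not part of the ELI schema.
    """
    table = {}
    out = []
    for feature in features:
        geom = feature.get("geometry")
        slim = {k: v for k, v in feature.items() if k != "geometry"}
        if geom is None:
            slim["geometry_ref"] = None  # global layers have no coverage polygon
        else:
            canonical = {
                "type": geom["type"],
                "coordinates": round_coords(geom["coordinates"], ndigits),
            }
            blob = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
            key = hashlib.sha1(blob.encode("utf-8")).hexdigest()[:12]
            table.setdefault(key, canonical)  # first occurrence is kept
            slim["geometry_ref"] = key
        out.append(slim)
    return out, table
```

Two layers whose polygons differ only below the rounding precision end up pointing at the same table entry, at the cost of the stored geometry being slightly inexact for one of them.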

Or, even more "radical", just have a bounding box (potentially more than one) in the standard distribution file and keep the detailed geometries separate.
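The bounding-box variant could look roughly like this. It is a sketch under assumptions: only Polygon and MultiPolygon coverage is handled, each MultiPolygon part gets its own box (the "potentially more than one" case), layers crossing the antimeridian are not treated specially, global layers without a geometry get an empty list, and `split_distribution` with its output layout is hypothetical.

```python
def bboxes(geometry):
    """Return one [min_lon, min_lat, max_lon, max_lat] box per polygon part."""
    if geometry is None:
        return []  # global layer: no coverage polygon
    if geometry["type"] == "Polygon":
        parts = [geometry["coordinates"]]
    elif geometry["type"] == "MultiPolygon":
        parts = geometry["coordinates"]
    else:
        raise ValueError(f"unsupported geometry type: {geometry['type']}")
    boxes = []
    for rings in parts:
        lons = [pt[0] for ring in rings for pt in ring]
        lats = [pt[1] for ring in rings for pt in ring]
        boxes.append([min(lons), min(lats), max(lons), max(lats)])
    return boxes


def split_distribution(features):
    """Split features into a light index (boxes only) and a detailed-geometry map."""
    light, detailed = [], {}
    for feature in features:
        layer_id = feature["properties"]["id"]
        geometry = feature.get("geometry")
        light.append({"id": layer_id, "bbox": bboxes(geometry)})
        if geometry is not None:
            detailed[layer_id] = geometry
    return light, detailed
```

An editor would download only the light index up front and fetch a detailed geometry when a layer's box intersects the current view.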

A twist on that could be a tile-based index of the available imagery ids. This would need to use variable-sized tiles to be efficient, but that is not particularly difficult to do (https://github.com/simonpoole/mapsplit does that in an optional optimization pass).
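To illustrate the variable-sized tile idea, here is a small quadtree sketch. It is not how mapsplit works internally: it assumes plain lon/lat boxes (not Web Mercator tiles), indexes each layer by its bounding box only, and splits a tile while it lists more than `max_ids` layers. Lookups therefore return candidate ids that would still need an exact polygon test. All names are hypothetical.

```python
from dataclasses import dataclass, field


def intersects(a, b):
    """True if boxes [w, s, e, n] a and b overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Tile:
    box: tuple                                    # (west, south, east, north)
    ids: list = field(default_factory=list)       # layer ids, leaf tiles only
    children: list = field(default_factory=list)  # SW, SE, NW, NE quadrants


def build(box, layers, depth=0, max_ids=4, max_depth=8):
    """Build a quadtree over (layer_id, bbox) pairs.

    A tile is split into four quadrants while it lists more than max_ids
    layers, so dense areas get small tiles and sparse areas stay coarse.
    """
    inside = [(lid, b) for lid, b in layers if intersects(box, b)]
    if len(inside) <= max_ids or depth >= max_depth:
        return Tile(box, ids=[lid for lid, _ in inside])
    w, s, e, n = box
    mx, my = (w + e) / 2, (s + n) / 2
    quadrants = [(w, s, mx, my), (mx, s, e, my), (w, my, mx, n), (mx, my, e, n)]
    return Tile(box, children=[build(q, inside, depth + 1, max_ids, max_depth) for q in quadrants])


def lookup(tile, lon, lat):
    """Descend to the leaf containing (lon, lat) and return its candidate ids."""
    while tile.children:
        w, s, e, n = tile.box
        mx, my = (w + e) / 2, (s + n) / 2
        tile = tile.children[(1 if lon >= mx else 0) + (2 if lat >= my else 0)]
    return tile.ids
```

Global layers land in every tile, which is why `max_depth` is needed: with more than `max_ids` global layers, splitting would otherwise never stop.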

tyrasd commented 2 years ago

I think there is no good reason to not include older layers in ELI.

Editors can always choose to show or not show the very old ones based on the provided dates (e.g. iD right now excludes everything known to be older than 20 years). FWIW, we even have some sources in ELI which contain much older data: e.g. USGS Topographic Maps is claimed to contain data from the 1950s.
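A sketch of such date-based filtering on the editor side, assuming ELI's `end_date` property holds a string beginning with a four-digit year (e.g. "2003" or "2019-05"). This is not iD's actual code; `visible_layers` and its `max_age_years` parameter are illustrative. Layers without a date are kept, since only layers *known* to be too old are dropped.

```python
from datetime import date


def end_year(properties):
    """Year from ELI's end_date, assumed to start with 'YYYY' (e.g. '2003', '2019-05')."""
    value = properties.get("end_date")
    return int(value[:4]) if value else None


def visible_layers(features, max_age_years=20, today=None):
    """Drop layers whose imagery is known to be older than max_age_years."""
    today = today or date.today()
    cutoff = today.year - max_age_years
    kept = []
    for feature in features:
        year = end_year(feature["properties"])
        # Unknown vintage is kept: only layers *known* to be too old are hidden.
        if year is None or year >= cutoff:
            kept.append(feature)
    return kept
```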

But having older imagery has also some positive aspects: […]

From personal experience, I can add that having more than one or two available imagery layers also helps a lot when features are difficult to see, for example because of shadows, vegetation or occlusion from vehicles.

The size of the imagery.geojson is unnecessary increased if we keep not relevant imagery

This varies a lot from case to case: in cases like #1466, where the coverage polygons are pretty simple, the additional file size is almost negligible compared to the total size of the dataset. In other situations, different vintages of a region's imagery layer might be able to share the same polygon, which would not significantly increase the gzipped size transferred to users (assuming we are mostly concerned about bandwidth here).
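The gzip point is easy to check with a quick experiment. The snippet below builds a synthetic polygon (not real ELI coverage; it is just placed near Worms), serializes it once as one layer and once as two layers with identical geometry, and compares raw and gzip-compressed sizes. The layer ids are hypothetical. Because deflate can back-reference repeated data within a 32 KiB window, the second copy costs little after compression, provided the duplicates sit within that window of each other in the file.

```python
import gzip
import json
import math


def polygon(n=500):
    """Deterministic wiggly ring, used only to get a realistically sized coordinate list."""
    ring = []
    for i in range(n):
        a = 2 * math.pi * i / n
        r = 0.05 + 0.01 * math.sin(7 * a)
        ring.append([round(8.36 + r * math.cos(a), 6), round(49.63 + r * math.sin(a), 6)])
    ring.append(ring[0])  # close the ring as GeoJSON requires
    return {"type": "Polygon", "coordinates": [ring]}


def feature(layer_id, geom):
    return {"type": "Feature", "properties": {"id": layer_id}, "geometry": geom}


def sizes(features):
    """Return (raw bytes, gzip bytes) of a FeatureCollection."""
    raw = json.dumps({"type": "FeatureCollection", "features": features}).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


geom = polygon()
raw1, gz1 = sizes([feature("worms_2020", geom)])
raw2, gz2 = sizes([feature("worms_2020", geom), feature("worms_2003", geom)])
print(f"raw growth: {raw2 / raw1:.2f}x, gzip growth: {gz2 / gz1:.2f}x")
```

The raw size roughly doubles while the compressed size grows only slightly; in a real file the benefit shrinks if the duplicate polygons end up far apart.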

IMHO the size of the imagery.geojson will eventually become an issue even if we decide to exclude all historicphoto sources from the index. Sooner or later we need to think about a technical solution for this. I like the tile-based idea proposed by @simonpoole above.

Mappers could not be aware that they are using old imagery

@rbuffat: Could you please explain why you think this will be the case? When older layers state their vintage in the name (e.g. Worms 2003) and are correctly categorized as historicphoto, this should not be an issue, should it?

The maintenance work is increased

This is a very valid point. Older layers might indeed more often be unstable on (or removed from) WMS servers, for example, causing potential additional work for maintainers. 🤔

cicku commented 2 years ago

USGS Topographic Maps? Honestly it should be removed as well. The reason why it is there is unclear to me. USGS has digitalized all data and that old map is just off topic. You cannot use the data from 50 years ago and tell me “I won’t make any mistake”.

Editors can choose or not… Sorry, it is better not to let anyone choose here. If you choose 2012 over 2020, you tend to introduce more issues. That’s inevitably true.

The old imagery we have is more for QA. It might be good to split it into 2 imagery files: one with only the latest imagery and the other, for QA, with the old ones. However, adding 5 imagery layers for the same place/area/region/state/city/province is a no-no. None of you really gave a good reason above for doing this; the only reason I can see is adding abandoned features, which is quite controversial in OSM, but not in OHM.

If there is no reason not to add them, then there is no reason to add them either.

tyrasd commented 2 years ago

USGS Topographic Maps? Honestly it should be removed as well.

I found it to be a pretty good source while mapping natural features for example. Of course one needs to take into account the vintage while using it.

USGS has digitalized all data and that old map is just off topic.

OK, if there is a more recent version of that source, let's replace it. I just wanted to make an anecdotal point that for some mapping tasks recency is not the only relevant quality criterion.

You choose 2012 over 2020

Wait, the question here is not whether to use 2012 over 2020, but rather to use 2012 together with 2020 over 2020 alone. I argue that by limiting the available information one would inevitably introduce more issues.

None of you really gave a good reason above for doing this

I beg to differ. To repeat what I already posted above: Having multiple sources can also be useful in cases where features are not clearly visible because of shadows, vegetation or occlusion by vehicles. I found it often useful to consult an older imagery layer when I couldn't clearly see where exactly a building ends when it is partially shadowed by itself or nearby buildings (which is quite often the case in densely built up areas). Another use case is when mapping turn lanes, as vehicles often block the visibility of painted turn arrows on the driving surface.

cicku commented 2 years ago

Sorry I’m on my phone and cannot format the reply well.

USGS (only government sources, no 3rd party):

  1. For water, NHD (markup)
  2. For road names, TIGER (USGS has a MapServer syncing with it)
  3. USGS 3DEP (LiDAR)
  4. Wetland/Mountain/Lake/Reservoir/airport…these features can be mapped either by eyeball or from recent imagery like USGS/NAIP.
  5. Please add any I missed.

For multiple sources regarding #1466:

Having 1 or 2 old imagery layers is enough. Having 5 is not needed.

imagico commented 2 years ago

Editors can always choose to show or not show the very old ones based on the provided dates (e.g. iD right now excludes everything known to be older than 20 years).

At the risk of sounding like a broken record: the problem is not layers with an explicitly documented imagery age being very old. The problem is the global aggregate layers, which combine imagery of very different ages without documenting it, and which editors advertise as the default without any warning that they include images older than 20 years.

For Bing, editors could easily show a warning for all tiles with capture date metadata 1/1/1999-*. For the other layers it could be prudent to maintain exclusion polygons that document where those layers should not be used (for each of the global layers, i.e. Bing/Maxar/Esri/..., this probably amounts to between about 10 and 25 percent of the land surface).
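Both suggestions can be sketched in a few lines. Assumptions: the editor has already obtained the start of a tile's capture-date range from Bing's metadata as a `datetime.date` (how it is fetched is out of scope here), and exclusion areas are plain rings of [lon, lat] vertices. The point test is the standard even-odd ray-casting algorithm; `imagery_warnings` and the warning texts are hypothetical.

```python
from datetime import date


def point_in_ring(lon, lat, ring):
    """Even-odd ray casting: count edge crossings of a ray going east from the point."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i][0], ring[i][1]
        xj, yj = ring[j][0], ring[j][1]
        if (yi > lat) != (yj > lat):  # edge straddles the ray's latitude
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def imagery_warnings(lon, lat, capture_start, exclusion_rings,
                     suspect_start=date(1999, 1, 1)):
    """Collect warnings for one location of a global imagery layer."""
    warnings = []
    # The suggestion targets ranges starting 1999-01-01; '<=' also catches
    # anything earlier (an assumption, not part of the original proposal).
    if capture_start is not None and capture_start <= suspect_start:
        warnings.append("capture date range starts at 1999-01-01 or earlier: imagery may be much older")
    if any(point_in_ring(lon, lat, ring) for ring in exclusion_rings):
        warnings.append("location is inside an exclusion polygon for this layer")
    return warnings
```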

Regarding old legacy map layers, keep in mind that the digital data of mapping agencies, advertised these days as the latest, most up-to-date and modern stuff, typically contains lots of information that was surveyed 50+ years ago and not substantially reviewed or updated since then. As with photos, it is often better to have the original sources of such data available to mappers rather than some aggregate layer that intransparently combines recent data with 50+ year old information, mechanically digitized recently, without any possibility for the mapper to know which information is recent and which is not.

kallejre commented 2 years ago

I have been thinking about a reply for quite some time, and the main points are already covered by the comments made in the past hour. I think the Editor part of ELI's name could be optional. I'm not deeply informed about GIS topics, but I see ELI's value far beyond the OSM ecosystem. Think of it as a global open dataset about aerial imagery or maps coming from various different sources.

Mappers are constantly reminded not to tag for the renderer. Could a similar mindset be applied to imagery? Don't prohibit layers because some user of some software may use outdated imagery while a newer layer is available; that's something the editor should handle. What if OHM developers would like to integrate ELI (if they haven't already)?

I agree with the suggestions made by simonpoole and imagico about simplifying geojson polygons, deduplication and/or adding exception regions where certain imagery could be discouraged (one example could be #691). On the other hand, deduplication would probably break data consumers that already parse ELI; maybe the old version should be kept online?

It would be better for iD to support reading from ESRI metadata or Bing or...to tell users more. If they cannot, then it is their "fault" to trick users using obsolete things.

Could that metadata querying be integrated into ELI for other editors to use?

cicku commented 2 years ago

@kallejre It cannot be optional. We review imageries that may or may not be compatible with OSM. Dropping it means not only a larger list will be introduced, but also the list can no longer be safely used for OSM editors.

simonpoole commented 2 years ago

@cicku

@kallejre It cannot be optional. We review imageries that may or may not be compatible with OSM. Dropping it means not only a larger list will be introduced, but also the list can no longer be safely used for OSM editors.

This is not really an issue. In the past we actually maintained licence information in ELI that made use outside the narrow application to OSM editors safe (unluckily this was removed for unclear reasons). The thing is that in general we have more relaxed terms for OSM editor use than for non-editor use, not the other way around.

For example, you can't use any of the global mosaics in a non-editing context without getting a permission/account/... from Bing, ESRI, Mapbox and so on, but use of the same sources in an OSM editor is fine.

There are probably some exceptions somewhere, where there is a source available on open terms for which we can't negotiate use in OSM, but I would suggest that that is a bridge we can cross when necessary.

cicku commented 2 years ago

Nothing is an issue until the list is hard to maintain. That’s why it was dropped.

Today I can approve a PR with 5 imagery layers; tomorrow someone can bring 10 of them to us asking for the same. The state where I live (Delaware, US) has aerial imagery captured from 1926 to 1997; it looks like we should include them all. Reason to do that? Does not matter.

If this really happens, I’m afraid I cannot review anymore for this project.

andrewharvey commented 2 years ago

In my opinion there should be no limit; we should continue to include old imagery as long as the licensing is still valid and the URLs are accessible.

As has been pointed out older imagery is useful as:

I think the maintenance burden is not too great, as for the most part we don't need to touch these; if they are flagged as broken, we can just remove them.

The size of imagery.geojson would only be an issue for editor software that uses it directly rather than having its own API that provides imagery for the current map area only. Editor software could still choose to filter out old imagery from imagery.geojson where newer imagery exists, based on the date and coverage area.
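A rough sketch of that editor-side filtering, assuming each layer has already been reduced to an id, an end year and a bounding box. Bounding-box containment is only a stand-in for a real coverage comparison, and the dict layout is hypothetical. Undated global mosaics never supersede anything here, since their vintage is unknown.

```python
def contains(outer, inner):
    """True if box outer [w, s, e, n] fully contains box inner."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def drop_superseded(layers):
    """Keep layers not fully covered by a strictly newer dated layer.

    layers: dicts with 'id', 'end_year' (int or None) and 'bbox'
    ([w, s, e, n], or None for global layers).
    """
    def supersedes(newer, older):
        return (newer is not older
                and newer["end_year"] is not None and older["end_year"] is not None
                and newer["end_year"] > older["end_year"]
                and newer["bbox"] is not None and older["bbox"] is not None
                and contains(newer["bbox"], older["bbox"]))

    return [layer for layer in layers
            if not any(supersedes(other, layer) for other in layers)]
```

Given the comments in this thread about older vintages still helping with shadows and occlusion, an editor might prefer to demote such layers in its list rather than hide them completely.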

Editors could build in better alerts or warnings when users activate old imagery.

gdt commented 2 years ago

I think it makes sense to keep imagery until the local mappers think it is pointless. In Massachusetts, we have 2019 and 2021, plus older, and 2019 is highly useful, as tree occlusion or zenith angle is often better (though often not). Particularly if one can get a rough sense, from the 2021 imagery or from actual knowledge of construction, that not much has changed, I end up using the 2019 imagery to trace a non-negligible amount of the time. If we had 15 cm 2017 imagery I might or might not want that, and 2015 is probably not useful. So I'd say as a general rule allow 3 dates from any provider, unless the oldest is so old that local mappers agree nobody would ever want to look at it.
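That rule of thumb (at most 3 vintages per provider) could be checked mechanically in review tooling. A sketch, assuming each layer carries a hypothetical `provider` key and an integer `end_year`; it only flags the older layers for local mappers to decide on, rather than removing anything.

```python
from collections import defaultdict


def layers_over_limit(layers, max_vintages=3):
    """Return ids beyond the newest max_vintages layers of each provider.

    'provider' and 'end_year' are hypothetical keys; end_year must be an int.
    """
    by_provider = defaultdict(list)
    for layer in layers:
        by_provider[layer["provider"]].append(layer)
    flagged = []
    for group in by_provider.values():
        group.sort(key=lambda layer: layer["end_year"], reverse=True)  # newest first
        flagged.extend(layer["id"] for layer in group[max_vintages:])
    return flagged
```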

I also find the USGS topo maps useful enough to keep. The metadata about them really doesn't seem to take up scarce space. I am far from the city, and many things have not changed on the ground.