hotosm / fmtm

Field Mapping Tasking Manager - coordinated field mapping.
https://fmtm.hotosm.org/
GNU Affero General Public License v3.0

Create endpoint for conflation of FMTM data with existing OSM data #1548

Closed spwoodcock closed 1 month ago

spwoodcock commented 4 months ago

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Discussing the long term solution with @kshitijrajsharma over the coming weeks. The most achievable option now is something like https://github.com/kshitijrajsharma/osmconflator/blob/3ecf12b0d31773750cf1f806fd4547532f19d046/osmconflator/utils.py#L18

Process:

Additional considerations

spwoodcock commented 3 months ago

For now this should be adapted from: https://github.com/kshitijrajsharma/osmconflator/blob/3ecf12b0d31773750cf1f806fd4547532f19d046/osmconflator/utils.py#L18

spwoodcock commented 2 months ago

More context on this after a lengthy internal discussion with team members.

OSM Refs & Raw-Data-API

GeoJSON Conflation Workflow (OSM specific)

Proposed workflow:

New Geoms

For new geoms in the FMTM GeoJSON (ID in FMTM but not OSM, feature was added during field mapping), it is likely the field verified geometry will take precedence.

We need to call the API with: PUT /api/0.6/[node|way|relation]/create with the required nodes and way.
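As a rough illustration of the request body for such a create call, here is a minimal sketch that builds the XML for a single new node. The function name `node_create_payload` and the example changeset ID are hypothetical; opening the changeset, authentication, and sending the request are all left out.

```python
import xml.etree.ElementTree as ET


def node_create_payload(changeset_id, lat, lon, tags):
    """Build the XML body for PUT /api/0.6/node/create (sketch only).

    changeset_id must refer to a changeset that is already open;
    opening one is outside the scope of this example.
    """
    osm = ET.Element("osm")
    node = ET.SubElement(
        osm,
        "node",
        changeset=str(changeset_id),
        lat=f"{lat:.7f}",
        lon=f"{lon:.7f}",
    )
    # One <tag k=".." v=".."/> child per key/value pair.
    for key, value in sorted(tags.items()):
        ET.SubElement(node, "tag", k=key, v=str(value))
    return ET.tostring(osm, encoding="unicode")


payload = node_create_payload(12, 27.7172, 85.324, {"building": "yes"})
```

A way would need its nodes created first, then a `<way>` element listing them as `<nd ref="..."/>` children.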

Modified Geoms

Modified geoms (i.e. an OSM ID match, but differing geometry footprints) should be flagged and their percentage overlap determined. The frontend can then pick them up later for manual verification of which geometry to keep.
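One possible way to compute the percentage overlap without extra dependencies is sketched below. It uses Sutherland-Hodgman clipping, which is only valid when the clipping polygon is convex, and treats coordinates as planar. Both are simplifying assumptions, and the choice of the OSM footprint area as the denominator is also an assumption. A production version would more likely use a geometry library.

```python
def signed_area(ring):
    """Signed shoelace area; positive for counter-clockwise rings."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def clip_convex(subject, clip):
    """Clip `subject` against a CONVEX `clip` ring (Sutherland-Hodgman)."""
    if signed_area(clip) < 0:
        clip = clip[::-1]  # the inside test below expects counter-clockwise
    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        points, output = output, []
        if not points:
            break

        def inside(p):
            # True when p is on or to the left of the directed edge a->b.
            return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

        def intersect(p, q):
            # Intersection of segment p-q with the infinite line through a-b.
            dx1, dy1 = q[0] - p[0], q[1] - p[1]
            dx2, dy2 = b[0] - a[0], b[1] - a[1]
            t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / (dx1 * dy2 - dy1 * dx2)
            return (p[0] + t * dx1, p[1] + t * dy1)

        prev = points[-1]
        for cur in points:
            if inside(cur):
                if not inside(prev):
                    output.append(intersect(prev, cur))
                output.append(cur)
            elif inside(prev):
                output.append(intersect(prev, cur))
            prev = cur
    return output


def overlap_percent(osm_ring, fmtm_ring):
    """Share of the OSM footprint covered by the FMTM footprint, in percent."""
    osm_area = abs(signed_area(osm_ring))
    if osm_area == 0:
        return 0.0
    shared = clip_convex(osm_ring, fmtm_ring)
    return 100.0 * abs(signed_area(shared)) / osm_area
```

Rings are lists of `(x, y)` tuples without a repeated closing point. Features below some chosen overlap threshold could then be flagged for manual review.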

To update an existing geometry we need to call the API via PUT /api/0.6/[node|way|relation]/#id to update existing nodes if necessary, and update the way with tags.

Deleted Geoms

For the actual deletion we need to call the API: DELETE /api/0.6/[node|way|relation]/#id to delete the way and all related nodes.

Needs further research!
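The new / modified / deleted split above can be sketched as a comparison keyed on feature ID. This is a simplified illustration: the function name `classify_features` and the flat ID-to-geometry dictionaries are hypothetical, and treating every ID missing from the FMTM output as a deletion is an assumption that needs the further research noted above.

```python
def classify_features(fmtm, osm):
    """Classify features by comparing two dicts of feature ID -> geometry.

    Geometries can be any values that support equality comparison,
    for example tuples of coordinate pairs.
    """
    result = {"new": [], "modified": [], "deleted": [], "unchanged": []}
    for feature_id, geometry in fmtm.items():
        if feature_id not in osm:
            result["new"].append(feature_id)        # added during field mapping
        elif geometry != osm[feature_id]:
            result["modified"].append(feature_id)   # flag for manual review
        else:
            result["unchanged"].append(feature_id)
    for feature_id in osm:
        if feature_id not in fmtm:
            result["deleted"].append(feature_id)    # assumption: removed in the field
    return {key: sorted(ids) for key, ids in result.items()}
```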

Updating OSM via the API

spwoodcock commented 2 months ago

The above comment may change some of the API we have already designed unfortunately! Sorry @Sujanadh 🙏 Let's look into this together when we get a chance

rsavoye commented 2 months ago

I think you can skip the "Download extract from raw-data-api, including OSM IDs.", since in the next step you are doing the same thing, just differently. The conflation software converts the OSM XML internally to GeoJson anyway, and later converts the results back to OSM XML. A GeoJson file is generated too. OSM XML is important for JOSM. Otherwise if you edit the GeoJson file, at least in JOSM, you need to manually cut & paste the tags from the external dataset into the OSM layer. Creating a changeset is a good idea, and could eliminate the need for an OSM XML file and JOSM. It'd be easy to take the list of GeoJson features after validation and generate a changeset. I've basically been using JOSM as my UI.

I have seen that the external datasets for highways can be split at intersections, since the traced line may not understand where the surface/smoothness/name changes. But if you are ground-truthing, this is a common thing.

spwoodcock commented 2 months ago

By "Download extract from raw-data-api" I mean the initial data extract download during project creation - we need the geometries to have something to map!

I think we need a conflation flow designed from the ground up to be independent of JOSM:

rsavoye commented 2 months ago

Sure, you can just use the GeoJson format if you're going to upload via changeset. Is this UI part of FMTM or a separate program? JOSM is not antiquated, it is under active development, and many advanced mappers use it. And it can work fully offline, which is critical in the field. Not everything needs to be a website. Since the conflation code returns a list of GeoJson features, converting to OSM XML is separate, so you can ignore that if you want.

charliemcgrady commented 2 months ago

Hi Sam, great writeup. Adding a couple of questions below.

  1. For new geometries, will there be a step which ensures an OSM building has not been added since the FMTM building was collected? ID in FMTM but not OSM implies no spatial join will be performed to ensure duplicate buildings are not added to OSM.
  2. Will multi-polygons be supported for addition/modification/deletion? In this case, the API will need to handle building relations as well.
  3. Are there plans to handle more complex merge conflicts during manual validation? For rural areas, it's likely sufficient to either take the OSM or the FMTM buildings, as the conflicts will be more isolated. However, denser regions where buildings are either connected or close to each other may lead to more complex conflicts which require a more sophisticated merge process.
spwoodcock commented 2 months ago

Hi @charliemcgrady! Thanks so much for the input, it's really appreciated 🙏

  1. Excellent point - I have updated to add the check for new geometries here.
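A cheap first pass for such a check could compare bounding boxes of the new geometry against current OSM buildings, as in the sketch below. The function names and the example IDs are hypothetical, and a bounding-box hit only marks a candidate duplicate that still needs a proper geometric comparison.

```python
def bbox(ring):
    """Return (min_x, min_y, max_x, max_y) for a ring of (x, y) points."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def bboxes_intersect(a, b):
    """True when two (min_x, min_y, max_x, max_y) boxes touch or overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def possible_duplicates(new_ring, osm_rings):
    """IDs of OSM features whose bounding box intersects the new geometry."""
    new_box = bbox(new_ring)
    return sorted(
        osm_id
        for osm_id, ring in osm_rings.items()
        if bboxes_intersect(new_box, bbox(ring))
    )


new_building = [(0, 0), (1, 0), (1, 1), (0, 1)]
current_osm = {
    "way/1": [(0.5, 0.5), (2, 0.5), (2, 2), (0.5, 2)],
    "way/2": [(5, 5), (6, 5), (6, 6), (5, 6)],
}
candidates = possible_duplicates(new_building, current_osm)
```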

[!NOTE] I should preface the answers to Q1 & Q2 by saying that the approach listed above is definitely a more naive (v1) approach.

Our main goal for FMTM (at least to start, based on user requirements) is to map regions that have typically been poorly mapped, mainly in developing countries. So as you say, this normally means much sparser geometries and an easier merge.

Once we nail down this simple conflation, we will move onto other conflation requirements based on user needs.

  2. Although MultiPolygons are less common in data extracts I have seen in poorly mapped areas, we definitely need to handle this! At the start we could do a similar approach to Polygons, where we form a GeoJSON from the downloaded XML (using the relations as you say), then attempt to match the footprints of the FMTM MultiPolygon with the current OSM MultiPolygon.

  3. This is an important question & a tricky one. When we start factoring in building types common in many developed city centres (terraces, blocks, etc), things get messy! We are definitely helped out by the sparseness of data we typically encounter. But this will 💯 have to be addressed with some more thought!
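For the MultiPolygon footprint matching mentioned above, a first step could be to reduce both Polygon and MultiPolygon GeoJSON geometries to their exterior rings, so the same comparison code can handle either type. This is a minimal sketch with a hypothetical helper name. It relies only on the GeoJSON convention that the first ring of each polygon is the exterior, and dropping interior rings is a simplifying assumption.

```python
def outer_rings(geometry):
    """Return the exterior ring of each polygon in a GeoJSON geometry dict."""
    geom_type = geometry["type"]
    coords = geometry["coordinates"]
    if geom_type == "Polygon":
        return [coords[0]]
    if geom_type == "MultiPolygon":
        return [polygon[0] for polygon in coords]
    raise ValueError(f"Unsupported geometry type: {geom_type}")
```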

spwoodcock commented 2 months ago

Quick update on this based on discussions with @kshitijrajsharma who architected raw-data-api!

Regarding point 2, on MultiPolygon (+ also MultiLineString).

Relations / Multi-Geoms in OSM (background info)

How raw-data-api handles multi-geoms

How to use this info during conflation

rsavoye commented 2 months ago

I have seen buildings as MultiPolygons in OSM. In some countries there is a large courtyard in the middle, and the building wraps around it. So when conflating, I ignore the inner polygons. If a data extract for ODK Collect is used, then conflation is relatively easy as we have the OSM ID. Then it's just merging tags together. If there is a building in the basemap used for the location, or the GPS, but it's not in OSM, then it's a new feature. For FMTM, nobody is mapping building polygons with Collect that I'm aware of... so no spatial conflation is needed of the Polygons. Spatial conflation is more for building imports, not field mapping with ODK. Also ODK collected data is just a single node as well, so no Polygon to conflate with. Currently the conflation code supports conflating a single node with a nearby building for the cases where you aren't using a data extract, or there is a building in OSM with only "building=yes" from remote mapping.
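To illustrate the single-node case, here is a minimal sketch that matches a collected point to the building footprint containing it, using a standard ray-casting test on the outer ring only. It is not the osm-merge implementation: the function names are hypothetical, and strict containment is a simplification of matching a node with a "nearby" building, which would need a distance threshold.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: is (x, y) inside the ring of (x, y) points?"""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Count edges that a horizontal ray from the point crosses.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def match_node_to_building(x, y, buildings):
    """Return the ID of the first building whose outer ring contains the point."""
    for building_id, ring in buildings.items():
        if point_in_ring(x, y, ring):
            return building_id
    return None  # no match: possibly a new feature
```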

Also don't forget conflating highways & waterways. When I tried a data extract of highways in Collect, I could still select a highway and answer the survey questions, because I don't need the geometry. Once again, it's just the tags. In a lot of remote areas the OSM feature only has "highway=track", and I want to add surface, smoothness, tracktype, and width to improve navigation.
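A tag-only merge of this kind could look like the sketch below. The function name is hypothetical, and letting the field-collected value win on conflict is an assumption; the conflicts are returned separately so they can be reviewed rather than silently overwritten.

```python
def merge_tags(osm_tags, survey_tags):
    """Merge survey tags into OSM tags; return (merged, conflicts).

    conflicts maps each key whose value changed to (osm_value, survey_value).
    """
    merged = dict(osm_tags)
    conflicts = {}
    for key, value in survey_tags.items():
        if key in merged and merged[key] != value:
            conflicts[key] = (merged[key], value)
        merged[key] = value  # assumption: the field-collected value wins
    return merged, conflicts


merged, conflicts = merge_tags(
    {"highway": "track"},
    {"surface": "gravel", "smoothness": "bad", "tracktype": "grade3"},
)
```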

Where conflation gets interesting is with external datasets not from ODK, so I don't think FMTM would be involved.

rsavoye commented 2 months ago

Btw, I've got a whole doc on conflating with ODK field collected data for more detail: https://hotosm.github.io/osm-merge/odkconflation/

rsavoye commented 1 month ago

I need to figure out why the images don't appear, but I just wrote a doc on conflating highways. While FMTM isn't mapping highways (yet), I believe it's on the roadmap. https://hotosm.github.io/osm-merge/highways/. Right now it's focused on remote US roads in national forests, but could easily be extended for other countries.