spwoodcock closed 1 month ago
For now this should be adapted from: https://github.com/kshitijrajsharma/osmconflator/blob/3ecf12b0d31773750cf1f806fd4547532f19d046/osmconflator/utils.py#L18
More context on this after a lengthy internal discussion with team members.
refs, which link nodes to ways:
<node id="298884269" lat="54.0901746" lon="12.2482632">
<way id="26659127">
<nd ref="292403538"/>
<nd ref="298884289"/>
<nd ref="261755686"/>
<nd ref="261728686"/>
<nd ref="292403538"/>
</way>
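As a rough illustration of how refs tie nodes to ways, the sketch below resolves each nd ref to node coordinates using Python's standard xml.etree.ElementTree. The sample XML (IDs, coordinates, tags) and the function name ways_to_geometries are made up for illustration; this is not how raw-data-api does it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: IDs and coordinates are invented for this sketch.
SAMPLE_OSM_XML = """<osm>
  <node id="1" lat="54.0900" lon="12.2480"/>
  <node id="2" lat="54.0900" lon="12.2490"/>
  <node id="3" lat="54.0910" lon="12.2490"/>
  <way id="100">
    <nd ref="1"/>
    <nd ref="2"/>
    <nd ref="3"/>
    <nd ref="1"/>
    <tag k="building" v="yes"/>
  </way>
</osm>"""


def ways_to_geometries(osm_xml: str) -> dict:
    """Resolve each way's nd refs into (lon, lat) coordinates."""
    root = ET.fromstring(osm_xml)
    # Index nodes by ID so refs can be looked up directly.
    nodes = {
        node.get("id"): (float(node.get("lon")), float(node.get("lat")))
        for node in root.iter("node")
    }
    geometries = {}
    for way in root.iter("way"):
        # Refs pointing at nodes missing from the document are skipped.
        coords = [
            nodes[nd.get("ref")]
            for nd in way.iter("nd")
            if nd.get("ref") in nodes
        ]
        # A way whose first and last refs match is treated as a closed ring.
        closed = len(coords) >= 4 and coords[0] == coords[-1]
        geometries[way.get("id")] = {
            "kind": "polygon" if closed else "polyline",
            "coordinates": coords,
            "tags": {tag.get("k"): tag.get("v") for tag in way.iter("tag")},
        }
    return geometries


geometries = ways_to_geometries(SAMPLE_OSM_XML)
```

Once the coordinates are resolved like this, the refs themselves are no longer needed to draw the geometry, which is why they can be dropped from storage.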
refs entries for ways are removed. This leaves us with a collection of geometries (point, polyline, polygon), with attached osm_id and tags.
refs could be useful for easier correlation of geometries and for generating a new OSM XML during conflation (they are essential if using JOSM to conflate). However, storing the refs in raw-data-api would amount to hundreds of GB of stored data, which is expensive in cloud databases. We need a workaround.
osm_id values for existing geometries, plus OSM tags that we converted from field mapping.
Proposed workflow:
GET /api/0.6/map?bbox=left,bottom,right,top endpoint to get all nodes, ways, relations for a bbox.
For new geoms in the FMTM GeoJSON (ID in FMTM but not OSM, feature was added during field mapping), it is likely the field verified geometry will take precedence.
ST_Overlaps in PostGIS for this.
We need to call the API with: PUT /api/0.6/[node|way|relation]/create with the required nodes and way.
Modified geoms (i.e. an OSM ID match, but differing geometry footprints) should be flagged, their percentage overlap determined, and then picked up in the frontend later for manual verification for which geometry will be kept.
version tag is also higher than our current version tag to verify the update.
timestamp and version to help identify if this change was done after the FMTM project started, so if it's some sort of error.
digitization_correct field, but keep in mind that a 'field verified' geometry is simply a visual check, not measured with a measuring tape! An updated geometry in OSM may actually be more accurate if the digitiser used high resolution imagery.
To update an existing geometry we need to call the API via PUT /api/0.6/[node|way|relation]/#id to update existing nodes if necessary, and update the way with tags.
For the actual deletion we need to call the API: DELETE /api/0.6/[node|way|relation]/#id to delete the way and all related nodes.
Needs further research!
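A minimal sketch of the ID-based triage described in the workflow above, assuming both datasets have already been reduced to a mapping of ID to coordinates. The function name classify_features, the sample IDs, and the plain equality check are assumptions for illustration; a real comparison would need a spatial overlap measure (such as the PostGIS check mentioned above) rather than exact coordinate equality.

```python
def classify_features(fmtm: dict, osm: dict) -> dict:
    """Split FMTM features into new, modified and unchanged by OSM ID.

    Both arguments map a feature ID to a list of (lon, lat) tuples.
    """
    result = {"new": [], "modified": [], "unchanged": []}
    for feature_id, coords in fmtm.items():
        if feature_id not in osm:
            # ID in FMTM but not OSM: feature was added during field mapping.
            result["new"].append(feature_id)
        elif coords != osm[feature_id]:
            # ID match but differing footprint: flag for manual verification.
            result["modified"].append(feature_id)
        else:
            result["unchanged"].append(feature_id)
    return result


# Hypothetical data: IDs and coordinates are invented for this sketch.
fmtm_features = {
    101: [(0, 0), (1, 0), (1, 1), (0, 0)],
    102: [(0, 0), (2, 0), (2, 2), (0, 0)],
    -1: [(5, 5)],
}
osm_features = {
    101: [(0, 0), (1, 0), (1, 1), (0, 0)],
    102: [(0, 0), (2, 0), (2, 3), (0, 0)],
}
triage = classify_features(fmtm_features, osm_features)
```

Features in the "modified" bucket are the ones that would be passed on to the frontend for a decision on which geometry to keep.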
<osm>
<node changeset="12" lat="..." lon="...">
<tag k="note" v="Just a node"/>
...
</node>
</osm>
<osm>
<node changeset="188021" id="4326396331" lat="50.4202102" lon="6.1211032" version="1" visible="true">
<tag k="foo" v="barzzz" />
</node>
</osm>
<osm>
<node id="..." version="..." changeset="..." lat="..." lon="..." />
</osm>
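Payloads shaped like the XML examples above could be generated rather than hand-written. The sketch below builds a single-node payload with Python's standard xml.etree.ElementTree. The function name node_payload and its parameters are hypothetical; the sketch only builds the XML string and makes no request to the OSM API.

```python
import xml.etree.ElementTree as ET


def node_payload(changeset, lat, lon, tags=None, node_id=None, version=None):
    """Build an <osm><node .../></osm> payload as a string.

    node_id and version are left out when creating a node and
    included when updating or deleting an existing one.
    """
    attrs = {"changeset": str(changeset), "lat": str(lat), "lon": str(lon)}
    if node_id is not None:
        attrs["id"] = str(node_id)
    if version is not None:
        attrs["version"] = str(version)
    osm = ET.Element("osm")
    node = ET.SubElement(osm, "node", attrs)
    for key, value in (tags or {}).items():
        ET.SubElement(node, "tag", {"k": key, "v": value})
    return ET.tostring(osm, encoding="unicode")


# Values copied from the update example above.
update_xml = node_payload(
    changeset=188021,
    lat=50.4202102,
    lon=6.1211032,
    tags={"foo": "barzzz"},
    node_id=4326396331,
    version=1,
)
```

Building the element tree instead of concatenating strings means tag values containing characters such as & or quotes are escaped automatically.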
The above comment may change some of the API we have already designed unfortunately! Sorry @Sujanadh 🙏 Let's look into this together when we get a chance
I think you can skip the "Download extract from raw-data-api, including OSM IDs.", since in the next step you are doing the same thing, just differently. The conflation software converts the OSM XML internally to GeoJson anyway, and later converts the results back to OSM XML. A GeoJson file is generated too. OSM XML is important for JOSM. Otherwise if you edit the GeoJson file, at least in JOSM, you need to manually cut & paste the tags from the external dataset into the OSM layer. Creating a changeset is a good idea, and could eliminate the need for an OSM XML file and JOSM. It'd be easy to take the list of GeoJson features after validation and generate a changeset. I've basically been using JOSM as my UI.
I have seen that the external datasets for highways can be split at intersections, since the traced line may not understand where the surface/smoothness/name changes. But if you are ground-truthing, this is a common thing.
By "Download extract from raw-data-api" I mean the initial data extract download during project creation - we need the geometries to have something to map!
I think we need a conflation flow designed from the ground up to be independent of JOSM:
Sure, you can just use the GeoJson format if you're going to upload via changeset. Is this UI part of FMTM or a separate program? JOSM is not antiquated, it is under active development, and many advanced mappers use it. And it can work fully offline, critical in the field. Not everything needs to be a website. Since the conflation code returns a list of GeoJson features, converting to OSM XML is separate, so you can ignore that if you want.
Hi Sam, great writeup. Adding a couple of questions below.
ID in FMTM but not OSM implies no spatial join will be performed to ensure duplicate buildings are not added to OSM.
Hi @charliemcgrady! Thanks so much for the input, it's really appreciated 🙏
[!NOTE] I should preface the answers to Q1 & Q2 by saying that the approach listed above is definitely a more naive (v1) approach.
Our main goal for FMTM (at least to start, based on user requirements) is to map regions that have typically been poorly mapped, mainly in developing countries. So as you say, this normally means much sparser geometries and an easier merge.
Once we nail down this simple conflation, we will move onto other conflation requirements based on user needs.
Although MultiPolygons are less common in data extracts I have seen in poorly mapped areas, we definitely need to handle this! At the start we could do a similar approach to Polygons, where we form a GeoJSON from the downloaded XML (using the relations as you say), then attempt to match the footprints of the FMTM MultiPolygon with the current OSM MultiPolygon.
This is an important question & a tricky one. When we start factoring in building types common in many developed city centres (terraces, blocks, etc), things get messy! We are definitely helped out by the sparseness of data we typically encounter. But this will 💯 have to be addressed with some more thought!
Quick update on this based on discussions with @kshitijrajsharma who architected raw-data-api!
Regarding point 2, on MultiPolygon (+ also MultiLineString).
type=multipolygon or type=boundary.
ways inside the relation area, it's likely a multipolygon (requires additional processing though).
route=x to determine MultiLineStrings.
osm_id and do a footprint comparison as described above.
I have seen buildings as MultiPolygons in OSM. In some countries there is a large courtyard in the middle, and the building wraps around it. So when conflating, I ignore the inner polygons. If a data extract for ODK Collect is used, then conflation is relatively easy as we have the OSM ID. Then it's just merging tags together. If there is a building in the basemap used for the location, or the GPS, but it's not in OSM, then it's a new feature. For FMTM, nobody is mapping building polygons with Collect that I'm aware of... so no spatial conflation is needed of the Polygons. Spatial conflation is more for building imports, not field mapping with ODK. Also ODK collected data is just a single node as well, so no Polygon to conflate with. Currently the conflation code supports conflating a single node with a nearby building for the cases where you aren't using a data extract, or there is a building in OSM with only "building=yes" from remote mapping.
Also don't forget conflating highways & waterways. When I tried a data extract of highways in Collect, I can still select it and answer the survey questions cause I don't need the geometry. Once again, just the tags. In a lot of remote areas the OSM feature only has "highway=track", and I want to add surface, smoothness, tracktype, and width to improve navigation.
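To make the "just the tags" case concrete, here is a minimal sketch of merging field-collected tags into an existing OSM feature when the OSM ID is already known. The function name merge_tags, the sample tag values, and the policy that the field value wins on conflict are all assumptions for illustration, not the behaviour of the conflation code discussed in this thread.

```python
def merge_tags(osm_tags: dict, field_tags: dict):
    """Merge field-collected tags into OSM tags.

    Returns the merged tags plus a dict of conflicts, where a key existed
    in both with different values (assumed policy: the field value wins).
    """
    merged = dict(osm_tags)
    conflicts = {}
    for key, value in field_tags.items():
        if key in osm_tags and osm_tags[key] != value:
            # Keep both values so a reviewer can check the override.
            conflicts[key] = {"osm": osm_tags[key], "field": value}
        merged[key] = value
    return merged, conflicts


# Hypothetical example: a remote track that only has highway=track in OSM.
merged, conflicts = merge_tags(
    {"highway": "track"},
    {"surface": "gravel", "smoothness": "bad", "tracktype": "grade3"},
)
```

Returning the conflicts separately keeps the merge itself simple while still leaving a record of any value that was overridden.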
Where conflation gets interesting is with external datasets not from ODK, so I don't think FMTM would be involved.
Btw, I've got a whole doc on conflating with ODK field collected data for more detail: https://hotosm.github.io/osm-merge/odkconflation/
I need to figure out why the images don't appear, but I just wrote a doc on conflating highways. While FMTM isn't mapping highways (yet), I believe it's on the roadmap. https://hotosm.github.io/osm-merge/highways/. Right now it's focused on remote US roads in national forests, but could easily be extended for other countries.
Is your feature request related to a problem? Please describe.
https://github.com/hotosm/conflator requires OSM data in a local postgres instance.
Describe the solution you'd like
Option 1: host our own OSM database with updates (not ideal, wasted money), and create an endpoint.
Option 2: integrate the conflation with raw-data-api, that already has an existing postgres instance with updated OSM data (i.e. a new endpoint).
Discussing the long term solution with @kshitijrajsharma over the coming weeks. The most achievable option now is something like https://github.com/kshitijrajsharma/osmconflator/blob/3ecf12b0d31773750cf1f806fd4547532f19d046/osmconflator/utils.py#L18
Process:
Additional considerations