osm-fr / osmose-backend

Part of osmose that runs the analysis, and send the results to the frontend.
GNU General Public License v3.0
92 stars 115 forks source link

Duplicate error in france_bretagne_morbihan : analyser_osmosis_building_overlaps #2091

Closed Famlam closed 10 months ago

Famlam commented 10 months ago

From https://buildbot.osmose.openstreetmap.fr/#/builders/649/builds/1190/steps/3/logs/stdio :

2023-11-29 02:22:01 france_bretagne_morbihan : osmosis_building_overlaps
2023-11-29 02:22:02   osmosis resume post scripts
2023-11-29 02:22:05   run osmosis change analyser Analyser_Osmosis_Building_Overlaps
2023-11-29 02:22:05   requires table touched_buildings
2023-11-29 02:22:05   requires table not_touched_buildings
2023-11-29 02:22:05   analyser_osmosis_building_overlaps.py:217 sql
2023-11-29 02:22:08   analyser_osmosis_building_overlaps.py:218 sql
2023-11-29 02:23:27   analyser_osmosis_building_overlaps.py:220 sql
2023-11-29 02:23:27   analyser_osmosis_building_overlaps.py:221 sql
2023-11-29 02:23:28   analyser_osmosis_building_overlaps.py:222 xml generation
2023-11-29 02:23:28   analyser_osmosis_building_overlaps.py:223 xml generation
2023-11-29 02:23:28   analyser_osmosis_building_overlaps.py:224 xml generation
2023-11-29 02:23:28   analyser_osmosis_building_overlaps.py:225 xml generation
2023-11-29 02:23:28   analyser_osmosis_building_overlaps.py:226 xml generation
2023-11-29 02:24:22   analyser_osmosis_building_overlaps.py:229 xml generation
2023-11-29 02:24:23   analyser_osmosis_building_overlaps.py:230 xml generation
2023-11-29 02:24:23   update
2023-11-29 02:24:23     iteration=1
2023-11-29 02:24:24       error: UPDATE ERROR france_bretagne_morbihan/Osmosis_Building_Overlaps : {"detail":"Traceback (most recent call last):\n  File \"/data/project/osmose/frontend/control/update.py\", line 77, in user\n    await update_utils.update(db, source_id, f.name, remote_ip=remote_ip)\n  File \"/data/project/osmose/frontend/control/update_utils.py\", line 90, in update\n    tasks[1].result()\n  File \"/data/project/osmose/frontend/control/update_utils.py\", line 72, in async_parser_task\n    await async_update_parser(source_id, fname, remote_ip, db).parse(q)\n  File \"/data/project/osmose/frontend/control/update_utils.py\", line 495, in parse\n    await self.endElement(*args)\n  File \"/data/project/osmose/frontend/control/update_utils.py\", line 627, in endElement\n    await table_merge_markers_tmp(self._db, self.all_uuid)\n  File \"/data/project/osmose/frontend/control/update_utils.py\", line 405, in table_merge_markers_tmp\n    await _db.execute(sql_marker)\n  File \"/data/project/osmose/frontend/osmose-frontend-venv/lib/python3.9/site-packages/asyncpg/connection.py\", line 317, in execute\n    return await self._protocol.query(query, timeout)\n  File \"asyncpg/protocol/protocol.pyx\", line 338, in query\nasyncpg.exceptions.CardinalityViolationError: ON CONFLICT DO UPDATE command cannot affect row a second time\nHINT:  Ensure that no rows proposed for insertion within the same command have duplicate constrained values."}
2023-11-29 02:24:39     iteration=2
2023-11-29 02:24:39       error: UPDATE ERROR france_bretagne_morbihan/Osmosis_Building_Overlaps : {"detail":"Traceback (most recent call last):\n  File \"/data/project/osmose/frontend/control/update.py\", line 77, in user\n    await update_utils.update(db, source_id, f.name, remote_ip=remote_ip)\n  File \"/data/project/osmose/frontend/control/update_utils.py\", line 90, in update\n    tasks[1].result()\n  File \"/data/project/osmose/frontend/control/update_utils.py\", line 72, in async_parser_task\n    await async_update_parser(source_id, fname, remote_ip, db).parse(q)\n  File \"/data/project/osmose/frontend/control/update_utils.py\", line 495, in parse\n    await self.endElement(*args)\n  File \"/data/project/osmose/frontend/control/update_utils.py\", line 627, in endElement\n    await table_merge_markers_tmp(self._db, self.all_uuid)\n  File \"/data/project/osmose/frontend/control/update_utils.py\", line 405, in table_merge_markers_tmp\n    await _db.execute(sql_marker)\n  File \"/data/project/osmose/frontend/osmose-frontend-venv/lib/python3.9/site-packages/asyncpg/connection.py\", line 317, in execute\n    return await self._protocol.query(query, timeout)\n  File \"asyncpg/protocol/protocol.pyx\", line 338, in query\nasyncpg.exceptions.CardinalityViolationError: ON CONFLICT DO UPDATE command cannot affect row a second time\nHINT:  Ensure that no rows proposed for insertion within the same command have duplicate constrained values."}
2023-11-29 02:25:09     iteration=3
2023-11-29 02:25:09       error: UPDATE ERROR france_bretagne_morbihan/Osmosis_Building_Overlaps : {"detail":"Traceback (most recent call last):\n  File \"/data/project/osmose/frontend/control/update.py\", line 77, in user\n    await update_utils.update(db, source_id, f.name, remote_ip=remote_ip)\n  File \"/data/project/osmose/frontend/control/update_utils.py\", line 90, in update\n    tasks[1].result()\n  File \"/data/project/osmose/frontend/control/update_utils.py\", line 72, in async_parser_task\n    await async_update_parser(source_id, fname, remote_ip, db).parse(q)\n  File \"/data/project/osmose/frontend/control/update_utils.py\", line 495, in parse\n    await self.endElement(*args)\n  File \"/data/project/osmose/frontend/control/update_utils.py\", line 627, in endElement\n    await table_merge_markers_tmp(self._db, self.all_uuid)\n  File \"/data/project/osmose/frontend/control/update_utils.py\", line 405, in table_merge_markers_tmp\n    await _db.execute(sql_marker)\n  File \"/data/project/osmose/frontend/osmose-frontend-venv/lib/python3.9/site-packages/asyncpg/connection.py\", line 317, in execute\n    return await self._protocol.query(query, timeout)\n  File \"asyncpg/protocol/protocol.pyx\", line 338, in query\nasyncpg.exceptions.CardinalityViolationError: ON CONFLICT DO UPDATE command cannot affect row a second time\nHINT:  Ensure that no rows proposed for insertion within the same command have duplicate constrained values."}

The issue only occurs in diff mode.

It seems that this analyser runs sql70 in diff mode with: run1: b1 = touched_buildings vs b2 = buildings run2: b1 = not_touched_buildings vs b2 = touched_buildings Here, b2.id is the output

I suspect this may fail if there's three buildings: A, B, C, where all intersect "A" (and A is a triangular building). If "A" and "C" gets touched, then run1: b1 = A,C, b2 = A,B,C -> output: A (due to intersection A-C) run2: b1 = B, b2 = A,C -> output A (due to intersection A-B) Giving twice A as output

If so, the solution is probably to also include the second way (b1) in the output too? The downside is that the second way (b1) would constantly change depending on which building was touched, due to the DISTINCT ON. Alternatively, diff mode should be removed.

P.s. I'm not entirely sure what this analyser does. It seems that it searches for triangular buildings that are touching other buildings, but I'm not entirely sure what's wrong with that? Or is it a common import error in France? (Or a forbidden building shape)

frodrigo commented 10 months ago

I think DISTINCT ON is the right way to do it. But DISTINCT ON should never be used without ORDER BY like it the case here.

Famlam commented 10 months ago

But 'DISTINCT ON' wouldn't give constant results either, even with 'ORDER BY', in diff mode, right? Should we remove diff mode? Or add 'b1.id' to the output

frodrigo commented 10 months ago

I prefer keep the diff mode when possible. So lets try to add ORDER BY + b1.id.

Famlam commented 10 months ago

It seems the original issue resolved itself (maybe a full run was triggered, or the bad ways have been modified/removed since, such that it no longer triggers the error)

I'll keep the issue open as it should be a very simple fix. (Although I still don't know why triangular buildings are considered bad in France 🤔)

frodrigo commented 10 months ago

Although I still don't know why triangular buildings are considered bad in France 🤔)

Building come from french Cadaster and the building are split by land plot and should be merged before upload. And small triangle can be spoted when it is not the case.

Famlam commented 10 months ago

Building come from french Cadaster and the building are split by land plot and should be merged before upload. And small triangle can be spoted when it is not the case.

Thanks! As this analyser has a 1/3th false positive rate (5695 false positives vs 12394 open issues) I also looked at some false positives. A split by a land plot would mean that at least 2 adjacent points should be connected. If only one point is connected, it cannot be a split by land plots. Hence I also added a dimension check for the overlap. This solves a lot of the false positives.

This fixes issues like