larsop / resolve-overlap-and-gap

This code is moving to https://gitlab.com/nibioopensource/resolve-overlap-and-gap. The plan is to use Postgis Topology to resolve overlaps and gaps for a simple feature layers. Norwegian Institute of Bioeconomy Research (NIBIO)
GNU General Public License v3.0
3 stars 1 forks source link

Missing index on edge_data(abs_next_left_edge) and edge_data(abs_next_right_edge) #21

Closed larsop closed 2 years ago

larsop commented 2 years ago

When clean up a simple feature with 17296735 in edge_data we find performance issue


-[ RECORD 1 ]--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
total     | 0.013054670450000036
avg       | 0.12160848113646984
calls     | 6441
substring | UPDATE "test_ar5_2022_01".edge_data SET next_left_edge= $1, abs_next_left_edge= $2 WHERE edge_id = $3
-[ RECORD 2 ]--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
total     | 287.98153985746666
avg       | 21046.153948170522
calls     | 821
avg       | 21046.153948170522
calls     | 821
substring | SELECT $2 FROM ONLY "test_ar5_2022_01"."edge_data" x WHERE $1 OPERATOR(pg_catalog.=) "abs_next_left_edge" FOR KEY SHARE OF x
-[ RECORD 3 ]--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
total     | 0.006974725233333313
avg       | 0.060773092361312626
calls     | 6886
substring | UPDATE "test_ar5_2022_01".edge_data SET next_left_edge= $1, abs_next_left_edge= $2 WHERE end_node= $3 AND next_left_edge= $4 AND abs_next_left_edge= $5 AND edge_id != $6
-[ RECORD 4 ]--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
total     | 337.3503410906494
avg       | 23563.46969201276
calls     | 859
substring | UPDATE "test_ar5_2022_01".edge_data SET next_left_edge= $1, abs_next_left_edge= $2 WHERE next_left_edge= $3 AND abs_next_left_edge= $4
-[ RECORD 5 ]--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
total     | 311.0796148374833
avg       | 21855.71064431967
calls     | 854
substring | UPDATE "test_ar5_2022_01".edge_data SET next_right_edge= $1, abs_next_right_edge= $2 WHERE next_right_edge= $3 AND abs_next_right_edge= $4
-[ RECORD 6 ]--

Will try to add these indexes

Unlogged table "test_ar5_2022_01.edge_data"
       Column        |           Type            | Collation | Nullable |                           Default                           
---------------------+---------------------------+-----------+----------+-------------------------------------------------------------
 edge_id             | integer                   |           | not null | nextval('test_ar5_2022_01.edge_data_edge_id_seq'::regclass)
 start_node          | integer                   |           | not null | 
 end_node            | integer                   |           | not null | 
 next_left_edge      | integer                   |           | not null | 
 abs_next_left_edge  | integer                   |           | not null | 
 next_right_edge     | integer                   |           | not null | 
 abs_next_right_edge | integer                   |           | not null | 
 left_face           | integer                   |           | not null | 
 right_face          | integer                   |           | not null | 
 geom                | geometry(LineString,4258) |           |          | 
Indexes:
    "edge_data_pkey" PRIMARY KEY, btree (edge_id)
    "edge_data_geom_idx" gist (geom)
    "edge_end_node_idx" btree (end_node)
    "edge_gist" gist (geom)
    "edge_left_face_idx" btree (left_face)
    "edge_right_face_idx" btree (right_face)
    "edge_start_node_idx" btree (start_node)
Foreign-key constraints:
    "end_node_exists" FOREIGN KEY (end_node) REFERENCES test_ar5_2022_01.node(node_id)
    "left_face_exists" FOREIGN KEY (left_face) REFERENCES test_ar5_2022_01.face(face_id)
    "next_left_edge_exists" FOREIGN KEY (abs_next_left_edge) REFERENCES test_ar5_2022_01.edge_data(edge_id) DEFERRABLE INITIALLY DEFERRED
    "next_right_edge_exists" FOREIGN KEY (abs_next_right_edge) REFERENCES test_ar5_2022_01.edge_data(edge_id) DEFERRABLE INITIALLY DEFERRED
    "right_face_exists" FOREIGN KEY (right_face) REFERENCES test_ar5_2022_01.face(face_id)
    "start_node_exists" FOREIGN KEY (start_node) REFERENCES test_ar5_2022_01.node(node_id)
Referenced by:
    TABLE "test_ar5_2022_01.edge_data" CONSTRAINT "next_left_edge_exists" FOREIGN KEY (abs_next_left_edge) REFERENCES test_ar5_2022_01.edge_data(edge_id) DEFERRABLE INITIALLY DEFERRED
    TABLE "test_ar5_2022_01.edge_data" CONSTRAINT "next_right_edge_exists" FOREIGN KEY (abs_next_right_edge) REFERENCES test_ar5_2022_01.edge_data(edge_id) DEFERRABLE INITIALLY DEFERRED
larsop commented 2 years ago

After

create index on test_ar5_2022_01.edge_data(abs_next_left_edge); create index on test_ar5_2022_01.edge_data(abs_next_right_edge);

We that the update is down from 21855.71064431967 ms to 0.15098291893603538


LIMIT 100;
        total         |         avg         | calls |                                                                             substring                                                                              
----------------------+---------------------+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 0.003973367149999998 | 0.15098291893603538 |  1579 | UPDATE "test_ar5_2022_01".edge_data SET next_left_edge= $1, abs_next_left_edge= $2 WHERE next_left_edge= $3 AND abs_next_left_edge= $4
(2 rows)
strk commented 2 years ago

I believe you're hitting https://trac.osgeo.org/postgis/ticket/2083 Do you confirm your "clean up a simple feature" is basically using ST_RemEdgeModFace ?

strk commented 2 years ago

The problem with indices is that while the speed up reads they also slow down updates, so when to create the indices might be a user choice, depending on use case. This is just to say that we need to profile the effects of the index on both writes and reads.

larsop commented 2 years ago

I switched from ST_RemEdgeModFace to ST_RemEdgeNewFace that was more robust to related invalid faces in some cases.

We now created Postgis Topology with

SELECT count(*) from test_ar5_2022_01.face; count

6552124 (1 row)

SELECT count(*) from test_ar5_2022_01.edge; count

17258641 (1 row)

In less than 24 hours.

larsop commented 2 years ago

I will rerun the test about with this indexes in place to check how it works

strk commented 2 years ago

What do you mean by "invalid faces" ? All those function have an undefined behavior if the underlying topology is invalid

strk commented 2 years ago

HINT: you can get info about the size of topologies (count of faces, edges, nodes) with:

SELECT TopologySummary('test_ar5_2022_01')
larsop commented 2 years ago

HINT: you can get info about the size of topologies (count of faces, edges, nodes) with:

SELECT TopologySummary('test_ar5_2022_01')

Thanks forgot that one, SELECT TopologySummary('test_ar5_2022_01');

                            topologysummary                             
------------------------------------------------------------------------
 Topology test_ar5_2022_01 (id 108272, SRID 4258, precision 1e-05)     +
 11975736 nodes, 17258641 edges, 6552123 faces, 0 topogeoms in 0 layers+
strk commented 2 years ago

For the record: ST_CreateTopoGeo time is not visibly degraded by adding those indices. The test is for a much smaller topology:

strk=# select TopologySummary('wsa_ws_svw_polygon_topology_test');
                            topologysummary                             
------------------------------------------------------------------------
 Topology wsa_ws_svw_polygon_topology_test (id 49, SRID 0, precision 0)+
 19754 nodes, 29453 edges, 10978 faces, 0 topogeoms in 0 layers        +

ST_CreateTopoGeo takes:

Time: 237719.810 ms (03:57.720) -- with indices on abs_next_left_edge and abs_next_right_edge columns of edge_data Time: 235180.206 ms (03:55.180) -- without those indices

larsop commented 2 years ago

Did a new test running in 14 hours so the indexes seems not cause any new performance problems.

 topologysummary                             
------------------------------------------------------------------------
 Topology test_ar5_2022_01 (id 2814, SRID 4258, precision 1e-05)       +
 15176059 nodes, 21364105 edges, 9242564 faces, 0 topogeoms in 0 layers+ 
larsop commented 2 years ago

What do you mean by "invalid faces" ? All those function have an undefined behavior if the underlying topology is invalid

I think it failed when casting to simple feature from a TopoGeometry and it just seemed worker better when I used create new face as far as I can remember.

strk commented 2 years ago

Any bug like that would be useful to file upstream, if underlying topology was valid..