Speed up edge insertions in big topologies

strk commented 4 years ago

According to @larsop the process of adding new linestrings to a big topology using topology.TopoGeo_addLinestring PostGIS function is very slow.

By "big" we're talking about a topology with 8.6 milion "surfaces" (faces?) with an average of 132 points (vertices?node?) each.

I'm not sure this is the correct repository to track this: @larsop: does the code in this repository take care of performing such operation?

larsop commented 4 years ago

Yes I have been using this code for testing big data sets. I have merged the latest code into master.

strk commented 4 years ago

Please report commit hash (or appropriate link, if in a submodule) when referring to merges, it's easier to see what you are referring to

larsop commented 4 years ago

Sorry this one I did not understand.

strk commented 4 years ago

When you say I have merged the latest code into master I'm left with having to wonder where is the code that you merged, to look at it. Every merge has a unique identifier, looking like 85c11525c91bd697d55a71d498dc0894564458f6 (a "commit hash"). When written in a github comment without quotes it turns into a link ( 85c11525c91bd697d55a71d498dc0894564458f6 ) which can be followed by the reader, giving more context to the message

larsop commented 4 years ago

When I was testing, some cell was taking very long time so i started to debug it. What I did see was that the next operation that was helling edges was using a very long time.

Before this adding analyse

CALLresolve_overlap_gap_single_cell('sl_kbj.trl_2019_test_segmenter_mindredata','geo','gid','topo_sr16_mdata_05.trl_2019_test_segmenter_mindredata',
'topo_sr16_mdata_05',1,25833,'true','(49,5,10000,3,25,120,240,3,35)','topo_sr16_mdata_05.trl_2019_test_segmenter_mindredata_job_list','topo_sr16_mdata_05.trl_2019_test_segmenter_mindredata_grid',
'0103000020E9640000010000000500000000000000A0EA0A4162A964474BDD5A4100000000A0EA0A4142C6ED8430E25A4100000000B49E0B4142C6ED8430E25A4100000000B49E0B4162A964474BDD5A4100000000A0EA0A4162A964474BDD5A41',1,1);

this call took 28 minutes , but after adding analyse it took 22 seconds. What I do here is that I take a cell with simple feature data and insert into Postgis Topology layer where I heal edges and remove small polygons and then pick out the edges. The healing operation was reduced from many minutes to a couple of seconds.

strk commented 4 years ago

I've often wondered if we should have ANALYSE calls inside the topology functions, but then I though "autovacuum" would do a better job at it (it works by looking at stats, so knows better when it is time to kick). Do you have "autovacuum" turned on ? Can you check the PostgreSQL logs to see if anything is preventing autovacuum to run (for instance, exclusive locks on tables) ?

larsop commented 4 years ago

Yes autovacuum is running, but when running on high load I think I have seen canceling this operations before. I only found two of those now since I delete the logs quite often.

/var/log/postgresql/postgresql-Wed.log:< , 2020-05-13 18:41:38 CEST, , 2020-05-13 18:41:38.916 CEST >ERROR: canceling autovacuum task /var/log/postgresql/postgresql-Wed.log:< , 2020-05-13 18:43:43 CEST, , 2020-05-13 18:43:43.950 CEST >ERROR: canceling autovacuum task

When I checked for locking I did not any I used this cmd 'SELECT relation::regclass, * FROM pg_locks WHERE NOT GRANTED;'

If I have to run analyse or not seems to vary from dataset to dataset and how big cells that are used and the number threads in parallel.

I also tested by adding pg_sleep(some seconds) and that seemed to work some times. I also tested adding a commit but that did not seem help.

We have seen this need for running analyse when running batch operations with large number of inserts or updates in other cases also.

strk commented 4 years ago

Now I realize the problem with autovacuum is that changes in the statistics are not visible by running transactions, so if your function is ONE BIG transaction, it will not see changes in stats due to autovacuum or analyze called by other transactions (like autovacuum). This explains why running it yourself helps.

Having topology primitives running it would require some book-keeping about how many changes were really introduced by the running transaction, not such an easy thing to do.

larsop commented 4 years ago

Yes I run everything in many "BIG" transactions one for each cell.

I agree with you that it's ok for user code kick off analyse when the application means it needed.

I assume that the reason why commit did not work was related to that I started healing edges before analyse got the chance or was busy with analysing other tables.

strk commented 4 years ago

I've added a section in PostGIS manual about this problem: https://trac.osgeo.org/postgis/changeset/130fdf694621d0406f09f880ec85715f1aa1571f/git

It's visible here: https://postgis.net/docs/manual-dev/postgis_usage.html#Topology_StatsManagement

Can we consider this ticket closed for now, given the big boost in performance ?

larsop commented 4 years ago

It seems like we have solved the problems for step 1 and 2 in line in 167 and 308 .

After step 2 we have set of edges that are dived by grid cells. The next step 3 is to add the lines that cross cell borders. At this step I have always been running analyse for each 200 cell job done.

I am running a test now on ar5 to check it out.

larsop commented 4 years ago

Here is some more info on the ar5 test. Before step 3 we had around 19 million edges. The job was splitt into about 4500 cells jobs. The number lines crossing cell borders is around 250000. Step 3 (add lines crossing cell borders) is running in a single thread and it has been running for more than 4 days. Here are the number of cell processed by each day.

select tt, count(*) from ( select to_char( extract(doy from done_time),'fm000') as tt from  test_topo_ar5.ar5_2019_komm_flate_job_list_donejobs order by done_time) as r group by tt order by tt;
 tt  | count 
-----+-------
 137 |  1681
 138 |  1757
 139 |   201
 140 |    58
 141 |     7

The first 2 days it works good, but after that the performance really drops. There is no iowait or high CPU load in the server.

Here is the number of cell pr hour.

select tt, count(*) from ( select to_char( extract(doy from done_time),'fm000')||'_'||to_char( extract(hour from done_time),'fm00') as tt from  test_topo_ar5.ar5_2019_komm_flate_job_list_donejobs order by done_time) as r group by tt order by tt;
   tt   | count 
--------+-------
 137_01 |   198
 137_02 |   257
 137_03 |   183
 137_04 |   118
 137_05 |    68
 137_06 |    87
 137_07 |    49
 137_08 |    36
 137_09 |    30
 137_10 |    41
 137_11 |    49
 137_12 |    86
 137_13 |    78
 137_14 |    89
 137_15 |    73
 137_16 |    20
 137_17 |    15
 137_18 |    20
 137_19 |    42
 137_20 |    28
 137_21 |    41
 137_22 |    39
 137_23 |    34
 138_00 |    43
 138_01 |    59
 138_02 |    75
 138_03 |    43
 138_04 |    77
 138_05 |    55
 138_06 |    59
 138_07 |    47
 138_08 |    85
 138_09 |    85
 138_10 |    94
 138_11 |    73
 138_12 |    44
 138_13 |   101
 138_14 |     7
 138_15 |    49
 138_16 |    53
 138_17 |   116
 138_18 |    97
 138_19 |    77
 138_20 |    48
 138_21 |    97
 138_22 |   116
 138_23 |   157
 139_00 |    83
 139_01 |    15
 139_02 |    11
 139_03 |     3
 139_04 |     1
 139_05 |     3
 139_06 |    11
 139_07 |     6
 139_08 |     9
 139_09 |     7
 139_10 |     5
 139_11 |     9
 139_12 |     7
 139_13 |     1
 139_14 |     3
 139_15 |     3
 139_16 |     1
 139_17 |     1
 139_18 |     2
 139_19 |     4
 139_20 |     5
 139_21 |     5
 139_22 |     3
 139_23 |     3
 140_00 |     2
 140_01 |     3
 140_02 |     3
 140_03 |     5
 140_04 |     2
 140_05 |     1
 140_07 |     2
 140_08 |     5
 140_09 |     3
 140_10 |     5
 140_11 |     2
 140_12 |    11
 140_13 |     1
 140_14 |     5
 140_15 |     5
 140_16 |     3
 141_00 |     1
 141_01 |     3
 141_11 |     2
 141_21 |     1
 142_00 |     1
 142_02 |     3

(I had a bug in the ANNLYZE code, so I do it manually but that does seem to hava any effect and that is natural, because it very low load and echo cell job is done in one transaction )

strk commented 4 years ago

Is the code still running ? If so: how many of those 250K lines were added ? I don't understand the number of cells per hour: is step 3 also performed in "cells" ? Because I understood step 3 being the step in which cell-crossing edges are being added. Are the edges all being added in a single transaction ? Does using multiple transactions (say: a transaction every 100 edges) changes those numbers ? How many cells do these lines overlap (min/max/avg) ?

larsop commented 4 years ago

No the code is not running but I can restart the test.
The number of crossing lines vary a lot from cell to cell.
I try to handle the cells with the most crossing border line in the start.
I would belive that maybe 70% ?? of the border lines was added when giving up.
I have not tested to limit the number for each transaction , but it would surprize me if it helps, because when added lines inside cell the number of lines is much bigger. And another reason is that I try select the cell with the most border rows first and that goes fast in the start. Since we add cells with less rows at the end the time for each cell should not increase
We also see this same behaviour with other datasets the with different density and size.
I have find the number for min/max/avg when running the test again.

strk commented 4 years ago

Using multiple transactions would give you the chance to "save" the state of the topology and focus on profiling a single new addition.

larsop commented 4 years ago

Ok, I can change the code to and use multiple commits.

I have used pg_stat_statements and info from here for profiling .

Do you want me to pick out any special profiling info ?

larsop commented 4 years ago

Her are the number of rows pr cell line for the borderlines (When I create cells for border lines I split each cell i 4 based on the 4 edges. A borderline can intersect many border line so I select the first "one")

min | max | avg -----+-----+----- 1 | 220 | 61

Here also som stats about borderlines for the number of points pr borderline.

min | max | avg -----+------+----- 2 | 9341 | 74

larsop commented 4 years ago

Postgres core dumped in the weekend when running the test. I have now picked up latest postgis from GitHub and compiled it and restarted the test.

Going from

POSTGIS="3.0.1 ec2a9aa" [EXTENSION] PGSQL="120" GEOS="3.8.1-CAPI-1.13.3" PROJ="7.0.0" LIBXML="2.9.1" LIBJSON="0.11" LIBPROTOBUF="1.0.2" WAGYU="0.4.3 (Internal)" TOPOLOGY
(1 row)

To

POSTGIS="3.1.0dev 3.1.0alpha1-137-ga2557be" [EXTENSION] PGSQL="120" GEOS="3.8.1-CAPI-1.13.3" PROJ="7.0.1" LIBXML="2.9.1" LIBJSON="0.11" (core procs from "3.0.1 ec2a9aa" need upgrade) 
TOPOLOGY (topology procs from "3.0.1 ec2a9aa" need upgrade)

postgis            | 3.1.0dev | public     | PostGIS geometry, geography, and raster spatial types and functions
 postgis_topology   | 3.1.0dev | topology   | PostGIS topology spatial types and functions

larsop commented 4 years ago

When testing AR5 after this upgrade things are going very slow on stage where we work totally independent in each thread. After more the one hour I have not been able to finish

The server has 16 core cpu with hyper threading . I have tested 28,22 .. 20 threads and now has reduced to 10 threads to see what happens

The cpu load is around 10% and no iowait and in the activity list I see this. This seems to be new, I have not seen this much of MultiXactOffsetControlLock before, we usually see SubtransControlLock

SELECT LEFT(query,10), pid, state,wait_event from pg_stat_activity where state != 'idle';
    left    |  pid  | state  |         wait_event         
------------+-------+--------+----------------------------
 CALL resol | 10714 | active | PgSleep
 CALL resol | 10798 | active | MultiXactOffsetControlLock
 CALL resol | 10799 | active | [NULL]
 CALL resol | 10800 | active | multixact_offset
 CALL resol | 10801 | active | MultiXactOffsetControlLock
 CALL resol | 10802 | active | MultiXactOffsetControlLock
 CALL resol | 10803 | active | SLRURead
 CALL resol | 10804 | active | MultiXactOffsetControlLock
 CALL resol | 10805 | active | MultiXactOffsetControlLock
 CALL resol | 10806 | active | [NULL]
 CALL resol | 10807 | active | MultiXactOffsetControlLock
 CALL resol | 10808 | active | [NULL]
 CALL resol | 10809 | active | [NULL]
 CALL resol | 10810 | active | MultiXactOffsetControlLock
 CALL resol | 10811 | active | MultiXactOffsetControlLock
 CALL resol | 10812 | active | MultiXactOffsetControlLock
 CALL resol | 10813 | active | MultiXactOffsetControlLock
 CALL resol | 10814 | active | multixact_offset
 CALL resol | 10815 | active | MultiXactOffsetControlLock
 CALL resol | 10816 | active | MultiXactOffsetControlLock
 CALL resol | 10817 | active | MultiXactOffsetControlLock
 SELECT LEF | 10844 | active | [NULL]
(22 rows)

larsop commented 4 years ago

Just a update here, by reducing max number of rows from 10.000 to 5.000 pr. grid cell, it seem to work better . The problem by using smaller grid cells is that it will be more work in step 3. I have used 10.000 as size before.

larsop commented 4 years ago

What I did now was that quit the job in step 3 after about 75% of the cell border lines was added.

I the attached image the green lines are added from before and red lines are lines to add. Ar5_topo_01

I made a simple sql to do manually adding of lines

create table test_topo_ar5.temp_bl_01 as (
select topo_update.get_left_over_borders(
'test_topo_ar5.ar5_2019_komm_flate_grid',
'geo',                                                                                
l.cell_geo ,                                                                                                                                                            
'test_topo_ar5.ar5_2019_komm_flate') ::geometry(LineString,4258) as geom 
from
(
select distinct l.cell_geo, l.num_polygons from 
test_topo_ar5.ar5_2019_komm_flate_job_list l,
test_topo_ar5.ar5_2019_komm_flate_border_line_segments ls
where l.id not in (select id from test_topo_ar5.ar5_2019_komm_flate_job_list_donejobs)
and ST_intersects(l.cell_geo,ls.geo)
order by l.num_polygons desc
limit 1
) as l
)
;

alter table test_topo_ar5.temp_bl_01 add column id serial;

select topology.TopoGeo_addLinestring('test_topo_ar5',l.geom,1e-06)
from test_topo_ar5.temp_bl_01 l where id = 1;

When added line with id 1 it was just hanging , so I checked pg_stat_activity and there was still jobs running.

I tried to kill these jobs with pg_terminate that taking for ever so I tried restart and that also takes for ever.

So I checked the logs and found >CONTEXT: SQL statement "SELECT edge_id,geom FROM "test_topo_ar5".edge_data WHERE edge_id IN (19216321,12910029,12910031,12859309,12910027,12830799,12890445,12872199.,

If I parsed the log correctly this select contains more that 100.000 numbers.

For now the servers is just hanging while trying restart, hopefully i restart soon and we get out more info and test more on adding single lines.

(It it fails I have to restart the test because I use a lot of unlogged tables)

larsop commented 4 years ago

Overview of number of rows

vroom2.ad.skogoglandskap.no postgres@resolve_cha=# SELECT count(*) from test_topo_ar5.edge_data;
  count   
----------
 19083715
(1 row)

vroom2.ad.skogoglandskap.no postgres@resolve_cha=# SELECT count(*) from test_topo_ar5.face;
  count  
---------
 7822973
(1 row)

vroom2.ad.skogoglandskap.no postgres@resolve_cha=# SELECT count(*) from test_topo_ar5.node;
  count   
----------
 13436900
(1 row)

add_single_01

I now try to add only the red think line, the thin red lines are other lines to added later. The green lines existing edges.

After around 13 minutes the line is added but no face i added that is probably correct also.

select topology.TopoGeo_addLinestring('test_topo_ar5',l.geom,1e-06) from test_topo_ar5.temp_bl_01 l where id = 1;

 topogeo_addlinestring 
-----------------------
              19216322
(1 row)

Time: 783075.720 ms (13:03.076)

Here is what it looks like after adding the lines , and I now also the face table here.

Ar5_added__line01

larsop commented 4 years ago

Here is another test where I merge the 2 red thick lines before I add and that goes much faster

ar5_b2_before

select topology.TopoGeo_addLinestring('test_topo_ar5',l.geom,1e-06)
from (
select ST_LineMerge(ST_union(geom)) as geom from test_topo_ar5.temp_bl_01 l where id in (4,5)
) as l

topogeo_addlinestring 
-----------------------
              19216323
              19216324
(2 rows)

Time: 51714.285 ms (00:51.714)

a2_after

Then we also got a new face

strk commented 4 years ago

The query you found in PostgreSQL logs (the one selecting 100k+ edges) is issued by the PostGIS Topology code upon finding that the edge you added DID form a new ring (thus a new face). The fact you report a new face was NOT created is concerning, so first of all I'd recommend that you (1) validate the starting state of the topology and (2) make the test of adding the new edge in a transaction that you'd rollback afterwards, to have a way to reproduce and help me reproducing the issue.

For gory detail: the SQL query is constructed by the getEdgeById backend function (topology/postgis_topology.c), invoked by _lwt_MakeRingShell internal function of liblwgeom (liblwgeom/lwgeom_topo.c) which is used by _lwt_AddFaceSplit internal function of liblwgeom which is the function figuring if a new ring is formed (_lwt_MakeRingShell is only called if such ring is formed); this means the single tick red line was found to indeed form a new ring.

It would help to see, in your pictures:

Face identifiers on the edge sides
Your blocking grid edges

The determined rings's geometry is constructed, using _lwt_MakeRingShell, to determine:

Winding of the new face, which determines which face is kept and which face is made new
Bounding box of the new face

The reason why adding the merged edges of a new face is being faster is unclear. It could be because the "other" red line gets added first (you could try this manually, adding the other side of the merged line first) and the first line would greatly reduce the ring size. This has to be verified by first ensuring the starting topology is correct. I'm more than happy to perform this analysis myself but it's very very important that the starting topology is valid (see #18)

Now, what I suggested you in the past was the artificially split these big faces by inserting into the topology, as the first thing, the edges of the cell grid. Doing so would ensure that no face would EVER cross a single grid cell, which should keep the complexity of faces low.

larsop commented 4 years ago

The query you found in PostgreSQL logs (the one selecting 100k+ edges) is issued by the PostGIS Topology code upon finding that the edge you added DID form a new ring (thus a new face). The fact you report a new face was NOT created is concerning, so first of all I'd recommend that you (1) validate the starting state of the topology and (2) make the test of adding the new edge in a transaction that you'd rollback afterwards, to have a way to reproduce and help me reproducing the issue.

Sorry I was unclear here, in the first case it's correct that no new face should be created, because I did not add both edges needed to create a new face.

I may do a test when adding lines crossing cell borders by doing line merge so will always create a valid face . This should stop the server from checking all edges in the whole world.

Maybe this also has effect on corrupted topologies because the server are locking around in whole world to try to find a valid face.

larsop commented 4 years ago

It would help to see, in your pictures:

Face identifiers on the edge sides Sorry how show this egis ?

Now, what I suggested you in the past was the artificially split these big faces by inserting into the topology, as the first thing, the edges of the cell grid. Doing so would ensure that no face would EVER cross a single grid cell, which should keep the complexity of faces low.

I did a test than this before then I got performance problem here, but I think that may be related something I did wrong.

The other problem was how remove this edges that should be there ?

When adding a line that no should create a new valid face, could I then send a parameter for max face to be created ?

This should following edges all around the world .

If no valid face is create with this max MBR the add line call is returned with returned with a empty result .

larsop commented 4 years ago

Added code to be able to debug more easily.

Here we start and stop right before we start to add border lines, but after we added very long lines.

-- Call function to resolve overlap but stop a job_type 3 and loop number 1
CALL resolve_overlap_gap_run(
('test_data.overlap_gap_input_t3','c1t3','geo',25833,true), -- TYPE resolve_overlap_data_input
('test_topo_t3',1.0), -- TYPE resolve_overlap_data_topology
  resolve_overlap_data_clean_type_func(  -- TYPE resolve_overlap_data_clean
  49,  -- if this a polygon  is below this limit it will merge into a neighbour polygon. The area is sqare meter.
  0, -- is this is more than zero simply will called with
  null, -- _max_average_vertex_length, in meter both for utm and deegrees, this used to avoid running ST_simplifyPreserveTopology for long lines lines with few points
  0, -- IF 0 NO CHAKINS WILL BE DONE A big value here make no sense because the number of points will increaes exponential )
  10000, --edge that are longer than this value will not be touched by _chaikins_min_degrees and _chaikins_max_degrees  
  120, -- The angle has to be less this given value, This is used to avoid to touch all angles. 
  240, -- OR the angle has to be greather than this given value, This is used to avoid to touch all angles 
  40, -- The angle has to be less this given value, This is used to avoid to touch all angles. 
  320 -- OR The angle has to be greather than this given value, This is used to avoid to touch all angles 
)
,5,4,
  resolve_overlap_data_debug_options_func(
  false, --if set to false, it will do topology.ValidateTopology and stop to if the this call returns any rows
  false, -- if set to true, it will do topology.ValidateTopology at each loop return if it's error 
  false, --  if set to false, it will in many cases generate topo errors beacuse of running in many parralell threads
  1, -- if set to more than 1 it will skip init procces and start at given job_type
  1, -- many of jobs are ran in loops beacuse because if get an exception or cell is not allowed handle because cell close to is also started to work , this cell will gandled in the next loop.
  3, -- if set to more than 0 the job will stop  when this job type is reday to run and display a set sql to run
  1 -- if set to more than 0 the job will stop  when this job type is reday to run and display a set sql to run
  ) 
);

The output from call will be a set of statements to be execute next. You can run this statements by hand of restart test where stopped starting with correct parameters.

Her we start the function start function a job_type 3 and loop number 2 run job until we are done.

CALL resolve_overlap_gap_run(
('test_data.overlap_gap_input_t3','c1t3','geo',25833,true), -- TYPE resolve_overlap_data_input
('test_topo_t3',1.0), -- TYPE resolve_overlap_data_topology
  resolve_overlap_data_clean_type_func(  -- TYPE resolve_overlap_data_clean
  49,  -- if this a polygon  is below this limit it will merge into a neighbour polygon. The area is sqare meter.
  0, -- is this is more than zero simply will called with
  null, -- _max_average_vertex_length, in meter both for utm and deegrees, this used to avoid running ST_simplifyPreserveTopology for long lines lines with few points
  0, -- IF 0 NO CHAKINS WILL BE DONE A big value here make no sense because the number of points will increaes exponential )
  10000, --edge that are longer than this value will not be touched by _chaikins_min_degrees and _chaikins_max_degrees  
  120, -- The angle has to be less this given value, This is used to avoid to touch all angles. 
  240, -- OR the angle has to be greather than this given value, This is used to avoid to touch all angles 
  40, -- The angle has to be less this given value, This is used to avoid to touch all angles. 
  320 -- OR The angle has to be greather than this given value, This is used to avoid to touch all angles 
)
,5,4,
  resolve_overlap_data_debug_options_func(
  false, --if set to false, it will do topology.ValidateTopology and stop to if the this call returns any rows
  false, -- if set to true, it will do topology.ValidateTopology at each loop return if it's error 
  false, --  if set to false, it will in many cases generate topo errors beacuse of running in many parralell threads
  3, -- if set to more than 1 it will skip init procces and start at given job_type
  2, -- many of jobs are ran in loops beacuse because if get an exception or cell is not allowed handle because cell close to is also started to work , this cell will gandled in the next loop.
  0, -- if set to more than 0 the job will stop  when this job type is reday to run and display a set sql to run
  0 -- if set to more than 0 the job will stop  when this job type is reday to run and display a set sql to run
  ) 
);

To use this new code you have to -- Add new a type from main/sql/types_resolve_overlap.sql -- Drop the old code for resolve_overlap_gap_run

DROP PROCEDURE IF EXISTS resolve_overlap_gap_run(
_input resolve_overlap_data_input_type, 
--(_input).table_to_resolve varchar, -- The table to resolv, imcluding schema name
--(_input).table_pk_column_name varchar, -- The primary of the input table
--(_input).table_geo_collumn_name varchar, -- the name of geometry column on the table to analyze
--(_input).table_srid int, -- the srid for the given geo column on the table analyze
--(_input).utm boolean, 

_topology_info resolve_overlap_data_topology_type,
---(_topology_info).topology_name varchar, -- The topology schema name where we store store sufaces and lines from the simple feature dataset and th efinal result
-- NB. Any exting data will related to topology_name will be deleted
--(_topology_info).topology_snap_tolerance float, -- this is tolerance used as base when creating the the postgis topolayer

_clean_info resolve_overlap_data_clean_type, -- different parameters used if need to clean up your data
--(_clean_info).simplify_tolerance float, -- is this is more than zero simply will called with
--(_clean_info).do_chaikins boolean, -- here we will use chaikins togehter with simply to smooth lines
--(_clean_info).min_area_to_keep float, -- if this a polygon  is below this limit it will merge into a neighbour polygon. The area is sqare meter. 

_max_parallel_jobs int, -- this is the max number of paralell jobs to run. There must be at least the same number of free connections
_max_rows_in_each_cell int
);

DROP PROCEDURE IF EXISTS resolve_overlap_gap_run(
_input resolve_overlap_data_input_type, 
--(_input).table_to_resolve varchar, -- The table to resolv, imcluding schema name
--(_input).table_pk_column_name varchar, -- The primary of the input table
--(_input).table_geo_collumn_name varchar, -- the name of geometry column on the table to analyze
--(_input).table_srid int, -- the srid for the given geo column on the table analyze
--(_input).utm boolean, 

_topology_info resolve_overlap_data_topology_type,
---(_topology_info).topology_name varchar, -- The topology schema name where we store store sufaces and lines from the simple feature dataset and th efinal result
-- NB. Any exting data will related to topology_name will be deleted
--(_topology_info).topology_snap_tolerance float, -- this is tolerance used as base when creating the the postgis topolayer

_clean_info resolve_overlap_data_clean_type, -- different parameters used if need to clean up your data
--(_clean_info).simplify_tolerance float, -- is this is more than zero simply will called with
--(_clean_info).do_chaikins boolean, -- here we will use chaikins togehter with simply to smooth lines
--(_clean_info).min_area_to_keep float, -- if this a polygon  is below this limit it will merge into a neighbour polygon. The area is sqare meter. 

_max_parallel_jobs int, -- this is the max number of paralell jobs to run. There must be at least the same number of free connections
_max_rows_in_each_cell int, -- this is the max number rows that intersects with box before it's split into 4 new boxes, default is 5000
_contiune_after_stat_exception boolean -- if set to false, it will do topology.ValidateTopology and stop to if the this call returns any rows 
);

DROP PROCEDURE IF EXISTS resolve_overlap_gap_run(
_input resolve_overlap_data_input_type, 
--(_input).table_to_resolve varchar, -- The table to resolv, imcluding schema name
--(_input).table_pk_column_name varchar, -- The primary of the input table
--(_input).table_geo_collumn_name varchar, -- the name of geometry column on the table to analyze
--(_input).table_srid int, -- the srid for the given geo column on the table analyze
--(_input).utm boolean, 

_topology_info resolve_overlap_data_topology_type,
---(_topology_info).topology_name varchar, -- The topology schema name where we store store sufaces and lines from the simple feature dataset and th efinal result
-- NB. Any exting data will related to topology_name will be deleted
--(_topology_info).topology_snap_tolerance float, -- this is tolerance used as base when creating the the postgis topolayer

_clean_info resolve_overlap_data_clean_type, -- different parameters used if need to clean up your data
--(_clean_info).simplify_tolerance float, -- is this is more than zero simply will called with
--(_clean_info).do_chaikins boolean, -- here we will use chaikins togehter with simply to smooth lines
--(_clean_info).min_area_to_keep float, -- if this a polygon  is below this limit it will merge into a neighbour polygon. The area is sqare meter. 

_max_parallel_jobs int, -- this is the max number of paralell jobs to run. There must be at least the same number of free connections
_max_rows_in_each_cell int, -- this is the max number rows that intersects with box before it's split into 4 new boxes, default is 5000
_contiune_after_stat_exception boolean, -- if set to false, it will do topology.ValidateTopology and stop to if the this call returns any rows 
_validate_topoplogy_for_each_run boolean -- DEFAULT false -- if set to true, it will do topology.ValidateTopology at each loop return if it's error 
);

DROP PROCEDURE IF EXISTS resolve_overlap_gap_run(
_input resolve_overlap_data_input_type, 
--(_input).table_to_resolve varchar, -- The table to resolv, imcluding schema name
--(_input).table_pk_column_name varchar, -- The primary of the input table
--(_input).table_geo_collumn_name varchar, -- the name of geometry column on the table to analyze
--(_input).table_srid int, -- the srid for the given geo column on the table analyze
--(_input).utm boolean, 

_topology_info resolve_overlap_data_topology_type,
---(_topology_info).topology_name varchar, -- The topology schema name where we store store sufaces and lines from the simple feature dataset and th efinal result
-- NB. Any exting data will related to topology_name will be deleted
--(_topology_info).topology_snap_tolerance float, -- this is tolerance used as base when creating the the postgis topolayer

_clean_info resolve_overlap_data_clean_type, -- different parameters used if need to clean up your data
--(_clean_info).simplify_tolerance float, -- is this is more than zero simply will called with
--(_clean_info).do_chaikins boolean, -- here we will use chaikins togehter with simply to smooth lines
--(_clean_info).min_area_to_keep float, -- if this a polygon  is below this limit it will merge into a neighbour polygon. The area is sqare meter. 

_max_parallel_jobs int, -- this is the max number of paralell jobs to run. There must be at least the same number of free connections
_max_rows_in_each_cell int,
_debug_options resolve_overlap_data_debug_options_type
);

Add the new code from src/main/sql/function_resolve_overlap_gap_run.sql

larsop commented 4 years ago

Did a fix here so we do not only break the inner loop but also exits the program.

larsop commented 4 years ago

Let's run a complete example with sl_esh.ar50_utvikling_flate

1) Start the job and stop it before we start add the normal border line (the very long borderlines are added before this stage always in single thread)

CALL resolve_overlap_gap_run(
('sl_esh.ar50_utvikling_flate','sl_sdeid','geo',25833,true),
('test_topo_ar50_t7',1.0),
resolve_overlap_data_clean_type_func(300,9,500,3,140,120,240,35,325),
30,1000,
resolve_overlap_data_debug_options_func(
  true, --if set to false, it will do topology.ValidateTopology and stop to if the this call returns any rows
  false, -- if set to true, it will do topology.ValidateTopology at each loop return if it's error 
  false, --  if set to false, it will in many cases generate topo errors beacuse of running in many parralell threads
  1, -- if set to more than 1 it will skip init procces and start at given job_type
  1, -- many of jobs are ran in loops beacuse because if get an exception or cell is not allowed handle because cell close to is also started to work , this cell will gandled in the next loop.
  3, -- if set to more than 0 the job will stop  when this job type is reday to run and display a set sql to run
  1 -- if set to more than 0 the job will stop  when this job type is reday to run and display a set sql to run
  ) 
)"

The output from the call is long list of statements to run next

"CALL resolve_overlap_gap_single_cell(
  'sl_esh.ar50_utvikling_flate','geo','sl_sdeid','test_topo_ar50_t7.ar50_utvikling_flate',
  'test_topo_ar50_t7',1,25833,'true',
  '(300,9,500,3,140,120,240,3,35)',
  'test_topo_ar50_t7.ar50_utvikling_flate_job_list','test_topo_ar50_t7.ar50_utvikling_flate_grid','0103000020E9640000010000000500000000000000793707410000002089A75A410000000079370741000000208FA75A4100000000C8B20741000000208FA75A4100000000C8B207410000002089A75A4100000000793707410000002089A75A41',3,1);","CALL resolve_overlap_gap_single_cell(
  'sl_esh.ar50_utvikling_flate','geo','sl_sdeid','test_topo_ar50_t7.ar50_utvikling_flate',
  'test_topo_ar50_t7',1,25833,'true',
  '(300,9,500,3,140,120,240,3,35)',
  'test_topo_ar50_t7.ar50_utvikling_flate_job_list','test_topo_ar50_t7.ar50_utvikling_flate_grid','0103000020E96400000100000005000000000000004069F8400000000020F05941000000004069F84000000080E0825A4100000000C06AF84000000080E0825A4100000000C06AF8400000000020F05941000000004069F8400000000020F05941',3,1);"}

2) In the postgres log also find this out, do grep for this line >WARNING: EXIT with 601 jobs for cell_job_type 3 at loop_number 1 for topology test_topo_ar50_t7 In the next lines you find the 601 jobs to run in correct order.

WARNING:  stmts to run --> {"CALL resolve_overlap_gap_single_cell(
          'sl_esh.ar50_utvikling_flate','geo','sl_sdeid','test_topo_ar50_t7.ar50_utvikling_flate',
          'test_topo_ar50_t7',1,25833,'true',
          '(300,9,500,3,140,120,240,3,35)',
          'test_topo_ar50_t7.ar50_utvikling_flate_job_list','test_topo_ar50_t7.ar50_utvikling_flate_grid','0103000020E9640000010000000500000000000000D68C1241000000E0B638594100000000D68C1241000000E0BC38594100000000C5071341000000E0BC38594100000000C5071341000000E0B638594100000000D68C1241000000E0B6385941',3,1);","CALL resolve_overlap_gap_single_cell(
          'sl_esh.ar50_utvikling_flate','geo','sl_sdeid','test_topo_ar50_t7.ar50_utvikling_flate',
          'test_topo_ar50_t7',1,25833,'true',
          '(300,9,500,3,140,120,240,3,35)',
          'test_topo_ar50_t7.ar50_utvikling_flate_job_list','test_topo_ar50_t7.ar50_utvikling_flate_grid','0103000020E964000001000000050000000000000047121241000000E0B63859410000000047121241000000E0BC38594100000000368D1241000000E0BC38594100000000368D1241000000E0B63859410000000047121241000000E0B6385941',3,1);","CALL resolve_overlap_gap_single_cell(

We can then pick out first job a test it by hand

CALL resolve_overlap_gap_single_cell(
[more] ( >           'sl_esh.ar50_utvikling_flate','geo','sl_sdeid','test_topo_ar50_t7.ar50_utvikling_flate',
[more] ( >           'test_topo_ar50_t7',1,25833,'true',
[more] ( >           '(300,9,500,3,140,120,240,3,35)',
[more] ( >           'test_topo_ar50_t7.ar50_utvikling_flate_job_list','test_topo_ar50_t7.ar50_utvikling_flate_grid','0103000020E9640000010000000500000000000000D68C1241000000E0B638594100000000D68C1241000000E0BC38594100000000C5071341000000E0BC38594100000000C5071341000000E0B638594100000000D68C1241000000E0B6385941',3,1);
NOTICE:  00000: select id from test_topo_ar50_t7.ar50_utvikling_flate_job_list where cell_geo = '0103000020E9640000010000000500000000000000D68C1241000000E0B638594100000000D68C1241000000E0BC38594100000000C5071341000000E0BC38594100000000C5071341000000E0B638594100000000D68C1241000000E0B6385941' 
LOCATION:  exec_stmt_raise, pl_exec.c:3827
NOTICE:  00000: enter at timeofday:Wed Jul 08 06:42:32.155072 2020 CEST for layer test_topo_ar50_t7_, with _cell_job_type 3 and box id 194 .
LOCATION:  exec_stmt_raise, pl_exec.c:3827
NOTICE:  00000: start work at timeofday:Wed Jul 08 06:42:32.159581 2020 CEST for layer test_topo_ar50_t7_194, _topology_snap_tolerance 1, with _cell_job_type 3 and min_length_line 0.3s, (_clean_info).chaikins_max_degrees) 240
LOCATION:  exec_stmt_raise, pl_exec.c:3827
NOTICE:  00000: command_string1 CREATE TEMP table temp_left_over_borders as select geo FROM
    (select geo from topo_update.get_left_over_borders('test_topo_ar50_t7.ar50_utvikling_flate_grid','geo','0103000020E9640000010000000500000000000000D68C1241000000E0B638594100000000D68C1241000000E0BC38594100000000C5071341000000E0BC38594100000000C5071341000000E0B638594100000000D68C1241000000E0B6385941','test_topo_ar50_t7.ar50_utvikling_flate') as r) as r 
LOCATION:  exec_stmt_raise, pl_exec.c:3827
NOTICE:  00000: command_string2 WITH 
                              edge_line AS 
                              (
                                SELECT distinct e.geom as geom 
                                FROM 
                                temp_left_over_borders i, 
                                test_topo_ar50_t7.edge_data e
                                WHERE ST_DWithin(i.geo,e.geom,1) 
                                UNION 
                                SELECT i.geo AS geom
                                FROM temp_left_over_borders i
                              ),
                              edge_bb AS
                              (
                                SELECT ST_Union(ST_Envelope(e.geom)) as geom FROM edge_line e
                              ),
                              face_1 AS 
                              (
                                SELECT distinct f.mbr as geom 
                                FROM 
                                edge_bb i, 
                                test_topo_ar50_t7.face f
                                WHERE ST_DWithin(i.geom,f.mbr,1) 
                                UNION 
                                SELECT i.geom
                                FROM edge_bb i
                              ),
                              face_2 AS 
                              (
                                SELECT distinct f.mbr as geom 
                                FROM 
                                face_1 i, 
                                test_topo_ar50_t7.face f
                                WHERE ST_DWithin(i.geom,f.mbr,1) 
                                UNION 
                                SELECT i.geom
                                FROM edge_bb i
                              ),
                              edge_2 AS 
                              (
                                SELECT distinct e.geom as geom 
                                FROM 
                                face_2 i, 
                                test_topo_ar50_t7.edge_data e
                                WHERE ST_DWithin(i.geom,e.geom,1) 
                                UNION 
                                SELECT i.geom
                                FROM face_2 i
                              ),
                              final_block AS
                              (
                                SELECT ST_Union(ST_Expand(e.geom,1)) as geom FROM edge_2 e
                              )
                              SELECT ST_Multi(geom) FROM final_block i 
LOCATION:  exec_stmt_raise, pl_exec.c:3827
NOTICE:  00000: Failed to make block for _cell_job_type 3, num_boxes_intersect <NULL>, num_boxes_free <NULL>
LOCATION:  exec_stmt_raise, pl_exec.c:3827
NOTICE:  00000: Locked 0  faces for update top toplogy test_topo_ar50_t7 and _cell_job_type 3, for area_to_block 0103000020E9640000010000000500000000000000D68C1241000000E0B638594100000000D68C1241000000E0BC38594100000000C5071341000000E0BC38594100000000C5071341000000E0B638594100000000D68C1241000000E0B6385941 
LOCATION:  exec_stmt_raise, pl_exec.c:3827
NOTICE:  00000: Locked 0  edge_data for update top toplogy test_topo_ar50_t7 and _cell_job_type 3, for area_to_block 0103000020E9640000010000000500000000000000D68C1241000000E0B638594100000000D68C1241000000E0BC38594100000000C5071341000000E0BC38594100000000C5071341000000E0B638594100000000D68C1241000000E0B6385941 
LOCATION:  exec_stmt_raise, pl_exec.c:3827
NOTICE:  00000: Locked 0  nodes for update toplogy test_topo_ar50_t7 and _cell_job_type 3, for area_to_block 0103000020E9640000010000000500000000000000D68C1241000000E0B638594100000000D68C1241000000E0BC38594100000000C5071341000000E0BC38594100000000C5071341000000E0B638594100000000D68C1241000000E0B6385941 
LOCATION:  exec_stmt_raise, pl_exec.c:3827
NOTICE:  00000: Locked 0  edges based on area_to_block for update toplogy test_topo_ar50_t7 and _cell_job_type 3, for area_to_block 0103000020E9640000010000000500000000000000D68C1241000000E0B638594100000000D68C1241000000E0BC38594100000000C5071341000000E0BC38594100000000C5071341000000E0B638594100000000D68C1241000000E0B6385941 
LOCATION:  exec_stmt_raise, pl_exec.c:3827
NOTICE:  00000: Locked 0  edges based on area_to_block for update toplogy test_topo_ar50_t7 and _cell_job_type 3, for area_to_block 0103000020E9640000010000000500000000000000D68C1241000000E0B638594100000000D68C1241000000E0BC38594100000000C5071341000000E0BC38594100000000C5071341000000E0B638594100000000D68C1241000000E0B6385941 
LOCATION:  exec_stmt_raise, pl_exec.c:3827
NOTICE:  00000: Added edges for border lines for box 194 into line_edges_added <NULL>
LOCATION:  exec_stmt_raise, pl_exec.c:3827
NOTICE:  00000: done work at timeofday:Wed Jul 08 06:42:32.509694 2020 CEST for layer test_topo_ar50_t7, with _cell_job_type 3
LOCATION:  exec_stmt_raise, pl_exec.c:3827
NOTICE:  00000: work done for cell 194 at 2020-07-08 06:42:32.51068+02 border_layer_id <NULL>, using 0.352657 sec
LOCATION:  exec_stmt_raise, pl_exec.c:3827
NOTICE:  00000: leave work at timeofday:Wed Jul 08 06:42:32.511663 2020 CEST for layer test_topo_ar50_t7, with _cell_job_type 3 for cell 194
LOCATION:  exec_stmt_raise, pl_exec.c:3827
CALL
Time: 372.914 ms

In this first cell there is no border in this case because when I make the cells for the border lines I use each line in the original box as cell with a small buffer and i addition I try group jobs spatially.

So if we check the joblist we see this, we are done with cell but the number border added is zero.

select minute, count(*) as num_cells, sum(num_border_rows) as num_border_rows from 
( 
  select 
  to_char( extract(HOUR from d.done_time),'fm00')||':'||to_char( extract(MINUTE from d.done_time),'fm00') as minute ,
  coalesce(cell_border_rows.num_border_rows, 0) as num_border_rows
  from  
  test_topo_ar50_t7.ar50_utvikling_flate_job_list_donejobs d 
  LEFT JOIN (
    SELECT id, count(*) as num_border_rows 
    from (
      SELECT l.geo, min(g.id) as id 
      from 
      test_topo_ar50_t7.ar50_utvikling_flate_job_list g,
      test_topo_ar50_t7.ar50_utvikling_flate_border_line_segments l
      where l.geo && g.cell_geo and ST_Intersects(l.geo,g.cell_geo)
      group by l.geo
    ) as r group by id
  ) as cell_border_rows ON cell_border_rows.id = d.id
  order by done_time
) as r 
group by minute order by minute;
 minute | num_cells | num_border_rows 
--------+-----------+-----------------
 06:42  |         1 |               0
(1 row)

We start the job again from where we stopped by using this command.

"CALL resolve_overlap_gap_run(
('sl_esh.ar50_utvikling_flate','sl_sdeid','geo',25833,true),
('test_topo_ar50_t7',1.0),
resolve_overlap_data_clean_type_func(300,9,500,3,140,120,240,35,325),
30,1000,
resolve_overlap_data_debug_options_func(
  true, --if set to false, it will do topology.ValidateTopology and stop to if the this call returns any rows
  false, -- if set to true, it will do topology.ValidateTopology at each loop return if it's error 
  false, --  if set to false, it will in many cases generate topo errors beacuse of running in many parralell threads
  3, -- if set to more than 1 it will skip init procces and start at given job_type
  2, -- many of jobs are ran in loops beacuse because if get an exception or cell is not allowed handle because cell close to is also started to work , this cell will gandled in the next loop.
  3, -- if set to more than 0 the job will stop  when this job type is reday to run and display a set sql to run
  3 -- if set to more than 0 the job will stop  when this job type is reday to run and display a set sql to run
  ) 
)"

In the below we se that at least can 6000 borders lines pr. minute. (the row from first run is lost here, this is fixed now)

select minute, count(*) as num_cells, sum(num_border_rows) as num_border_rows from 
( 
  select 
  to_char( extract(HOUR from d.done_time),'fm00')||':'||to_char( extract(MINUTE from d.done_time),'fm00') as minute ,
  coalesce(cell_border_rows.num_border_rows, 0) as num_border_rows
  from  
  test_topo_ar50_t7.ar50_utvikling_flate_job_list_donejobs d 
  LEFT JOIN (
    SELECT id, count(*) as num_border_rows 
    from (
      SELECT l.geo, min(g.id) as id 
      from 
      test_topo_ar50_t7.ar50_utvikling_flate_job_list g,
      test_topo_ar50_t7.ar50_utvikling_flate_border_line_segments l
      where l.geo && g.cell_geo and ST_Intersects(l.geo,g.cell_geo)
      group by l.geo
    ) as r group by id
  ) as cell_border_rows ON cell_border_rows.id = d.id
  order by done_time
) as r 
group by minute order by minute;
 minute | num_cells | num_border_rows 
--------+-----------+-----------------
 09:32  |       333 |            6028
 09:33  |        68 |              61
(2 rows)

To add more borderlines you same sql as above and you can run one more iteration.

minute | num_cells | num_border_rows 
--------+-----------+-----------------
 09:32  |       333 |            6028
 09:33  |        68 |              61
 09:47  |        86 |            1079

We have now added more than 7000 of 9000 borders lines in a very short time. Lets run 5 loops now see how that works.

"CALL resolve_overlap_gap_run(
('sl_esh.ar50_utvikling_flate','sl_sdeid','geo',25833,true),
('test_topo_ar50_t7',1.0),
resolve_overlap_data_clean_type_func(300,9,500,3,140,120,240,35,325),
30,1000,
resolve_overlap_data_debug_options_func(
  true, --if set to false, it will do topology.ValidateTopology and stop to if the this call returns any rows
  false, -- if set to true, it will do topology.ValidateTopology at each loop return if it's error 
  false, --  if set to false, it will in many cases generate topo errors beacuse of running in many parralell threads
  3, -- if set to more than 1 it will skip init procces and start at given job_type
  2, -- many of jobs are ran in loops beacuse because if get an exception or cell is not allowed handle because cell close to is also started to work , this cell will gandled in the next loop.
  3, -- if set to more than 0 the job will stop  when this job type is reday to run and display a set sql to run
  7 -- if set to more than 0 the job will stop  when this job type is reday to run and display a set sql to run
  ) 
)"

We see how the performance decrease when we check the jobs done

group by minute order by minute;
 minute | num_cells | num_border_rows 
--------+-----------+-----------------
 09:32  |       333 |            6028
 09:33  |        68 |              61
 09:47  |        86 |            1079
 09:49  |        35 |             530
 09:51  |         1 |              11
 09:52  |         4 |             171
 09:54  |         2 |               2
 09:55  |         4 |              69
 09:58  |         2 |              71
 09:59  |         1 |              42
 10:00  |         5 |              73

From the postgres log I pick out the next statement run and that add 40 lines and it used 2 minutes.

CALL resolve_overlap_gap_single_cell(
          'sl_esh.ar50_utvikling_flate','geo','sl_sdeid','test_topo_ar50_t7.ar50_utvikling_flate',
          'test_topo_ar50_t7',1,25833,'true',
          '(300,9,500,3,140,120,240,3,35)',
          'test_topo_ar50_t7.ar50_utvikling_flate_job_list','test_topo_ar50_t7.ar50_utvikling_flate_grid','0103000020E96400000100000005000000000000801DCA1241000000A059EF5841000000801DCA1241000000488BF85841000000807DCA1241000000488BF85841000000807DCA1241000000A059EF5841000000801DCA1241000000A059EF5841',3,3);
Time: 116195.607 ms (01:56.196)

Now lets start the test from the scratch again and run the same sql as above adding 40 border lines as the first job, it now used around 2 seconds instead of almost 2 minutes, because we ran this as the first statement and not late in the process as in first cases.

CALL resolve_overlap_gap_single_cell(
          'sl_esh.ar50_utvikling_flate','geo','sl_sdeid','test_topo_ar50_t7.ar50_utvikling_flate',
          'test_topo_ar50_t7',1,25833,'true',
          '(300,9,500,3,140,120,240,3,35)',
          'test_topo_ar50_t7.ar50_utvikling_flate_job_list','test_topo_ar50_t7.ar50_utvikling_flate_grid','0103000020E96400000100000005000000000000801DCA1241000000A059EF5841000000801DCA1241000000488BF85841000000807DCA1241000000488BF85841000000807DCA1241000000A059EF5841000000801DCA1241000000A059EF5841',3,3);
Time: 1995.079 ms (00:01.995)

larsop commented 4 years ago

Just a comment here it may also seems like invalid topologies may have some having a effect on this issue.

When testing new blocking area for errors here , the performance decrease seems to not that much .

minute | num_cells | num_border_rows 
--------+-----------+-----------------
 10:08  |        45 |             566
 10:09  |       104 |            2205
 10:10  |       158 |            1393
 10:11  |        92 |            1424
 10:12  |        46 |             860
 10:13  |        22 |             420
 10:14  |         4 |              86
 10:15  |        12 |             136
 10:16  |        10 |              88
 10:17  |        10 |             214
 10:18  |        12 |             251
 10:19  |        11 |             295
 10:20  |        18 |             303
 10:21  |        26 |             519
 10:22  |        14 |             309
 10:23  |         7 |             186

I will update you when done testing new blocking area, that will take day or two.

strk commented 4 years ago

Just a comment here it may also seems like invalid topologies may have some having a effect on this issue.

For sure broken topology can reduce performance, as the code following edge rings may end up traversing ALL edges in the topology instead of closing them earlier. Or you may have lots of topology exceptions.

I'll trying following instructions in https://github.com/larsop/resolve-overlap-and-gap/issues/3#issuecomment-654740394 to have your same code, but I'd recommend merging changes as soon as they are found useful, rather than continuing to use a branch which was meant to debug invalidities (this ticket is about performance instead).

strk commented 4 years ago

@larsop the instructions in https://github.com/larsop/resolve-overlap-and-gap/issues/3#issuecomment-654740394 did not work for me, I got a failure:

ERROR: function cbg_content_based_balanced_grid(text[], integer) does not exist

I see that function is only defined in src/test/sql/regress/find_overlap_and_gap-pre.sql but in that same file there other other assumptions about database schema which prevent me from sourcing it as-is.

I think the procedure should be greatly simplified, if we want to understand where the time goes. We could start with something like this:

create table sl_esh.ar50_utvikling_flate_lines_simp9 as select sl_sdeid, ST_Simplify((ST_Dump(ST_Boundary(geo))).geom, 9) from sl_esh.ar50_utvikling_flate;

Looking at the result of the above, I see there are two main geographical areas, bound by a square which is divided in 4 cells.

ar50_utvikling_flate_lines_simp9

The region at the bottom-right has these 4 cells further divided by some padding:

ar50_utvikling_flate_lines_simp9-bottom-right-padding

Is there any reason why we're using these 2 regions for testing ? Why are the cell borders part of the input ? My understanding was that the grid was created by the resolve_overlap_and_gap function ?

larsop commented 4 years ago

@larsop the instructions in #3 (comment) did not work for me, I got a failure:

ERROR: function cbg_content_based_balanced_grid(text[], integer) does not exist

I see that function is only defined in src/test/sql/regress/find_overlap_and_gap-pre.sql but in that same file there other other assumptions about database schema which prevent me from sourcing it as-is.

Yes thats correct.

I think the procedure should be greatly simplified, if we want to understand where the time goes. We could start with something like this:

create table sl_esh.ar50_utvikling_flate_lines_simp9 as select sl_sdeid, ST_Simplify((ST_Dump(ST_Boundary(geo))).geom, 9) from sl_esh.ar50_utvikling_flate;

Looking at the result of the above, I see there are two main geographical areas, bound by a square which is divided in 4 cells.

The region at the bottom-right has these 4 cells further divided by some padding:

Yes this dataset generated from a raster and nice thing here is that we have very big polygons with lot os holes, thats makes more complicated to resolve.

Is there any reason why we're using these 2 regions for testing ?

Not really , but we started out with testing area and added a new area and did some more code fixes and then it was nice to have both areas in the same dataset.

With this dataset I was also able make a blocking area that always blocking one of the regions to test updated in parallel.

Why are the cell borders part of the input ? My understanding was that the grid was created by the resolve_overlap_and_gap function ?

The grid is created on the fly yes. Thats just by random that they overlap and it's very nice to have cases like this when doing testing.

larsop commented 4 years ago

but I'd recommend merging changes as soon as they are found useful, rather than continuing to use a branch which was meant to debug invalidities (this ticket is about performance instead).

Thanks I will try to merge the code in master pretty soon when I have done some more testing . I think I then also will merge with the debug options since this is very nice have in many different cases.

larsop commented 4 years ago

I did test by doing adding border edges spatially as you suggested (from east to west) and then I was able run through ar5 in about 3 and half day. I will do some more work to try to run this more efficient is parallel.


SELECT count(*) from test_topo_ar5.edge;
  count   
----------
 19295709
(1 row)

SELECT count(*) from test_topo_ar5.node;
  count   
----------
 13566935
(1 row)

SELECT count(*) from test_topo_ar5.face;
  count  
---------
 7955387
(1 row)

I did see some strange results but that seems to be related to [ST_unon_bug](https://trac.osgeo.org/postgis/ticket/4738)

larsop commented 4 years ago

In this branch I have been able to reduce the time of adding border lines from 48 hours to around 5 hours by not using row level locks but instead using a context based grids on context based grids.

strk commented 4 years ago

Great work!

larsop commented 4 years ago

We got a new database server now with 64 cores and by setting huge mem and this commit and the suggested fix we are now down 12 hours for AR5.

So we now have good performance and we are able to use a lot CPU's in parallel quite efficient.

larsop / resolve-overlap-and-gap

Speed up edge insertions in big topologies #3