Closed geohacker closed 6 years ago
What happens when you run generate-routable-dumps in production without national roads included? Does the load reduce to the previous level, where it wasn't a problem at all?
And any idea why the cron isn't acting as expected, and is trying to run the task many times? Would a change in the crontab help? Maybe */120 * * * * or the like, since before it was working fine with */60 * * * *.
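A side note on the proposed schedule, as a rough sketch rather than a statement about the cron inside this container: in standard five-field crontab syntax the first field is minutes, with range 0-59, and a step written `*/N` selects every Nth value of that range starting at 0. The helper below is hypothetical (it is not part of the repo) and only models that rule. Under that assumption, `*/60` and `*/120` would both match minute 0 only, so both would run hourly, and some cron implementations may reject such a step outright.

```python
def expand_minute_step(expr):
    """Expand a '*/N' minute-field expression into the minutes it matches.

    Assumes the usual crontab rule: the minute field ranges over 0-59 and
    a step picks every Nth value of that range, starting from 0.
    """
    if not expr.startswith("*/"):
        raise ValueError("only '*/N' expressions are handled in this sketch")
    step = int(expr[2:])
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(0, 60, step))


for expr in ("*/15", "*/60", "*/120"):
    print(expr, "->", expand_minute_step(expr))
```

If the goal is a run every two hours, the step normally goes in the hour field instead, for example `0 */2 * * *`.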
@mileswwatkins I'm trying to run it the way it is now to see how much time the process takes. Next I'll try without national roads.
I think what's happening is that the cron inside Docker is very unreliable. I'm not sure whether changing the schedule would improve this, but I'll try that next.
Almost 10 hours after kicking off generate-routable-dumps.sh, it's still processing Thanh Hoa (401) and hasn't finished. This doesn't look like it's working for us. I'm going to let it run a bit longer to see if it finishes. Otherwise, we'll have to revert and inspect.
@mileswwatkins thoughts?
Here's the script log:
+ echo Generating routable road dump for province 401
Generating routable road dump for province 401
+ psql postgres://openroads:<redacted>@or1l9gi23uxkphm.c5vrcunksexw.ap-southeast-1.rds.amazonaws.com:5432/openroads -f export-prepped-from-database.sql -v province_code=401
BEGIN
CREATE VIEW
CREATE VIEW
CREATE VIEW
CREATE VIEW
CREATE VIEW
CREATE VIEW
COMMIT
+ wc -l
+ [ 430 -eq 1 ]
+ echo Appending national roads for the province
Appending national roads for the province
+ ../scripts/append-national-roads.js National_network.geojson target-boundary.geojson data/output/Adj_lines.csv
+ echo Making network routable
Making network routable
+ python Network_Clean.py
2018-01-14 20:00:22 -- **Phase 1: Create snapped road network**
2018-01-14 20:00:22 -- Read in Roads file, buffer all lines
2018-01-14 20:00:25 -- join roads into single multipolygon, then find centerline
Importing polygons from: data/output/temp.shp
Calculating centerlines.
@geohacker, what happens when you run it in production without national roads? I don't think this would require reverting the commit, just commenting out the part of the script that appends national roads: https://github.com/orma/openroads-vn-tiler/blob/master/scripts/generate-routable-dumps.sh#L32-L36
Also, it looks like Charles's script is probably single-threaded, so that means that using a big EC2 box probably won't make a big difference.
+ echo Successfully finished
Successfully finished
real 2331m48.720s
user 2331m0.968s
sys 0m17.328s
Successfully finished running after about 38 hours :-/ Just getting back to debugging this. I'll now kick off a run following what @mileswwatkins recommended, taking out the lines that ingest national roads into the Network_Prep script.
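For reference, the timing figures in the log above can be converted with a short sketch like the following. The function name is made up for illustration, and the format it parses is assumed to be the `NmS.SSSs` style shown in the log. The wall-clock time comes out at roughly 38.9 hours, and CPU time (user plus sys) is almost equal to wall-clock time, which is consistent with the earlier guess that the script keeps a single core busy and would not benefit much from a bigger box.

```python
import re


def parse_time_value(value):
    """Convert a shell `time` value such as '2331m48.720s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError("unexpected time format: %r" % value)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Values copied from the log of the full run with national roads.
real = parse_time_value("2331m48.720s")
user = parse_time_value("2331m0.968s")
sys_ = parse_time_value("0m17.328s")

print("wall-clock hours: %.2f" % (real / 3600))
# A ratio near 1.0 suggests one fully busy core; well above 1.0 would
# suggest several cores were in use.
print("CPU / wall-clock ratio: %.3f" % ((user + sys_) / real))
```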
When I run the script without national roads:
2018-01-17 17:24:49 -- Resplit roads on new junctions
Traceback (most recent call last):
  File "Network_Clean.py", line 211, in <module>
    phase_2_final_roads, phase_2_final_junctions = RoadCleanup(iteration_roads, iteration_junctions, 2)
  File "Network_Clean.py", line 202, in RoadCleanup
    splitted = shapely.ops.split(line, points)
  File "/usr/local/lib/python2.7/dist-packages/shapely/ops.py", line 465, in split
    raise ValueError("Splitting a LineString with a %s is not supported" % splitter.type)
ValueError: Splitting a LineString with a GeometryCollection is not supported
real 0m22.435s
user 0m12.232s
sys 0m0.764s
It runs faster but breaks at the "Resplit roads on new junctions" step for province 403.
@Charlesfox1 - you mentioned there were a few fixes that may not have been pushed to the Network_Cleaning repo. Could this be one of them?
I ran the script on my local docker environment with data from the production database for the Thanh Hoa province without national roads. I wanted to get a benchmark:
That, in my mind, is a long time to prepare a routable network for a single province. As we add more provinces this will go up and eventually become unusable, as above. A full run with the national road network took about 38 hours.
We've identified that this is due to the routable dump generation scripts. In the short term there are no next actions, but in the next phase we'll rewrite them to be more reliable. Ref #185
After deploying the handling of the national road network, we noticed that the app becomes unresponsive, with a steady growth in memory:
@mileswwatkins and I have closed in on the generate-routable-dumps.sh script as the cause of the issue. It looks like cron spawns multiple instances of it, perhaps because they are now long-running processes. I'm running them independently to time them before deciding on next steps. It looks to me like the process is running but just takes a very long time, and hence cron malfunctions.
cc @mileswwatkins @Charlesfox1 @sarahantos @olafveerman @kamicut
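One common way to stop cron from stacking up overlapping runs of a long job is to take an exclusive, non-blocking file lock before starting, so a second invocation exits immediately instead of piling on. The sketch below is only an illustration of that idea, not something taken from the repo: the lock path and the `run_exclusively` name are hypothetical, and it assumes Python 3 on a Unix-like host.

```python
import fcntl
import subprocess

# Hypothetical lock location; any path writable by the cron user would do.
LOCK_PATH = "/tmp/generate-routable-dumps.lock"


def run_exclusively(command, lock_path=LOCK_PATH):
    """Run `command` only if no other run currently holds the lock.

    Returns the command's exit code, or None if the run was skipped
    because a previous invocation is still in progress.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # The lock is released when lock_file is closed at the end of
        # the `with` block, after the command has finished.
        return subprocess.call(command)
```

The cron entry would then call a small wrapper that does something like `run_exclusively(["./generate-routable-dumps.sh"])`. This does not make the job itself any faster; it only keeps a slow run from being started again on top of itself. The `flock` command-line utility from util-linux offers the same behaviour from a shell, if it is available in the image.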