orma / openroads-vn-analytics

BSD 2-Clause "Simplified" License

Exceeding memory and CPU reservations on ECS #360

Closed geohacker closed 6 years ago

geohacker commented 6 years ago

After the deploy that added handling of the national road network, we noticed that the app goes unresponsive with a steady growth in memory:

image

@mileswwatkins and I have closed in on the generate-routable-dumps.sh script as the cause of the issue. It looks like cron spawns multiple instances of it, perhaps because they are now long-running processes.

I'm running them independently to time them before deciding next steps. It looks to me like the process is running but just takes a very long time, and hence cron malfunctions.

cc @mileswwatkins @Charlesfox1 @sarahantos @olafveerman @kamicut

mileswwatkins commented 6 years ago

What happens when you run generate-routable-dumps in production without national roads included? Does the load reduce to the previous level, where it wasn't a problem at all?

And any idea why the cron isn't acting as expected, and is trying to run the task many times? Would a change in the crontab help? Maybe */120 * * * * or the like, since before it was working fine with */60 * * * *.
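One common way to stop cron from piling up overlapping runs is to have each run take an exclusive lock and exit immediately if a previous run still holds it. The sketch below is only an illustration of that idea, not what the project does: the lock path and the `run_exclusively` name are hypothetical, and it assumes a Linux host where `fcntl.flock` is available. Separately, note that the minute field of a crontab entry only spans 0-59, so `*/120 * * * *` cannot express a two-hour interval; `0 */2 * * *` is the conventional way to write that.

```python
import fcntl
import subprocess

# Hypothetical lock location; any path writable by the cron user would do.
LOCK_PATH = "/tmp/generate-routable-dumps.lock"


def run_exclusively(command, lock_path=LOCK_PATH):
    """Run `command` unless another run holds the lock.

    Returns the command's exit code, or None if the run was skipped
    because a previous run is still in progress.
    """
    lock_file = open(lock_path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    try:
        return subprocess.call(command)
    finally:
        fcntl.flock(lock_file, fcntl.LOCK_UN)
        lock_file.close()


# Hypothetical use from the cron entry point:
#     run_exclusively(["sh", "generate-routable-dumps.sh"])
```

With a guard like this, a run that takes longer than the cron interval would cause later invocations to be skipped rather than stacked, which would at least keep memory and CPU use bounded while the slow step is investigated.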

geohacker commented 6 years ago

@mileswwatkins I'm trying to run it the way it is now to see how much time the process takes. Next I'll try without national roads.

I think what's happening is that cron inside Docker is very unreliable. I'm not sure whether a longer interval between runs would improve this, but I'll try that next.

geohacker commented 6 years ago

image

^ Almost 10 hours after kicking off generate-routable-dumps.sh, it has been processing Thanh Hoa (401) for the longest time and is still not done. It doesn't look like this is working for us. I'm going to let it run for a bit more to see if it finishes. Otherwise, we'll have to revert and inspect.

@mileswwatkins thoughts?

geohacker commented 6 years ago

Here's the script log:

+ echo Generating routable road dump for province 401
Generating routable road dump for province 401
+ psql postgres://openroads:REDACTED@or1l9gi23uxkphm.c5vrcunksexw.ap-southeast-1.rds.amazonaws.com:5432/openroads -f export-prepped-from-database.sql -v province_code=401
BEGIN
CREATE VIEW
CREATE VIEW
CREATE VIEW
CREATE VIEW
CREATE VIEW
CREATE VIEW
COMMIT
+ wc -l
+ [ 430 -eq 1 ]
+ echo Appending national roads for the province
Appending national roads for the province
+ ../scripts/append-national-roads.js National_network.geojson target-boundary.geojson data/output/Adj_lines.csv
+ echo Making network routable
Making network routable
+ python Network_Clean.py

2018-01-14 20:00:22 -- **Phase 1: Create snapped road network**

2018-01-14 20:00:22 -- Read in Roads file, buffer all lines

2018-01-14 20:00:25 -- join roads into single multipolygon, then find centerline
Importing polygons from: data/output/temp.shp
Calculating centerlines.

mileswwatkins commented 6 years ago

@geohacker, what happens when you run it in production without national roads? I don't think this would require reverting the commit, just commenting out the part of the script that appends national roads: https://github.com/orma/openroads-vn-tiler/blob/master/scripts/generate-routable-dumps.sh#L32-L36

Also, it looks like Charles's script is probably single-threaded, so that means that using a big EC2 box probably won't make a big difference.

geohacker commented 6 years ago

+ echo Successfully finished
Successfully finished

real    2331m48.720s
user    2331m0.968s
sys     0m17.328s

Successfully finished running after about 38 hours :-/ Just getting back to debugging this. I'll now kick off a run with what @mileswwatkins recommended, taking out the lines that feed national roads into the Network_Prep script.
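For reference, the timing above can be converted to hours and to a rough CPU utilisation figure. The short sketch below does only that arithmetic on the three values from the log; the helper name `parse_time_field` is made up for illustration. User plus system time comes out almost equal to wall-clock time, which is what one would expect from a CPU-bound process using a single core.

```python
import re


def parse_time_field(value):
    """Convert a shell `time` duration such as '2331m48.720s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError("unrecognised duration: %r" % value)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Values copied from the log above.
real_s = parse_time_field("2331m48.720s")
user_s = parse_time_field("2331m0.968s")
sys_s = parse_time_field("0m17.328s")

hours = real_s / 3600                  # wall-clock duration in hours
cpu_share = (user_s + sys_s) / real_s  # close to 1.0 means one busy core

print("wall clock: %.1f hours" % hours)
print("CPU time / wall time: %.3f" % cpu_share)
```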

geohacker commented 6 years ago

When I run the script without national roads:

2018-01-17 17:24:49 --     Resplit roads on new junctions
Traceback (most recent call last):
  File "Network_Clean.py", line 211, in <module>
    phase_2_final_roads, phase_2_final_junctions = RoadCleanup(iteration_roads, iteration_junctions, 2)
  File "Network_Clean.py", line 202, in RoadCleanup
    splitted = shapely.ops.split(line, points)
  File "/usr/local/lib/python2.7/dist-packages/shapely/ops.py", line 465, in split
    raise ValueError("Splitting a LineString with a %s is not supported" % splitter.type)
ValueError: Splitting a LineString with a GeometryCollection is not supported

real    0m22.435s
user    0m12.232s
sys     0m0.764s

It runs faster but breaks at the "Resplit roads on new junctions" step for province 403.

@Charlesfox1 - you mentioned there were a few fixes that may not have been pushed to the Network_Cleaning repo. Could this be one of them?
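The traceback shows `shapely.ops.split` being handed a GeometryCollection as the splitter, which that Shapely version rejects. One possible workaround, offered as a sketch rather than as the fix Charles had in mind, is to keep only the Point members of the splitter and pass those on as a MultiPoint. The helper name `point_parts` is hypothetical, and discarding the non-point parts is an assumption about what the script intends. The helper relies only on the `geom_type` and `geoms` attributes that Shapely geometries expose, so it does not import Shapely itself.

```python
def point_parts(splitter):
    """Collect the Point members of a geometry, however deeply nested.

    Works on any object exposing `geom_type` (and `geoms` for
    multi-part geometries), as Shapely geometries do.
    """
    if splitter.geom_type == "Point":
        return [splitter]
    if splitter.geom_type in ("MultiPoint", "GeometryCollection"):
        points = []
        for part in splitter.geoms:
            points.extend(point_parts(part))
        return points
    # Lines, polygons and so on are not junction points; drop them.
    return []


# Hypothetical use at the failing call in RoadCleanup (requires Shapely):
#
#     from shapely.geometry import MultiPoint
#     junctions = point_parts(points)
#     if junctions:
#         splitted = shapely.ops.split(line, MultiPoint(junctions))
```

Whether dropping the non-point geometries is acceptable depends on why the junction set contains them in the first place, so it would be worth checking the input for province 403 before relying on this.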

geohacker commented 6 years ago

I ran the script on my local docker environment with data from the production database for the Thanh Hoa province without national roads. I wanted to get a benchmark:

That, in my mind, is a long time to prepare a routable network for a single province. Once we add more provinces this is going to go up and eventually become unusable, like above. For a full run with the national road network it took about 38 hours.

geohacker commented 6 years ago

We've identified that this is due to the routable dump generation scripts. In the short term there are no next actions, but in the next phase we'll rewrite these scripts to be more reliable. Ref #185