marinebon / sdg14-shiny

Shiny apps related to MBON SDG14
https://marinebon.github.io/sdg14-shiny
1 stars 0 forks source link

server performance issues #4

Open 7yl4r opened 6 years ago

7yl4r commented 6 years ago

We did a load test of the explorer app on the mbon server today (@ 20:00 UTC) and the few of us had no trouble crashing it. The website became unresponsive and would not reload for several of us (all on USF's network).

CPU, RAM, disk usage, etc were all steady and low % usage throughout the test.

Only outgoing network usage reached heightened levels ( red em1.bytes_sent in the graph below ):

image

In other tests I have found outgoing on em1 to peak around 10Mbps, but I want to confirm that on mbon specifically before ruling that out.

Another possible explanation is that the webserver/database software implementation don't scale well.

7yl4r commented 6 years ago

After a bit more testing I have found that I can lock up the sever for a few minutes with one instance of the application by being unusually click-happy.

Now that I can reproduce it consistently we can continue looking for the source of the bottleneck. Unfortunately I still don't see anything useful in mbon or hydra's metrics.

7yl4r commented 6 years ago

This may be related to the slowness observed when extracting data for regions. Postgres seems to be the common factor here and I need to implement some metrics & do some more testing to figure out why.

7yl4r commented 6 years ago

telegraf postgres plugin should get this done relatively painlessly when combined w/ grafana dash like the mysql one I built before. Do we need all those metrics? :thinking:

7yl4r commented 6 years ago

Ok. Metrics are running.

image

I wasn't able to create a notable blip by using the app. It also seems to be performing better now though. :man_shrugging:

@bbest Let me know next time you do something to stress the server and hopefully we will be better able to diagnose the issue.

7yl4r commented 6 years ago

The following query hangs over 10min. Expected to return ~2M records:

SELECT * FROM obis WHERE eez_mrgid = 8763

Query started at ~16:42 in following images:

image

image

Disk usage appears to be the bottleneck. Also note the jump in swap usage. Swappiness should be 0, so setting that manually may help significantly.

ty@mbon:~$ cat /proc/sys/vm/swappiness
60
ty@mbon:~$ sudo sysctl vm.swappiness=0
vm.swappiness = 0
ty@mbon:~$ cat /proc/sys/vm/swappiness
0

That should be better. Although I think the postgres data lives on a different disk (and that is the one maxing out) than hydra should be using for swap so perhaps this won't help. Certainly won't hurt though.

7yl4r commented 6 years ago

Checking postgres config with postgresqltuner I see a number of issues that could be fixed to improve performance. Especially those balancing between disk & ram usage. Max postgres RAM usage is currently at ~2% of capacity, where closer to 60% is optimal.

@bbest : So I want to modify the postgis configuration at /etc/postgresql/9.5/main/postgresql.conf in the docker instance, but changes I make there will get blown away if the container restarts (right?). So I need to clone kartoza/docker-postgis, modify the postgresql.conf, and replace the container we are using with a new one based on my modified repo?

bbest commented 6 years ago

Good find @7yl4r!

I think you can simply place the edited postgresql.conf into /mnt/mbon-supplement which gets mapped to /mbon-local by the docker postgis container per https://github.com/marinebon/sdg14/issues/4. Then run:

# copy postgis config allocating more RAM
docker exec -postgis bash -c \
  cp -f /mbon-local/postgresql.conf /etc/postgresql/9.5/main/postgresql.conf; \
  /etc/init.d/postgresql reload'

I've been keeping track of these docker tweaks in https://github.com/marinebon/sdg14/issues/4, but should probably maintain a sequential authoritative script like:

https://github.com/marinebon/sdg14/blob/gh-pages/technical/mbon-in-box_docker.sh

bbest commented 6 years ago

@7yl4r, have you been able to try out this new postgresql.conf? Wasn't sure what settings to change and would love to play with intersections using PostGIS this week. We can also discuss tomorrow noon EDT / 9am PDT.

7yl4r commented 6 years ago

I have the conf file in place, but cannot get postgresql to restart or reload the conf. I have tried a dozen ways of accomplishing this, and they all seem to either have no effect or else the container dies when trying them. We may need to find another way of getting the config in there.

bbest commented 6 years ago

The biggest obstacle to extraction for regions of interest (ROIs), OBIS querying and marine biodiversity indicator development is the slowness in our PostGIS database. The simplest imaginable query (SELECT * FROM obis WHERE eez_mrgid = 8763) currently takes 10 minutes to complete, and 1 minute once cached. More complex queries such as intersecting vector with raster (see Tutorial), don't seem to complete after days. Tylar has been installing diagnostic software, tweaking Postgres paramaters and evaluating the problem in this issue marinebon/sdg14-shiny#4 and SDG14 Technical Mtg Notes. We've concluded that it seems to be a disk I/O issue, so propose to buy a fast solid state drive (SSD) and directly mount it (as RAW format vs QCOW2 virtualization). This is mission critical enough that I'm fine with spending up to $2K of monies allocated to me this year to do this, starting with a fast 1TB SSD to be placed into the existing server RAID (and possible expansion to ~ 4TB in future).

Can we please get a 1TB solid state drive ordered ASAP?

7yl4r commented 6 years ago

I want to get a good baseline reading of disk read maxed out as is to inform our purchase. I tried a few little test queries and can't max it out now that we have adjusted postgis settings. I am looking for one of those intersect queries like in the tutorial we talked about before, but can't find them.

bbest commented 6 years ago

I don't have rasters loaded into the database yet, but you could try a point in polygon operation, such as this to update the OBIS points with their EEZ identity:

\timing
UPDATE obis_occ SET eez_territory = NULL;  
UPDATE obis_occ AS o SET eez_territory = t.eez_territory
  FROM (
    SELECT Territory1 AS obis_pkey, eez_territory
    FROM obis_occ AS o, eez AS e
    WHERE ST_INTERSECTS(o.geom, e.geom)) AS t
  WHERE o.obis_pkey = t.obis_pkey;
7yl4r commented 6 years ago

Ah, got it:

mbon_gis=# 
SELECT territory1                            
    FROM obis AS o, eez AS e
    WHERE ST_INTERSECTS(o.geom, e.geom)
;

Bad news though: this join is not disk-limited. To me it looks like the query is not multithreaded so we're limited by the clock on one CPU.

image

A faster disk will still help, but I think not as much as we were expecting before. I see a lot of discussion online regarding the slowness of joins like this. There are some good idea for modifying the query to speed it up (listed below), but this kind of operation might take tens of minutes even with optimization & better hardware.

  1. partial checks first
  2. using a spatial index
  3. shortcutting w/ st_within
  4. parallel query plans (in v9.6+)
  5. split query using "parsel"
bbest commented 6 years ago

Hmm... Ok, good investigations! Looks like we have PostgreSQL 9.5.7 based on SELECT version();, so cannot distribute parallel query plans across CPUs. Looks like the kartoza/docker-postgis docker image has been updated to Postgres 10, based on Dockerfile#L18.

I'm gonna try upgrading Postgres / PostGIS to the latest, try out some more spatial queries while using top, investigate query plans with EXPLAIN() and consider this official Docker alternative for Postgres and PostGIS:

PS On mbon.marine.usf.edu, using PostgreSQL 9.5.7:

SELECT PostGIS_Full_Version();

PPS More useful links:

7yl4r commented 6 years ago

Sounds like a good plan!

Here is a bit more info I collected to confirm why I am thinking it is not disk limited:

The "disk_utilization" metric (which tracks what % of each second the disk is doing something) was quite low except a few spikes.

I left that query running and disk read speed was hanging out below 400kB/s nearly the whole time. Based on another test (below) we should expect that to be in the 100s of MB/s if the disk is topping out.

me@mbon:/mbon$ sudo hdparm -Tt /dev/vdb
/dev/vdb:
 Timing cached reads:   21688 MB in  2.00 seconds = 10855.59 MB/sec
 Timing buffered disk reads: 872 MB in  3.00 seconds = 290.59 MB/sec

Good luck, I will keep my fingers crossed for you this weekend. :crossed_fingers:

7yl4r commented 6 years ago

I don't really expect that more cache will help this, but it looks like I can push this server from ~40GB RAM up to ~250GB. Might be worth trying I guess. Need to restart the server to do it. Just let me know when is a good time.

bbest commented 6 years ago

Whoah, that's a lot of memory! Ok, please feel free to restart.

I've been playing around with alternate versions (mdillon/postgis:10-alpine with the slim Alpine ubuntu base using https://docs.docker.com/samples/library/postgres/) on my local Mac laptop Docker, but getting this error with the likes of ST_Intersection(geom, rast) intersecting polygons with rasters:

Error in postgresqlExecStatement(conn, statement, ...) :
   RS-DBI driver: (could not Retrieve the result : ERROR:  GEOSIntersects: TopologyException: side location conflict at -125.96666667939998 48.100000004299986

Apparently this is a known bug https://trac.osgeo.org/postgis/ticket/3984 & https://github.com/pramsey/postgis-gh/issues/3651 introduced with a recent version of GEOS in PostGIS.

Gonnna try again with mdillon/postgis:9.6-alpine

7yl4r commented 6 years ago

Just rebooted. Now have 232.78GB allocated.

bbest commented 6 years ago

SELECT territory1
FROM obis AS o, eez AS e WHERE ST_INTERSECTS(o.geom, e.geom) AND eez.territory1 = "Albania";

bbest commented 6 years ago

upgrade postgis to 9.6-2.4

docker rename postgis postgis-9.5.7-2.3.2
docker stop postgis-9.5.7-2.3.2
docker run --name "postgis" \
  --restart unless-stopped \
  -p 5432:5432 \
  -v /mbon:/mbon \
  -v /mnt/mbon-supplement:/mbon-local \
  -v /mnt/mbon-supplement/postgresql:/var/lib/postgresql \
  -d -t kartoza/postgis:9.6-2.4
docker ps -a

Then run https://github.com/marinebon/sdg14/blob/gh-pages/technical/docker/postgis.sh

# copy postgis config allocating more RAM
docker exec postgis cp -f /mbon-local/postgresql.conf /etc/postgresql/9.5/.

# restart postgres to load config changes
docker exec postgis service postgresql restart

Then move

docker exec postgis service postgresql stop

# ben@mbon
cd /mnt/mbon-supplement/postgresql
sudo cp -rp 9.5 9.6

# root@postgis
docker exec -it postgis bash
chown -R postgres /var/lib/postgresql/9.6/main/* 
chgrp -R postgres /var/lib/postgresql/9.6/main
service postgresql start

TODO: upgrade Postgres-PostGIS 9.5.7-2.3.2 to Postgres-PostGIS 9.6-2.4

[....] Starting PostgreSQL 9.6 database server: main
[....] Removed stale pid file. 
The PostgreSQL server failed to start. Please check the log output: 
2018-07-24 21:13:17.290 UTC [1074] FATAL: database 
  [FAIL are incompatible with server 2018-07-24 21:13:17.290 UTC [1074] 
   DETAIL: The data directory was initialized by PostgreSQL version 9.5, 
                  which is not compatible with this version 9.6.9. ... failed!
failed!

Then soft upgrade of PostGIS.

ALTER EXTENSION postgis UPDATE TO "2.4";
ALTER EXTENSION postgis_topology UPDATE TO "2.4";

docker volumes

# create volume
docker volume create mbon-db

docker run --name "postgis" \
  #...
  #-v /mnt/mbon-supplement/postgresql:/var/lib/postgresql \
  -v mbon-db:/var/lib/postgresql \
  -d -t kartoza/postgis:9.6-2.4
bbest commented 6 years ago

Failing on ST_Intersects()

SELECT pkey, ST_Intersection(rast, geom) AS geom
  FROM r_depth_west, boem_lease_blocks
  WHERE ST_Intersects(rast, ST_makeValid(geom))
# NOTICE:  No pixels found for band 1
# CONTEXT:  SQL function "st_pixelaspolygons" statement 1
# SQL statement "SELECT public.ST_Collect(t.geom)            FROM public.ST_PixelAsPolygons(rast, nband) AS t"
# PL/pgSQL function _st_intersects(geometry,raster,integer) line 21 at SQL statement
# PL/pgSQL function st_intersection(geometry,raster,integer) line 5 at assignment
# SQL function "st_intersection" statement 1
# ... repeate 2x above...
# Error in postgresqlExecStatement(conn, statement, ...) :
#   RS-DBI driver: (could not Retrieve the result : ERROR:  GEOSIntersects: TopologyException: side location conflict at -123.75000002159996 32.166666734699888

So a fundamental defect currently with GEOS performing ST_Intersection():

using:

Working on AVG

SELECT pkey, AVG((ST_SummaryStats(ST_CLIP(rast, 1, geom, true))).mean)
   FROM r_depth_west, boem_lease_blocks
   WHERE ST_Intersects(geom, rast)
   GROUP BY pkey;

OK, this works, BUT we don't get the actual intersection of a raster cell with the polygon, so don't know the area overlap which negates the possibility of doing an area-weighted average of cells within a polygon -- doh!

24 sec on BB's Mac Docker kartoza/postgis

A fresh approach

So I believe under the hood, ST_Intersection(raster, geometry) dumps the raster cells as a set of polygons (ie ST_DumpAsPolygons) before performing a polygon vs polygon ST_Intersection(geometry, geometry).

  1. For each raster type, load a reference grid ref_grid into PostGIS, eg global4km for SST or Chl using raster::rasterToPolygons()
  2. For each raster, load rasters into table with cell id (cid) and parameter (eg "sst"): ref_grid | cid | parameter | date | value
  3. In the app, for an arbitrary polygon poly, run ST_Intersection(poly, ref_grid) and rest of the SQL with area weighted summary stats to return along with efficiencies detailed in PostGIS — Getting intersections the faster way. Could do this with stats like min, mean, max, sd etc.
  4. Create API for querying curated rasters/features with polygons that are either predefined (eg EEZ, MPA, MEOW, etc) or arbitrarily defined. See R package plumber and webinar Plumbing APIs with plumber - 2018-07 with slides