7yl4r opened this issue 7 years ago
After a bit more testing I have found that I can lock up the server for a few minutes with one instance of the application by being unusually click-happy.
Now that I can reproduce it consistently we can continue looking for the source of the bottleneck. Unfortunately I still don't see anything useful in mbon or hydra's metrics.
This may be related to the slowness observed when extracting data for regions. Postgres seems to be the common factor here and I need to implement some metrics & do some more testing to figure out why.
The telegraf postgres plugin should get this done relatively painlessly when combined with a grafana dashboard like the mysql one I built before. Do we need all those metrics? :thinking:
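For reference, the telegraf side should just be a small input block, something like this (a sketch; the connection string, database name, and file path here are placeholders):
# sketch: enable the telegraf postgresql input plugin (placeholder credentials/paths)
sudo tee /etc/telegraf/telegraf.d/postgresql.conf >/dev/null <<'EOF'
[[inputs.postgresql]]
  address = "host=localhost port=5432 user=telegraf sslmode=disable"
  databases = ["mbon_gis"]
EOF
sudo systemctl restart telegraf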
Ok. Metrics are running.
I wasn't able to create a notable blip by using the app. It also seems to be performing better now though. :man_shrugging:
@bbest Let me know next time you do something to stress the server and hopefully we will be better able to diagnose the issue.
The following query hangs for over 10 min. It is expected to return ~2M records:
SELECT * FROM obis WHERE eez_mrgid = 8763
The query started at ~16:42 in the following images:
Disk usage appears to be the bottleneck. Also note the jump in swap usage. Swappiness should be 0, so setting that manually may help significantly.
ty@mbon:~$ cat /proc/sys/vm/swappiness
60
ty@mbon:~$ sudo sysctl vm.swappiness=0
vm.swappiness = 0
ty@mbon:~$ cat /proc/sys/vm/swappiness
0
That should be better. Although I think the postgres data lives on a different disk (the one maxing out) than the one hydra should be using for swap, so perhaps this won't help. It certainly won't hurt though.
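To make that change survive a reboot, a small sketch (the file name under /etc/sysctl.d is arbitrary):
# persist vm.swappiness=0 across reboots
echo 'vm.swappiness = 0' | sudo tee /etc/sysctl.d/99-swappiness.conf
sudo sysctl --system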
Checking the postgres config with postgresqltuner, I see a number of issues that could be fixed to improve performance, especially those balancing disk and RAM usage. Max postgres RAM usage is currently at ~2% of capacity, where closer to 60% is optimal.
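For reference, these are the kinds of postgresql.conf lines I expect to touch; the values below are only illustrative for a ~40 GB RAM host, not tuned recommendations:
# illustrative memory settings for a ~40 GB RAM host (not tuned recommendations)
cat >> postgresql.conf <<'EOF'
shared_buffers = 10GB            # ~25% of RAM is a common starting point
effective_cache_size = 24GB      # planner hint for how much data the OS page cache holds
work_mem = 256MB                 # per sort/hash node, so keep modest with many connections
maintenance_work_mem = 1GB       # index builds, VACUUM
EOF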
@bbest: So I want to modify the postgis configuration at /etc/postgresql/9.5/main/postgresql.conf in the docker instance, but changes I make there will get blown away if the container restarts (right?). So I need to clone kartoza/docker-postgis, modify the postgresql.conf, and replace the container we are using with a new one based on my modified repo?
Good find @7yl4r!
I think you can simply place the edited postgresql.conf into /mnt/mbon-supplement, which gets mapped to /mbon-local by the docker postgis container per https://github.com/marinebon/sdg14/issues/4. Then run:
# copy postgis config allocating more RAM
docker exec postgis bash -c '\
  cp -f /mbon-local/postgresql.conf /etc/postgresql/9.5/main/postgresql.conf; \
  /etc/init.d/postgresql reload'
I've been keeping track of these docker tweaks in https://github.com/marinebon/sdg14/issues/4, but should probably maintain a sequential authoritative script like:
https://github.com/marinebon/sdg14/blob/gh-pages/technical/mbon-in-box_docker.sh
@7yl4r, have you been able to try out this new postgresql.conf? Wasn't sure what settings to change and would love to play with intersections using PostGIS this week. We can also discuss tomorrow noon EDT / 9am PDT.
I have the conf file in place, but cannot get postgresql to restart or reload the conf. I have tried a dozen ways of accomplishing this, and they all seem to either have no effect or else the container dies when trying them. We may need to find another way of getting the config in there.
The biggest obstacle to extraction for regions of interest (ROIs), OBIS querying and marine biodiversity indicator development is the slowness of our PostGIS database. The simplest imaginable query (SELECT * FROM obis WHERE eez_mrgid = 8763) currently takes 10 minutes to complete, and 1 minute once cached. More complex queries, such as intersecting vector with raster (see Tutorial), don't seem to complete after days. Tylar has been installing diagnostic software, tweaking Postgres parameters and evaluating the problem in this issue, marinebon/sdg14-shiny#4 and the SDG14 Technical Mtg Notes. We've concluded that it seems to be a disk I/O issue, so we propose to buy a fast solid state drive (SSD) and directly mount it (as RAW format vs QCOW2 virtualization). This is mission critical enough that I'm fine with spending up to $2K of monies allocated to me this year to do this, starting with a fast 1TB SSD to be placed into the existing server RAID (with possible expansion to ~4TB in future).
Can we please get a 1TB solid state drive ordered ASAP?
I want to get a good baseline reading of the disk reads maxing out as-is to inform our purchase. I tried a few small test queries and can't max it out now that we have adjusted the postgis settings. I am looking for one of those intersect queries like in the tutorial we talked about before, but can't find them.
I don't have rasters loaded into the database yet, but you could try a point-in-polygon operation, such as this one to update the OBIS points with their EEZ identity:
\timing
UPDATE obis_occ SET eez_territory = NULL;
UPDATE obis_occ AS o SET eez_territory = t.eez_territory
FROM (
  SELECT o.obis_pkey, e.territory1 AS eez_territory
  FROM obis_occ AS o, eez AS e
  WHERE ST_Intersects(o.geom, e.geom)) AS t
WHERE o.obis_pkey = t.obis_pkey;
Ah, got it:
mbon_gis=#
SELECT territory1
FROM obis AS o, eez AS e
WHERE ST_INTERSECTS(o.geom, e.geom)
;
Bad news though: this join is not disk-limited. To me it looks like the query is not multithreaded, so we're limited by the clock speed of one CPU.
A faster disk will still help, but I think not as much as we were expecting before. I see a lot of discussion online regarding the slowness of joins like this. There are some good ideas for modifying the query to speed it up (one is sketched below), but this kind of operation might take tens of minutes even with optimization & better hardware.
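One example of the kind of change being suggested (just a sketch; the index names are arbitrary, and the psql user/db may need adjusting for our container): make sure both geometry columns have GIST indexes so ST_Intersects can use an index scan rather than a full nested-loop compare, then refresh the planner statistics:
# sketch: ensure spatial (GIST) indexes exist and stats are fresh
docker exec -u postgres -i postgis psql -d mbon_gis <<'SQL'
CREATE INDEX IF NOT EXISTS obis_geom_gix ON obis USING GIST (geom);
CREATE INDEX IF NOT EXISTS eez_geom_gix  ON eez  USING GIST (geom);
VACUUM ANALYZE obis;
VACUUM ANALYZE eez;
SQL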
Hmm... Ok, good investigations! Looks like we have PostgreSQL 9.5.7 based on SELECT version();, so we cannot distribute parallel query plans across CPUs. Looks like the kartoza/docker-postgis docker image has been updated to Postgres 10, based on Dockerfile#L18.
PostgreSQL: Documentation: 10: 15.2. When Can Parallel Query Be Used?
In order for any parallel query plans whatsoever to be generated, the following settings must be configured as indicated.
max_parallel_workers_per_gather
must be set to a value which is greater than zero. This is a special case of the more general principle that no more workers should be used than the number configured via max_parallel_workers_per_gather.
dynamic_shared_memory_type
must be set to a value other than none. Parallel query requires dynamic shared memory in order to pass data between cooperating processes.
In addition, the system must not be running in single-user mode.
Even when it is in general possible for parallel query plans to be generated, the planner will not generate them for a given query if any of the following are true:
The query writes any data or locks any database rows.
...
For instance, check out this docker image shongololo/postgis - Postgres 10 with PostGIS 2.4, SFCGAL, and pgrouting 2.5, plus raster and SSL support:
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
It is worth considering updating these parameters if you are using an SSD drive or large amounts of RAM. For customisation, connect as the postgres superuser then use the ALTER SYSTEM command to update the desired configuration settings, then restart the database, for example:
SET ROLE postgres;
ALTER SYSTEM SET max_connections = '100';
RESET ROLE;
SELECT pg_reload_conf();
I'm gonna try upgrading Postgres / PostGIS to the latest, try out some more spatial queries while using top, investigate query plans with EXPLAIN() (a sketch below), and consider this official Docker alternative for Postgres and PostGIS:
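For the EXPLAIN piece, something along these lines should show whether the planner uses the GIST indexes or any parallel workers (a sketch; adjust the psql user/db to the container setup, and note that EXPLAIN ANALYZE actually runs the query, so it takes just as long):
docker exec -u postgres -i postgis psql -d mbon_gis <<'SQL'
-- show the actual plan, timings and buffer usage for the intersect join
EXPLAIN (ANALYZE, BUFFERS)
SELECT territory1
FROM obis AS o, eez AS e
WHERE ST_Intersects(o.geom, e.geom);
SQL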
PS On mbon.marine.usf.edu, using PostgreSQL 9.5.7:
SELECT PostGIS_Full_Version();
PPS More useful links:
Sounds like a good plan!
Here is a bit more info I collected to confirm why I am thinking it is not disk limited:
The "disk_utilization" metric (which tracks what % of each second the disk is doing something) was quite low except for a few spikes.
I left that query running and disk read speed was hanging out below 400kB/s nearly the whole time. Based on another test (below) we should expect that to be in the 100s of MB/s if the disk is topping out.
me@mbon:/mbon$ sudo hdparm -Tt /dev/vdb
/dev/vdb:
Timing cached reads: 21688 MB in 2.00 seconds = 10855.59 MB/sec
Timing buffered disk reads: 872 MB in 3.00 seconds = 290.59 MB/sec
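If we want to watch this live while a query runs (rather than via the grafana dashboard), something like this should show throughput and utilization for the postgres data disk (a sketch; assumes sysstat is installed, device name per the hdparm test above):
# print extended per-device stats (MB/s read/written, %util) every 5 seconds
iostat -xm 5 vdb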
Good luck, I will keep my fingers crossed for you this weekend. :crossed_fingers:
I don't really expect that more cache will help this, but it looks like I can push this server from ~40GB RAM up to ~250GB. Might be worth trying, I guess. I need to restart the server to do it; just let me know when would be a good time.
Whoah, that's a lot of memory! Ok, please feel free to restart.
I've been playing around with alternate versions (mdillon/postgis:10-alpine with the slim Alpine Linux base, per https://docs.docker.com/samples/library/postgres/) on my local Mac laptop Docker, but am getting this error with the likes of ST_Intersection(geom, rast) when intersecting polygons with rasters:
Error in postgresqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not Retrieve the result : ERROR: GEOSIntersects: TopologyException: side location conflict at -125.96666667939998 48.100000004299986
Apparently this is a known bug https://trac.osgeo.org/postgis/ticket/3984 & https://github.com/pramsey/postgis-gh/issues/3651 introduced with a recent version of GEOS in PostGIS.
Gonna try again with mdillon/postgis:9.6-alpine
Just rebooted. Now have 232.78GB allocated.
SELECT territory1
FROM obis AS o, eez AS e
WHERE ST_Intersects(o.geom, e.geom) AND e.territory1 = 'Albania';
docker rename postgis postgis-9.5.7-2.3.2
docker stop postgis-9.5.7-2.3.2
docker run --name "postgis" \
--restart unless-stopped \
-p 5432:5432 \
-v /mbon:/mbon \
-v /mnt/mbon-supplement:/mbon-local \
-v /mnt/mbon-supplement/postgresql:/var/lib/postgresql \
-d -t kartoza/postgis:9.6-2.4
docker ps -a
Then run https://github.com/marinebon/sdg14/blob/gh-pages/technical/docker/postgis.sh
# copy postgis config allocating more RAM
docker exec postgis cp -f /mbon-local/postgresql.conf /etc/postgresql/9.6/main/postgresql.conf
# restart postgres to load config changes
docker exec postgis service postgresql restart
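A quick check that the container actually picked up the edited conf (a sketch; adjust the user if the kartoza image uses a different superuser):
# confirm the new memory settings are live
docker exec -u postgres postgis psql -c "SHOW shared_buffers;"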
Then move the 9.5 data directory over to 9.6:
docker exec postgis service postgresql stop
# ben@mbon
cd /mnt/mbon-supplement/postgresql
sudo cp -rp 9.5 9.6
# root@postgis
docker exec -it postgis bash
chown -R postgres /var/lib/postgresql/9.6/main/*
chgrp -R postgres /var/lib/postgresql/9.6/main
service postgresql start
[....] Starting PostgreSQL 9.6 database server: main
[....] Removed stale pid file.
The PostgreSQL server failed to start. Please check the log output:
2018-07-24 21:13:17.290 UTC [1074] FATAL:  database files are incompatible with server
2018-07-24 21:13:17.290 UTC [1074] DETAIL:  The data directory was initialized by PostgreSQL version 9.5, which is not compatible with this version 9.6.9.
[FAIL] ... failed!
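The usual way around this is pg_upgrade or a dump/restore rather than copying the 9.5 data directory straight into 9.6. A sketch of the dump/restore route, assuming the old container still holds the data, the new 9.6 container gets a freshly initialised cluster, and the default postgres superuser works (the kartoza image may use a different one):
# dump from the old 9.5 container, restore into the new 9.6 one
# (both publish host port 5432, so only run one at a time)
docker stop postgis
docker start postgis-9.5.7-2.3.2
docker exec -u postgres postgis-9.5.7-2.3.2 pg_dumpall > /mnt/mbon-supplement/mbon_all.sql
docker stop postgis-9.5.7-2.3.2
docker start postgis
docker exec -u postgres -i postgis psql -f - < /mnt/mbon-supplement/mbon_all.sql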
Then soft upgrade of PostGIS.
ALTER EXTENSION postgis UPDATE TO "2.4";
ALTER EXTENSION postgis_topology UPDATE TO "2.4";
# create volume
docker volume create mbon-db
# rerun the container as above, but mount the named volume for the data directory
# (was: -v /mnt/mbon-supplement/postgresql:/var/lib/postgresql)
docker run --name "postgis" \
  --restart unless-stopped \
  -p 5432:5432 \
  -v /mbon:/mbon \
  -v /mnt/mbon-supplement:/mbon-local \
  -v mbon-db:/var/lib/postgresql \
  -d -t kartoza/postgis:9.6-2.4
SELECT pkey, ST_Intersection(rast, geom) AS geom
FROM r_depth_west, boem_lease_blocks
WHERE ST_Intersects(rast, ST_makeValid(geom))
# NOTICE: No pixels found for band 1
# CONTEXT: SQL function "st_pixelaspolygons" statement 1
# SQL statement "SELECT public.ST_Collect(t.geom) FROM public.ST_PixelAsPolygons(rast, nband) AS t"
# PL/pgSQL function _st_intersects(geometry,raster,integer) line 21 at SQL statement
# PL/pgSQL function st_intersection(geometry,raster,integer) line 5 at assignment
# SQL function "st_intersection" statement 1
# ... repeated 2x above ...
# Error in postgresqlExecStatement(conn, statement, ...) :
# RS-DBI driver: (could not Retrieve the result : ERROR: GEOSIntersects: TopologyException: side location conflict at -123.75000002159996 32.166666734699888
So there is currently a fundamental defect with GEOS performing ST_Intersection(). Working around it using:
SELECT pkey, AVG((ST_SummaryStats(ST_CLIP(rast, 1, geom, true))).mean)
FROM r_depth_west, boem_lease_blocks
WHERE ST_Intersects(geom, rast)
GROUP BY pkey;
OK, this works, BUT we don't get the actual intersection of a raster cell with the polygon, so we don't know the area of overlap, which negates the possibility of doing an area-weighted average of cells within a polygon -- doh!
24 sec on BB's Mac Docker kartoza/postgis
So I believe under the hood, ST_Intersection(raster, geometry) dumps the raster cells as a set of polygons (i.e. ST_DumpAsPolygons) before performing a polygon vs polygon ST_Intersection(geometry, geometry).
So the idea would be to:
- load a reference grid ref_grid into PostGIS, e.g. global4km for SST or Chl, converting cells to polygons with raster::rasterToPolygons()
- store cell values in a long ref_grid table with columns cid | parameter | date | value
- given a polygon of interest poly, run ST_Intersection(poly, ref_grid) and the rest of the SQL with area-weighted summary stats to return, along with the efficiencies detailed in PostGIS — Getting intersections the faster way. Could do this with stats like min, mean, max, sd etc.
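A sketch of what that last step could look like, assuming a single ref_grid table that carries the cell polygon alongside cid | parameter | date | value, and a poly table keyed by pkey (all of these names, and the parameter/date slice, are placeholders until the tables are actually loaded):
# sketch: area-weighted mean of grid-cell values within each polygon of interest
docker exec -u postgres -i postgis psql -d mbon_gis <<'SQL'
SELECT p.pkey,
       SUM(g.value * ST_Area(ST_Intersection(p.geom, g.geom)))
         / NULLIF(SUM(ST_Area(ST_Intersection(p.geom, g.geom))), 0) AS weighted_mean
FROM poly AS p
JOIN ref_grid AS g
  ON ST_Intersects(p.geom, g.geom)
WHERE g.parameter = 'sst'          -- one parameter/date slice at a time
  AND g.date = DATE '2018-01-01'
GROUP BY p.pkey;
SQL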
We did a load test of the explorer app on the mbon server today (@ 20:00 UTC) and the few of us had no trouble crashing it. The website became unresponsive and would not reload for several of us (all on USF's network).
CPU, RAM, disk usage, etc were all steady and low % usage throughout the test.
Only outgoing network usage reached heightened levels (red em1.bytes_sent in the graph below). In other tests I have found outgoing traffic on em1 peaks around 10 Mbps, but I want to confirm that on mbon specifically before ruling it out.
Another possible explanation is that the webserver/database software implementations don't scale well.