ngageoint / mrgeo

MrGeo is a geospatial toolkit designed to provide raster-based geospatial capabilities at scale. MrGeo is built upon Apache Spark and the Hadoop ecosystem to leverage the storage and processing power of hundreds of commodity computers. See the wiki for more details.
https://github.com/ngageoint/mrgeo/wiki
Apache License 2.0
206 stars · 64 forks

August 2015 Benchmarks #87

Closed wncampbell closed 8 years ago

wncampbell commented 9 years ago

Benchmark Set 1: Read/Write S3, 100 node m3.xlarge, ASTER 30

  1. Ingest Image
    • 34m1.743s
    • 29m3.868s
    • 25m52.058s
  2. Build Pyramid
    • 24m36.572s
    • 28m25.603s
    • 28m55.679s
  3. Build Pyramid Categorical
    • non-categorical: 97m20.786s
    • categorical: 121m25.593s
  4. Build Pyramid Rasterized Vector
    • 19m58.019s
  5. Slope (m/r)
    • 660m6.312s
    • 660m13.711s
    • 637m39.714s
  6. Rasterize Vector
    • 36m43.518s
    • 39m1.620s
    • 37m31.167s
  7. Scalar (m/r)
    • 80m12.381s
    • 76m28.571s
    • 76m7.667s
  8. Simple Friction Surface Math (m/r)
    • 81m49.540s
    • 83m2.015s
    • 83m10.618s
  9. Nested Conditional (m/r)
    • 468m24.401s
    • 477m25.898s
  10. Vehicle Friction Surface

Benchmark Set 2: Read/Write S3, 100 node m3.2xlarge, ASTER 30

  1. Ingest Image
    • 29m59.003s
  2. Build Pyramid
    • 18m12.295s
    • 18m52.738s
    • 18m34.615s
  3. Rasterize Vector
    • 50m44.335s

Benchmark Set 3: Read/Write HDFS, 100 node m3.xlarge, ASTER 30

  1. Ingest Image
    • 103m55.789s
  2. Build Pyramid
    • 38m7.328s
  3. Rasterize Vector
    • 24m28.441s

Benchmark Set 4: Read/Write HDFS, 75 node m3.2xlarge, ASTER 30

  1. Ingest Image
    • 101m22.904s
  2. Build Pyramid
    • 31m41.023s
  3. Rasterize Vector
    • 23m50.132s

Benchmark Set 5: Read/Write S3, 75 node m3.2xlarge, ASTER 30

  1. Ingest Image
    • 30m36.741s
  2. Build Pyramid
    • 25m24.061s
  3. Rasterize Vector
    • 39m36.167s
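For comparing the repeated runs above, the `time`-style durations can be converted to seconds and averaged. A small helper (purely illustrative, not part of MrGeo; shown here with the Set 1 Ingest Image runs):

```python
import re

def to_seconds(t):
    """Convert a `time`-style duration like '34m1.743s' to seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?([\d.]+)s", t)
    return int(m.group(1) or 0) * 60 + float(m.group(2))

# Average of the three Set 1 Ingest Image runs
runs = ["34m1.743s", "29m3.868s", "25m52.058s"]
avg = sum(map(to_seconds, runs)) / len(runs)
print(f"{avg:.1f}s (~{avg / 60:.1f} min)")  # ~1779.2s, about 29.7 min
```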
ttislerdg commented 9 years ago

Here are some benchmarks I collected using different sizes for each worker task:

150 m3.xlarge data nodes (10.8 T tmp space)

Running `time mrgeo mapalgebra -o s3://mrgeo/images/aster-30m-slope -e "slope([s3://mrgeo/images/aster-30m])" -v -mm` with the mm values below:

| mm | memory/worker | no. workers | time |
| --- | --- | --- | --- |
| 1 | 1024m | 1184 | out of memory error during save phase |
| 1.5 | 1536m | ? | calculation to fit max no. of workers ended up at 2048m/worker, same as below |
| 2 | 2048m | 750 | 01:02:58 |
| 3 | 3072m | 450 | 01:42:38 |
| 4 | 4096m | 300 | 01:22:38 |
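The worker counts above are consistent with a fixed per-node memory budget. A rough sketch, where the 15 GiB of RAM on an m3.xlarge and a ~10 GiB usable allocation per node are assumptions, not figures from this thread:

```python
# Assumed figures (not from the thread): each m3.xlarge has 15 GiB RAM,
# of which roughly 10 GiB is allocatable to workers.
NODES = 150
USABLE_MB_PER_NODE = 10_240

for worker_mb in (2048, 3072, 4096):
    workers = NODES * (USABLE_MB_PER_NODE // worker_mb)
    print(f"{worker_mb}m/worker -> {workers} workers")
```

Under that assumption the 2048m, 3072m, and 4096m rows reproduce the reported 750, 450, and 300 workers exactly; the 1024m row (1184 reported, not 1500) evidently hit some other limit.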
drew-bower commented 9 years ago

So it looks like once you reach the ideal amount of RAM per worker, it doesn't really matter, with potential variation in performance based on the demand load in EC2?

wncampbell commented 9 years ago

Ingest performance chart: https://live.amcharts.com/Q4MWU/

ttislerdg commented 9 years ago

Do you have the commands you ran for each of these?

wncampbell commented 9 years ago

INGEST: `mrgeo ingest -nd -9999 -o s3://mrgeo/images/aster-30m-ingest1 -sk -sp -z 12 s3://mrgeo-source/aster-30m`

BUILD PYRAMID: `mrgeo buildpyramid s3://mrgeo/images/aster-30m-ingest1`

SLOPE: `mrgeo mapalgebra -e "result = slope([s3://mrgeo/images/aster-30m])" -o s3://mrgeo/images/slopetest1`

RASTERIZE VECTOR: `mrgeo mapalgebra -e "result = RasterizeVector([s3://mrgeo-source/roadways.tsv],\"LAST\",\"12z\",\"b\")" -o s3://mrgeo/images/kph-roadways`

SCALAR OPERATION: `mrgeo mapalgebra -e "result = [s3://mrgeo/images/aster-30m] + 10" -o s3://mrgeo/images/aster-scale-op`

PINGEL RAW OPERATION: `mrgeo mapalgebra -e 's = [s3://mrgeo/images/aster-30m-slope]; kph = 112 * pow(2.718281828, -8.3 * abs(s)); spm = 3.6 / kph;' -o s3://mrgeo/images/GlobalHumveeRaw`

HMMWV WITHOUT LC: `mrgeo mapalgebra -e "result = con([s3://mrgeo/images/srtm-waterbodies] = 0, 0, [s3://mrgeo/images/roadways] > 0, [s3://mrgeo/images/roadways], [s3://mrgeo/images/GlobalHumveeRaw])" -o s3://mrgeo/images/HMMWV_Friction_NoLC`

IMPEDANCE VALUES: `mrgeo mapalgebra -e "result = con(abs([s3://mrgeo/images/GlobCover30m] - 11) < 0.1, 0.3, abs([s3://mrgeo/images/GlobCover30m] - 14) < 0.1, 0.3, abs([s3://mrgeo/images/GlobCover30m] - 20) < 0.1, 0.3, abs([s3://mrgeo/images/GlobCover30m] - 30) < 0.1, 0.3, abs([s3://mrgeo/images/GlobCover30m] - 40) < 0.1, 0.3, abs([s3://mrgeo/images/GlobCover30m] - 50) < 0.1, 0.2, abs([s3://mrgeo/images/GlobCover30m] - 60) < 0.1, 0.5, abs([s3://mrgeo/images/GlobCover30m] - 70) < 0.1, 0.2, abs([s3://mrgeo/images/GlobCover30m] - 90) < 0.1, 0.3, abs([s3://mrgeo/images/GlobCover30m] - 100) < 0.1, 0.4, abs([s3://mrgeo/images/GlobCover30m] - 110) < 0.1, 0.2, abs([s3://mrgeo/images/GlobCover30m] - 120) < 0.1, 0.7, abs([s3://mrgeo/images/GlobCover30m] - 130) < 0.1, 0.2, abs([s3://mrgeo/images/GlobCover30m] - 140) < 0.1, 0.7, abs([s3://mrgeo/images/GlobCover30m] - 150) < 0.1, 0.8, abs([s3://mrgeo/images/GlobCover30m] - 160) < 0.1, 0.2, abs([s3://mrgeo/images/GlobCover30m] - 170) < 0.1, 0.1, abs([s3://mrgeo/images/GlobCover30m] - 180) < 0.1, 0.1, abs([s3://mrgeo/images/GlobCover30m] - 190) < 0.1, 0.6, abs([s3://mrgeo/images/GlobCover30m] - 200) < 0.1, 1.0, abs([s3://mrgeo/images/GlobCover30m] - 210) < 0.1, 0.0, abs([s3://mrgeo/images/GlobCover30m] - 220) < 0.1, 0.5, abs([s3://mrgeo/images/GlobCover30m] - 230) < 0.1, 1.0, 0.6);" -o s3://mrgeo/images/GlobCoverImpedance`
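The last two expressions are easier to follow outside MapAlgebra syntax. A purely illustrative Python sketch (MrGeo evaluates these per pixel, of course): the PINGEL RAW formula turns slope into speed and pace, and the nested con() in the impedance expression is a lookup table from GlobCover class codes to impedance values, where the `abs(x - code) < 0.1` tests act as float-safe equality and 0.6 is the fall-through default:

```python
import math

def pingel_speed_kph(slope):
    """Speed in km/h per the PINGEL RAW expression: 112 * e^(-8.3 * |slope|)."""
    return 112 * math.exp(-8.3 * abs(slope))

def seconds_per_meter(slope):
    """Pace in s/m: 3.6 / kph, as in the expression's `spm` term."""
    return 3.6 / pingel_speed_kph(slope)

# GlobCover class code -> impedance, transcribed from the con() chain above
IMPEDANCE = {
    11: 0.3, 14: 0.3, 20: 0.3, 30: 0.3, 40: 0.3, 50: 0.2, 60: 0.5,
    70: 0.2, 90: 0.3, 100: 0.4, 110: 0.2, 120: 0.7, 130: 0.2, 140: 0.7,
    150: 0.8, 160: 0.2, 170: 0.1, 180: 0.1, 190: 0.6, 200: 1.0,
    210: 0.0, 220: 0.5, 230: 1.0,
}

def impedance(cls):
    # round() mirrors the abs(x - code) < 0.1 tolerance on float pixels
    return IMPEDANCE.get(round(cls), 0.6)  # 0.6 is the expression's default

print(pingel_speed_kph(0.0))   # 112.0 km/h on flat ground
print(impedance(60))           # 0.5
```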

ttislerdg commented 9 years ago

FYI, there are 2,973,491 tiles in the aster-30m dataset. A complete level 12, world-wide raster has 8,388,608 tiles (max). That means aster-30m has 35.4% coverage, i.e. the rest is water.
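The arithmetic checks out if the world-wide grid at zoom z is 2^z × 2^(z-1) tiles (longitude spanning twice latitude, an assumption about MrGeo's tiling, but it reproduces the 8,388,608 figure exactly):

```python
# World-wide tile count at zoom 12, assuming a 2:1 lon/lat tile grid
zoom = 12
max_tiles = 2**zoom * 2**(zoom - 1)   # 4096 * 2048 = 8,388,608

aster_tiles = 2_973_491
coverage = aster_tiles / max_tiles
print(max_tiles, f"{coverage:.1%}")   # 8388608 35.4%
```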

wncampbell commented 9 years ago

Benchmarks Version 2


_Set A_: Node type m3.2xlarge, 75 nodes, 12 TB total storage, 2.25 TB total memory, each step run 1x

_Set B_: Node type m3.2xlarge, 125 nodes, 20 TB total storage, 3.75 TB total memory, each step run 1x

_Set C_: Node type m3.2xlarge, 175 nodes, 28 TB total storage, 5.25 TB total memory, each step run 1x

_Set D_: Node type m3.2xlarge, 75 nodes, 12 TB total storage, 2.25 TB total memory, each step run 1x

wncampbell commented 9 years ago

Memory multiplier configuration. Shows the mm setting per job for version 2 of the benchmarks. Setting mm below these values resulted in crashes.

| Job | 75 node | 125 node | 175 node |
| --- | --- | --- | --- |
| Ingest Image | 16 | 20 | 24 |
| Build Pyramid | 12 | 12 | 12 |
| Rasterize Vector | 12 | 12 | 12 |
| Slope | 12 | 20 | 20 |
| Reclassify | 16 | 16 | 16 |
| Friction Surface | 16 | 16 | 16 |
| Friction Surface Scratch | n/a | | |

wncampbell commented 9 years ago

Graphs for benchmarks v2 (updated as jobs finish)

Data Operations - http://live.amcharts.com/2ZhNj/ MapAlgebra - http://live.amcharts.com/GE2Ym/

wncampbell commented 9 years ago

Using the 75 node cluster as the baseline, here's the improvement as nodes are added:

| | Ingest | Build Pyramid | Rasterize Vector | Slope | Reclassify | Friction Surface |
| --- | --- | --- | --- | --- | --- | --- |
| 125 node | -26% | -13.54% | -0.69% | -14.85% | -38.26% | -34.32% |
| 175 node | -35% | -22.65% | -5.12% | -31.04% | -53.35% | -48.11% |
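For reference, the percentages read as relative change in runtime against the 75-node baseline, with negative meaning faster. A minimal sketch with hypothetical numbers (the 100 and 74 minute figures below are made up for illustration, not from the benchmark runs):

```python
def improvement(baseline, t):
    """Percent change vs. the baseline runtime; negative = faster."""
    return (t - baseline) / baseline * 100

# Hypothetical: a job taking 100 min on 75 nodes and 74 min on 125 nodes
print(f"{improvement(100.0, 74.0):+.0f}%")  # -26%
```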
drew-bower commented 9 years ago

Making a lot more sense now. Nice work.