ngageoint / mrgeo

MrGeo is a geospatial toolkit designed to provide raster-based geospatial capabilities that can be performed at scale. MrGeo is built upon Apache Spark and the Hadoop ecosystem to leverage the storage and processing of hundreds of commodity computers. See the wiki for more details.
https://github.com/ngageoint/mrgeo/wiki
Apache License 2.0
207 stars, 64 forks

Tile boundary artifacts after ingest of SRTM data #53

Closed. wncampbell closed this issue 9 years ago.

wncampbell commented 9 years ago

I'm seeing missing data along every tile boundary in the ingested image (screenshots below). To reproduce:

    mrgeo ingest -o s3://mrgeo/images/srtm-elevation -sk -sp -z 10 -v s3://mrgeo-source/srtm-90
    mrgeo buildpyramid s3://mrgeo/images/srtm-elevation

Cluster: 500 m3.xlarge nodes. Ingest: 9:30.51 elapsed. BuildPyramid: 11:16.55 elapsed.

[Screenshots: srtm2, srtm1, srtm3]

wncampbell commented 9 years ago

Adding the nodata flag with the correct SRTM nodata value did not make a difference; the output is the same.

ttislerdg commented 9 years ago

Can you tell whether the artifacts are nodata or some other value?

My suspicion is that the artifacts are not nodata but some other value (0, maybe). The crenelation comes from the edge of an input file crossing tile boundaries. We should be replacing any nodata pixels with actual values in a reduceByKey() call; if nodata isn't set correctly, this artifact can happen.
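A minimal Python sketch of the merge step described above, assuming each tile is a flat list of pixel values and using a plain dict fold as a local stand-in for Spark's reduceByKey(). The names merge_tiles and reduce_by_key, the tile key (10, 5), and the pixel values are hypothetical and not MrGeo's API or data. It shows how a wrong nodata value leaves one input file's fill pixels in place along its edge.

```python
NODATA = -32768  # SRTM nodata value mentioned later in this thread

def merge_tiles(a, b, nodata):
    """Hypothetical merge: keep a's pixel unless it equals nodata, else take b's."""
    # Integer sentinel only; a NaN nodata value would need math.isnan instead of ==.
    return [pb if pa == nodata else pa for pa, pb in zip(a, b)]

def reduce_by_key(pairs, func):
    """Fold all values sharing a key with func (local stand-in for RDD.reduceByKey)."""
    out = {}
    for key, value in pairs:
        out[key] = func(out[key], value) if key in out else value
    return out

# Two input files each cover half of the same tile; uncovered pixels hold -32768.
left = [100, 120, NODATA, NODATA]
right = [NODATA, NODATA, 130, 140]
pairs = [((10, 5), left), ((10, 5), right)]

good = reduce_by_key(pairs, lambda a, b: merge_tiles(a, b, NODATA))
bad = reduce_by_key(pairs, lambda a, b: merge_tiles(a, b, 0))  # wrong nodata assumed

print(good[(10, 5)])  # [100, 120, 130, 140]
print(bad[(10, 5)])   # [100, 120, -32768, -32768]: a seam along the file edge
```

Where two sources both have a valid value for the same pixel, the first one wins, so this merge is associative but only order-independent where at most one input has data.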

I'll grab 4 tiles from S3, run them locally, and see what happens.

wncampbell commented 9 years ago

The artifacts are nodata. If I do not specify a nodata value on ingest, the pixel value of the artifacts is -32768 (the nodata value of the source data). When I ingest with the nodata flag set to this value, the artifacts are nodata. I verified this by exporting GeoTIFFs at their native resolution.

wncampbell commented 9 years ago

Verified working, looks great.

[Screenshot: validsrtm]

Btw, using the latest release, the ingest took 5 min and building pyramids took 5 min on 100 nodes.

drew-bower commented 9 years ago

Can you re-run the ingest on 20 nodes? I'd like to mentally compare it with what we did back when on the 20-node cluster.

This is pretty fast, and it's writing into S3 to boot.