locationtech-labs / geopyspark

GeoTrellis for PySpark

MAML Integration: combining band-wise operations. #674

Open echeipesh opened 6 years ago

echeipesh commented 6 years ago

One limitation of the GeoPySpark API is that it cannot express multiple operations on a single tile and combine the results inline. For instance, both focalSum and Slope are implemented operations, but computing both over a TiledRasterLayer requires producing two RDDs and joining them by key.

One way around this limitation is to introduce an API like this:

raster_layer = gps.geotiff.get(...)

# Proposed API (not implemented): `bands` is a symbolic Bands object
raster_layer.evalBands(lambda bands: [
    bands[0]
        .slope()
        .localSum(bands[2])
        .focalSum(n=2)
        .crop(2),
    bands[1]
        .slope(),
])

The Bands object only captures the structure of the expression, so that it can be interpreted and evaluated in an RDD.mapValues step. MAML is a natural choice for this. We could construct either the JSON or the JVM expression through the gateway.

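class Bands(object):
    def __getitem__(self, i):
        # bands[i] yields a symbolic reference to band i; no pixels are read.
        # BandSource is not defined in this sketch.
        return Band(src=BandSource(band=i))

class Band(object):
    def __init__(self, src, op=None):
        self.src = src  # upstream Band or BandSource
        self.op = op    # name of the operation applied to src

    def _json(self):
        # Placeholder: the serialized form is left unspecified here.
        return """{'source': }"""

    def slope(self):
        return Band(src=self, op="slope")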
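
To make the "capture only the structure" idea concrete, here is a minimal, self-contained sketch of how the lambda could be run once against symbolic objects and serialized. The `to_dict` layout, the `capture` helper and the node fields are all hypothetical; in particular the JSON shape is an illustration and not the actual MAML schema.

```python
import json


class Band(object):
    """A node in a symbolic band expression; it holds no pixel data."""

    def __init__(self, op, args=(), params=None):
        self.op = op                 # operation name, e.g. "slope"
        self.args = tuple(args)      # child Band nodes
        self.params = params or {}   # scalar parameters, e.g. {"n": 2}

    def to_dict(self):
        # Hypothetical JSON layout, NOT the real MAML schema.
        node = {"op": self.op, "args": [a.to_dict() for a in self.args]}
        if self.params:
            node["params"] = self.params
        return node

    def slope(self):
        return Band("slope", [self])

    def localSum(self, other):
        return Band("localSum", [self, other])

    def focalSum(self, n):
        return Band("focalSum", [self], {"n": n})


class Bands(object):
    def __getitem__(self, i):
        # Indexing records which source band is referenced.
        return Band("source", params={"band": i})


def capture(fn):
    """Run the user's lambda once against symbolic Bands and serialize the result."""
    return json.dumps([expr.to_dict() for expr in fn(Bands())], sort_keys=True)


expr_json = capture(
    lambda bands: [bands[0].slope().localSum(bands[2]), bands[1].slope()]
)
print(expr_json)
```

The point of the design is that the lambda never sees tile data: it is called exactly once on the driver, and only the resulting tree would need to cross to the JVM side.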

This becomes relevant when working across multiple raster layers, once we have functions to select and combine bands:

# work across two (or more) layers
joined = gps.joinBands(tiled_layer1.bands(1,2), tiled_layer2.bands(3,0), tiled_layer3.bands(0))

joined.evalBands(lambda bands: [
    bands[0]
        .slope()
        .localSum(bands[2])
        .focalSum(n=2)
        .crop(2),
    bands[1]
        .slope(),
])  # MAML => f => Array[Tile]
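
As a rough illustration of the `MAML => f => Array[Tile]` step, the sketch below interprets a captured expression against the bands of a single tile in pure Python, which is roughly what would run inside mapValues. Everything here is an assumption for illustration: the `evaluate` and `eval_bands` helpers are hypothetical, tiles are plain nested lists, only localSum and focalSum are handled, and focalSum is assumed to use a square window of radius n clipped at the tile edge.

```python
class Band(object):
    """Symbolic node: records an operation and never touches pixels."""

    def __init__(self, op, args=(), **params):
        self.op = op
        self.args = tuple(args)
        self.params = params

    def localSum(self, other):
        return Band("localSum", [self, other])

    def focalSum(self, n):
        return Band("focalSum", [self], n=n)


class Bands(object):
    def __getitem__(self, i):
        return Band("source", band=i)


def evaluate(node, tiles):
    """Recursively interpret one expression against the bands of one tile."""
    if node.op == "source":
        return tiles[node.params["band"]]
    args = [evaluate(a, tiles) for a in node.args]
    if node.op == "localSum":
        a, b = args
        return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    if node.op == "focalSum":
        t = args[0]
        n = node.params["n"]
        rows, cols = len(t), len(t[0])
        # Assumed semantics: (2n+1) x (2n+1) window, clipped at the tile edge.
        return [[sum(t[r][c]
                     for r in range(max(0, i - n), min(rows, i + n + 1))
                     for c in range(max(0, j - n), min(cols, j + n + 1)))
                 for j in range(cols)]
                for i in range(rows)]
    raise ValueError("unsupported op: " + node.op)


def eval_bands(fn):
    """Build the expressions once; return a per-tile function for mapValues."""
    exprs = fn(Bands())
    return lambda tiles: [evaluate(e, tiles) for e in exprs]


per_tile = eval_bands(
    lambda bands: [bands[0].localSum(bands[1]).focalSum(n=1), bands[1]]
)

# Stand-in for rdd.mapValues(per_tile): a single two-band 2x2 tile.
tile = [[[1, 2], [3, 4]], [[10, 20], [30, 40]]]
result = per_tile(tile)
print(result)
```

Because both output bands are computed from the same input tile in one pass, no second RDD or join by key is needed, which is the motivation given at the top of the issue.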