locationtech / geotrellis

GeoTrellis is a geographic data processing engine for high performance applications.
http://geotrellis.io

TmsLayerReader #2143

Open lossyrob opened 7 years ago

lossyrob commented 7 years ago

There are situations where we want to read files from a TMS-style layer (/{z}/{x}/{y}), e.g. the terrain dataset on AWS. We should be able to provide a queryable layer reader for layers in that layout, for anything we can read from binary files.

lossyrob commented 7 years ago

@echeipesh you had brought this idea up earlier; feel free to add additional notes.

lossyrob commented 7 years ago

This might just require adding a query method to our SlippyTileReader set.

echeipesh commented 7 years ago

It seems like there needs to be a bunch of code handling the actual query, but the different backends can be abstracted out by a URI => T parameter. This is the sketch I have:

import geotrellis.raster._
import java.net.URI

class SlippyReader[T](uriTemplate: String, f: URI => T) extends Serializable {
  def uri(z: Int, x: Int, y: Int): URI = new URI(
    uriTemplate.replace("{z}", z.toString)
      .replace("{x}", x.toString)
      .replace("{y}", y.toString)
  )

  def read(zoom: Int, bounds: GridBounds): Iterator[((Int, Int), T)] = {
    for ((x, y) <- bounds.coords.iterator)
    yield (x, y) -> f(uri(zoom, x, y))
  }

  def read(zoom: Int, x: Int, y: Int): T = f(uri(zoom, x, y))
}

object SlippyReader {
  val TilePath = """.*/(\d+)/(\d+)\.\w+$""".r

  def fromFiles[T](readBytes: Array[Byte] => T): SlippyReader[T] = ???
  def fromURL[T](readBytes: Array[Byte] => T): SlippyReader[T] = ???
}
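
To make the template idea concrete, here is a minimal, dependency-free sketch of how the URI template and the URI => T function could fit together. `TemplateReader` and `parseKey` are hypothetical names, and the three-group pattern is only an assumption about what parsing a full z/x/y path would need; none of this is existing GeoTrellis API.

```scala
import java.net.URI

// Hypothetical stand-in for the SlippyReader sketch, without GeoTrellis types.
class TemplateReader[T](uriTemplate: String, f: URI => T) extends Serializable {
  // Expand the {z}/{x}/{y} placeholders into a concrete URI.
  def uri(z: Int, x: Int, y: Int): URI = new URI(
    uriTemplate
      .replace("{z}", z.toString)
      .replace("{x}", x.toString)
      .replace("{y}", y.toString)
  )

  // Read every tile in an inclusive column/row range at one zoom level.
  def read(zoom: Int, cols: Range, rows: Range): Iterator[((Int, Int), T)] =
    for (y <- rows.iterator; x <- cols.iterator)
      yield (x, y) -> f(uri(zoom, x, y))
}

object TemplateReader {
  // Assumed pattern: captures zoom, column and row from ".../{z}/{x}/{y}.ext".
  val TilePath = """.*/(\d+)/(\d+)/(\d+)\.\w+$""".r

  // Recover (zoom, col, row) from a tile path, or None if it does not match.
  def parseKey(path: String): Option[(Int, Int, Int)] = path match {
    case TilePath(z, x, y) => Some((z.toInt, x.toInt, y.toInt))
    case _                 => None
  }
}
```

With a placeholder template such as "https://example.com/tiles/{z}/{x}/{y}.tif" and a stub backend like `_.toString`, `read(3, 0 to 1, 0 to 1)` yields four keyed records, one per tile. A real backend would replace the stub with a function that fetches and decodes bytes.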

import org.apache.spark.rdd._
import geotrellis.spark._

class SparkSlippyReader[T](uriTemplate: String, f: URI => T) extends SlippyReader(uriTemplate, f) {
  /** Read a region from a ZXY source into RDD
   * @param zoom Level of zoom to read from
   * @param bounds Query bounds to read
   * @param window Window that will cover partitions, offset from (0,0) of bounds
   */
  def rdd(zoom: Int, bounds: GridBounds, window: GridBounds = GridBounds(0,0,10,10)): RDD[(SpatialKey, T)] = {
    // - divide bound into num "rectangular" partitions
    // - make rdd of bounds
    // - map from bounds to records using f
    ???
  }
}
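
The three steps listed in the body of `rdd` could be sketched roughly as follows. To stay self-contained this uses a hypothetical `Bounds` case class instead of GeoTrellis `GridBounds`, and plain collections instead of an RDD. It also treats the window as a (cols, rows) size for each partition, which is an assumption about the intent rather than something the sketch above settles.

```scala
// Hypothetical inclusive tile bounds, standing in for GridBounds.
final case class Bounds(colMin: Int, rowMin: Int, colMax: Int, rowMax: Int) {
  // All (col, row) pairs covered by these bounds, row by row.
  def coords: Iterator[(Int, Int)] =
    for (row <- (rowMin to rowMax).iterator; col <- (colMin to colMax).iterator)
      yield (col, row)
}

object Windows {
  // Step 1: split `bounds` into rectangular windows of at most
  // windowCols x windowRows tiles, starting at the top-left corner.
  // Windows on the right and bottom edges are clipped to `bounds`.
  def split(bounds: Bounds, windowCols: Int, windowRows: Int): Seq[Bounds] = {
    require(windowCols > 0 && windowRows > 0, "window size must be positive")
    for {
      row <- bounds.rowMin to bounds.rowMax by windowRows
      col <- bounds.colMin to bounds.colMax by windowCols
    } yield Bounds(
      col,
      row,
      math.min(col + windowCols - 1, bounds.colMax),
      math.min(row + windowRows - 1, bounds.rowMax)
    )
  }

  // Steps 2 and 3: take the collection of windows and map each one to its
  // records using a per-tile read function `f`. With Spark, the windows would
  // be parallelized and this flatMap would run once per partition.
  def readAll[T](bounds: Bounds, windowCols: Int, windowRows: Int)(
      f: (Int, Int) => T): Seq[((Int, Int), T)] =
    split(bounds, windowCols, windowRows).flatMap { w =>
      w.coords.map { case (c, r) => (c, r) -> f(c, r) }
    }
}
```

Keeping the windows rectangular means each partition reads tiles that are close together in the grid, and the number of partitions follows directly from the query bounds and the window size.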

Couple of intentions here:

I like the query idea a lot, I think it can be plugged in reasonably smoothly.