locationtech / geotrellis

GeoTrellis is a geographic data processing engine for high performance applications.
http://geotrellis.io

Read raster metadata and filter by it before read #2699

Open echeipesh opened 6 years ago

echeipesh commented 6 years ago

Connects: https://github.com/locationtech/geotrellis/issues/2698

This is the most direct use case that separates the decision of whether to use a raster from actually reading the raster cells into Spark memory.

  1. List all GeoTiff files in a s3:// URI
  2. Read raster metadata into case classes that describe the files
  3. Filter metadata based on some geometry (e.g., the boundary of a state)
  4. Repartition and break up each raster into sets of windows
  5. Read windows in parallel
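
The steps above can be sketched without touching any pixel data. This is a minimal, dependency-free illustration in Python, not GeoTrellis code: `Extent`, `RasterMetadata`, `Window`, `filter_by_extent`, and `windows` are hypothetical names, the catalog is a hard-coded stand-in for steps 1 and 2 (listing an s3:// URI and reading GeoTiff headers), and the geometry filter is simplified to a bounding-box test.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class Extent:
    """Axis-aligned bounding box (hypothetical stand-in for a real geometry)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "Extent") -> bool:
        # Two boxes intersect unless one lies entirely to one side of the other.
        return not (other.xmin > self.xmax or other.xmax < self.xmin
                    or other.ymin > self.ymax or other.ymax < self.ymin)


@dataclass(frozen=True)
class RasterMetadata:
    """What step 2 would produce per file: header information only, no cells."""
    uri: str
    extent: Extent
    cols: int
    rows: int


@dataclass(frozen=True)
class Window:
    """Pixel window inside one raster; col_max and row_max are exclusive."""
    uri: str
    col_min: int
    row_min: int
    col_max: int
    row_max: int


def filter_by_extent(metadata: List[RasterMetadata], query: Extent) -> List[RasterMetadata]:
    # Step 3: decide which rasters to use from metadata alone.
    return [m for m in metadata if m.extent.intersects(query)]


def windows(meta: RasterMetadata, size: int) -> Iterator[Window]:
    # Step 4: break a raster into windows of at most size x size pixels.
    for row in range(0, meta.rows, size):
        for col in range(0, meta.cols, size):
            yield Window(meta.uri, col, row,
                         min(col + size, meta.cols), min(row + size, meta.rows))


# Steps 1-2 (assumed): a hard-coded catalog instead of a real s3:// listing.
catalog = [
    RasterMetadata("s3://example-bucket/a.tif", Extent(0, 0, 10, 10), 1024, 1024),
    RasterMetadata("s3://example-bucket/b.tif", Extent(20, 20, 30, 30), 1024, 512),
]

query = Extent(5, 5, 15, 15)                 # e.g. the bounding box of a state
selected = filter_by_extent(catalog, query)  # only a.tif overlaps the query
work = [w for m in selected for w in windows(m, 512)]
```

In a Spark job, `work` is the collection that would be repartitioned and distributed, so that step 5 reads each window's cells on the executors. Only the small metadata records are handled up to that point.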

metasim commented 6 years ago

I'd like to add this to RasterFrames' GeoTrellisRelation as soon as it's available. We have this use case all the time.