locationtech / geotrellis

GeoTrellis is a geographic data processing engine for high performance applications.
http://geotrellis.io

OSM => VectorTiles :: (6) Protobuf file output #1662

Closed fosskers closed 7 years ago

fosskers commented 7 years ago

This is Part 6 of a series of issues documenting the process of creating a world's worth of VectorTiles from OSM Planet data. Please use these issues to discuss solutions.

RDD[(SpatialKey, VectorTile)] can already be read and written between our supported backends, but only as their protobuf bytes further serialized by Avro. For use with external VectorTile tile servers, VTs need to be written out en masse under some predictable naming scheme.

Question: How should the final output protobuf files be named?

fosskers commented 7 years ago

z-x-y.mvt all dumped into a single directory, or split into a directory hierarchy a la z/x/y/tile.mvt?
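For concreteness, the two candidate schemes can be sketched as plain Scala functions (the names flatName and hierarchicalName are hypothetical, chosen only to contrast the options; .mvt follows the convention above):

```scala
// Option 1: flat directory, one file per tile, coordinates joined by dashes.
def flatName(zoom: Int, col: Int, row: Int): String =
  s"$zoom-$col-$row.mvt"

// Option 2: directory hierarchy keyed by zoom/column/row.
def hierarchicalName(zoom: Int, col: Int, row: Int): String =
  s"$zoom/$col/$row/tile.mvt"
```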

fosskers commented 7 years ago

The mechanics may already be in place: https://github.com/geotrellis/geotrellis/blob/master/spark/src/main/scala/geotrellis/spark/io/hadoop/package.scala#L66

fosskers commented 7 years ago

They were:

         println("Saving VectorTiles to filesystem...")

         val rdd1: RDD[(SpatialKey, Array[Byte])] = rdd0.mapValues({
           case v: ProtobufTile => v.toBytes
           case _ => throw new IllegalArgumentException("Expected a ProtobufTile")
         })

         /* Setup for saving to the file system */
         val template = s"/home/colin/vt-cache/catalog/{name}/{z}/{x}/{y}.mvt"
         val id = LayerId("sample", 1)

         val keyToPath: (SpatialKey) => String =
           SaveToHadoop.spatialKeyToPath(id, template)

         val wrote: Long = rdd1.saveToHadoop(keyToPath)

         println(s"Wrote ${wrote} VectorTiles")
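For readers unfamiliar with the template syntax, the substitution performed by SaveToHadoop.spatialKeyToPath can be approximated in plain Scala. This keyToPath is a hypothetical stand-in, not the library implementation: {name} and {z} are filled from the LayerId, while {x} and {y} come from the SpatialKey's column and row.

```scala
// Hypothetical sketch of the template substitution: each placeholder in the
// path template is replaced with the corresponding layer/key value.
def keyToPath(name: String, zoom: Int)(col: Int, row: Int): String =
  "/home/colin/vt-cache/catalog/{name}/{z}/{x}/{y}.mvt"
    .replace("{name}", name)
    .replace("{z}", zoom.toString)
    .replace("{x}", col.toString)
    .replace("{y}", row.toString)
```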

fosskers commented 7 years ago

Full demo here: https://github.com/fosskers/vectortile-io

With current GeoTrellis we can already save an arbitrary RDD[(K, Array[Byte])] to either the filesystem or S3. It's just a matter of calling the appropriate functions.

fosskers commented 7 years ago

Extra notes: writeToHadoop and writeToS3 are methods injected onto RDD[(K, Array[Byte])], and require import geotrellis.spark.io.hadoop._ and import geotrellis.spark.io.s3._ respectively.