jsmoreau opened this issue 5 years ago
Thank you so much for looking into it and creating this issue!
@jsmoreau hey, can you provide the image-ingest code that uses your own ZoomedLayoutScheme? After I defined a custom ZoomedLayoutScheme like yours, changing crs.worldExtent, my Spark job fell into what looks like an infinite loop.
My code:
val inputRdd: RDD[(ProjectedExtent, MultibandTile)] =
  sc.hadoopMultibandGeoTiffRDD(inputPath).mapValues(m => m.withNoData(Option(0)))

val (_, rasterMetaData) =
  CollectTileLayerMetadata.fromRDD(inputRdd, FloatingLayoutScheme(512))

val tiled: RDD[(SpatialKey, MultibandTile)] =
  inputRdd.tileToLayout(rasterMetaData.cellType, rasterMetaData.layout, Bilinear).repartition(200)

val layoutScheme = CustomZoomedLayoutScheme(CRS.fromEpsgCode(2438), tileSize = 256)

val (zoom, reprojected): (Int, RDD[(SpatialKey, MultibandTile)] with Metadata[TileLayerMetadata[SpatialKey]]) =
  MultibandTileLayerRDD(tiled, rasterMetaData)
    .reproject(CRS.fromEpsgCode(2438), layoutScheme, Bilinear)

// Create the attribute store that will tell us information about our catalog.
val attributeStore = FileAttributeStore(outputPath)

// Create the writer that we will use to store the tiles in the local catalog.
val writer = FileLayerWriter(attributeStore)

// Pyramiding up the zoom levels, write our tiles out to the local file system.
Pyramid.upLevels(reprojected, layoutScheme, zoom, Bilinear) { (rdd, z) =>
  val layerId = LayerId("test", z)
  // If the layer already exists, delete it before writing.
  if (attributeStore.layerExists(layerId)) {
    new FileLayerManager(attributeStore).delete(layerId)
  }
  writer.write(layerId, rdd, ZCurveKeyIndexMethod)
}
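The CustomZoomedLayoutScheme used above is not shown in this thread. As a rough sketch (the class shape and parameter names are hypothetical, not the poster's actual code), such a scheme could mirror geotrellis' ZoomedLayoutScheme but take a user-supplied world extent instead of crs.worldExtent:

```scala
import geotrellis.proj4.CRS
import geotrellis.vector.Extent
import geotrellis.spark.tiling.{LayoutDefinition, TileLayout}

// Hypothetical sketch: like ZoomedLayoutScheme, zoom z splits the world into
// 2^z x 2^z tiles, but over a caller-supplied extent rather than
// crs.worldExtent (which is what the custom scheme in this thread overrides).
case class CustomZoomedLayoutScheme(crs: CRS, worldExtent: Extent, tileSize: Int = 256) {
  def layoutForZoom(zoom: Int): LayoutDefinition = {
    val tiles = 1 << zoom // 2^zoom tiles per axis
    LayoutDefinition(worldExtent, TileLayout(tiles, tiles, tileSize, tileSize))
  }
}
```

The key design point discussed later in the thread is that both the extent and the circumference constant must describe the same area, or key generation at high zooms produces keys outside the layout.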
It prints out:
DEBUG org.apache.spark.memory.TaskMemoryManager - Task 105 acquired 7.0 MB for org.apache.spark.util.collection.ExternalSorter@10f6bf18
DEBUG org.apache.spark.memory.TaskMemoryManager - Task 105 acquired 18.8 MB for org.apache.spark.util.collection.ExternalSorter@10f6bf18
DEBUG org.apache.spark.memory.TaskMemoryManager - Task 105 acquired 31.7 MB for org.apache.spark.util.collection.ExternalSorter@10f6bf18
DEBUG org.apache.spark.memory.TaskMemoryManager - Task 105 acquired 88.3 MB for org.apache.spark.util.collection.ExternalSorter@10f6bf18
Hey @esmeetu, are you sure that it is a loop? Can you print some details and post Spark UI screenshots here (with tasks and executors)?
@pomadchin thanks for your quick reply!
The Executors tab of the Spark UI page is blank.
@esmeetu how do you submit it, and what cluster do you have? Also, can you show a picture of the app after it has hung for an hour or so (just to see its state at the point where you think it hangs)?
@pomadchin I use Spark local mode:
val conf =
  new SparkConf()
    .setMaster("local[*]")
    .setAppName("Spark Tiler")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryo.registrator", "geotrellis.spark.io.kryo.KryoRegistrator")
val sc = new SparkContext(conf)
After one hour...
@esmeetu gotcha, so it does not hang; it is just slow. Also, you don't have any executors: everything is processed on a single executor. Try limiting each executor to a single core; that will let you achieve some parallelism.
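In local mode, local[*] already runs tasks on all cores of one JVM. On a real cluster, the one-core-per-executor layout suggested above can be expressed with standard Spark settings; a sketch (the instance count is an arbitrary example, not from the thread):

```scala
import org.apache.spark.SparkConf

// Standard Spark settings (illustrative): many small executors, each
// limited to a single core, instead of one large executor.
val clusterConf = new SparkConf()
  .setAppName("Spark Tiler")
  .set("spark.executor.cores", "1")     // one core per executor
  .set("spark.executor.instances", "8") // example count; size to your cluster
```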
@pomadchin hmm... I looked into the reproject source code, and it looks strange: the newKey is outside the layout at zoom level 20.
@esmeetu ha, indeed; check the extents / CRS / etc. I would recommend creating a unit test that runs the keysForGeometry function outside of Spark; it looks like it generates too many keys, and they are probably different from what you expect. But again, it is not hanging.
@pomadchin My input TIFF is about 100 MB; it shouldn't be this slow. How do I write a unit test for that function? I have no idea. In debugging, it generates 9064 SpatialKeys, and I think those keys make no sense.
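The suggested check can be done entirely outside Spark by building a LayoutDefinition by hand and calling mapTransform directly. A sketch, assuming the geotrellis.spark.tiling API; the world extent is the one quoted later in this thread and the test geometry is illustrative:

```scala
import geotrellis.vector.Extent
import geotrellis.spark.tiling.{LayoutDefinition, TileLayout}

// Sketch: build the zoom-20 layout by hand (2^20 x 2^20 tiles over the
// world extent) and count the keys covering a small test geometry.
val worldExtent = Extent(-3753241.79, 403745.10, 333754.20, 4290855.71) // illustrative
val n = 1 << 20
val layout = LayoutDefinition(worldExtent, TileLayout(n, n, 256, 256))

// keysForGeometry enumerates the SpatialKeys intersecting a geometry; if the
// count is far larger than expected, the extent/CRS combination is suspect.
val testArea = Extent(-100000.0, 1000000.0, -90000.0, 1010000.0).toPolygon
val keys = layout.mapTransform.keysForGeometry(testArea)
println(keys.size)
```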
@esmeetu wait; what projection do you use? @jsmoreau defines the world extent as Extent(-3753241.79, 403745.10, 333754.20, 4290855.71), and yours is out of these bounds. Are you also in Canada Atlas Lambert?
I use EPSG:2438, and I calculate the worldExtent with Extent(77.45, 37.0, 88.0, 41.99).reproject(CRS.fromEpsgCode(4214), crs), where crs is CRS.fromEpsgCode(2438).
The problem seems resolved. After I changed the EARTH_CIRCUMFERENCE value to my extent's length, it runs normally.
Another question: why is the EARTH_CIRCUMFERENCE variable assigned an Integer type?
@ecgreb congrats on figuring it out.
Answering your next question: EARTH_CIRCUMFERENCE is of type Double: https://github.com/locationtech/geotrellis/blob/master/layer/src/main/scala/geotrellis/layer/ZoomedLayoutScheme.scala#L26
Even though the 2 there is an integer, multiplication with a Double converts the entire expression to Double:
2 * math.Pi //> res0: Double = 6.283185307179586
Also, the author of this post suggests setting it this way:
val EARTH_CIRCUMFERENCE = 4087995d
That also makes EARTH_CIRCUMFERENCE a Double value (note the lowercase d suffix).
@pomadchin Doesn't that lose some precision? Would it be better to keep six or more decimal places?
@esmeetu excuse me, I don't quite follow you at this point; what are you asking? Is your question why @jsmoreau used a rounded value? If so, it is better to ask him. I think this precision just worked for his goals; I don't know how much the precision here affects the results (I can only guess that it matters little, since your goal is just to generate keys for the input).
@pomadchin Yes, I think he might lose precision by using a rounded value when generating the pyramid. Thanks a lot! :100:
@esmeetu and thank you as well for diving into it 👍
The worldExtent is being wrongly calculated when using EPSG:3978. I think it has something to do with this class being invoked when a ZoomedLayoutScheme is created: https://github.com/locationtech/geotrellis/blob/9825fd220f24d941e7edca8a9db05a4616a8024f/spark/src/main/scala/geotrellis/spark/tiling/ZoomedLayoutScheme.scala#L26
I created a copy that works by hardcoding the extent and also adjusting the EARTH_CIRCUMFERENCE value to match the width of applicability of LCC.
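The copy itself is not shown. A minimal sketch of the approach described (the extent reuses the values quoted earlier in the thread; deriving the constant from the extent's width is an illustration, not the author's exact code):

```scala
import geotrellis.vector.Extent

// Illustrative workaround sketch: hardcode the LCC extent instead of relying
// on crs.worldExtent, and tie EARTH_CIRCUMFERENCE to that extent's width so
// zoom-level cell sizes are derived from the LCC area of applicability.
val lccWorldExtent = Extent(-3753241.79, 403745.10, 333754.20, 4290855.71)
val EARTH_CIRCUMFERENCE: Double = lccWorldExtent.width // ~4.087e6, vs ~4.0075e7 for web mercator
```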