Closed imperio-wxm closed 2 years ago
Hey @imperio-wxm, TL;DR: the code tries to merge two tiles that live on two different layouts and are tiled differently.
I didn't pay attention to that, but had these thoughts:
So they are located close to each other, and I guess the thing you're trying is to merge two 'layers' (RDDs that represent each scene) into a single one.
There are two problems in the code:
1. FloatingLayoutScheme is used, which tiles each scene relative to its own tile grid:
KeyBounds(SpatialKey(0,0),SpatialKey(15,15))
KeyBounds(SpatialKey(0,0),SpatialKey(15,15))
It means that even though the two scenes are positioned differently, their keys are the same:
Extent(1.2319029048418062E7, 4742829.94013792, 1.264605352776789E7, 5069854.419487747), CellSize(39.91998038938323,39.91998038938323)
Extent(1.2271730055460626E7, 4540358.837452589, 1.2591363907295156E7, 4859992.689287118), CellSize(39.01780417901978,39.01780417901966)
2. merge is done by keys and assumes that the RDDs are tiled according to the same layout, so that a key represents the same tile on that layout. The layouts here differ, which is why the error
java.lang.AssertionError: assertion failed: Row/col intervals must begin before they end
is thrown.
To solve it, we need to be sure that the inputs are in the same projection and on the same layout:
it("merge rdd") {
  val images = List(
    "/Users/.../Downloads/LC08_L2SP_126032_20220408_20220412_02_T1_SR_B2.TIF",
    "/Users/.../Downloads/LC08_L2SP_126033_20220408_20220412_02_T1_SR_B2.TIF"
  )
  val layoutScheme = FloatingLayoutScheme(512)
  val source: RDD[RasterSource] = sc.parallelize(images).map { RasterSource(_).reproject(WebMercator): RasterSource }
  // summary of all rasters
  val summary = RasterSummary.fromRDD(source)
  // layout includes all rasters
  val LayoutLevel(_, layout) = summary.levelFor(layoutScheme)
  // tiling
  val rdd = RasterSourceRDD.tiledLayerRDD(source, layout, KeyExtractor.spatialKeyExtractor, rasterSummary = Some(summary))
  // regrid
  rdd.regrid(1024).toGeoTiffs().foreach {
    case (key, mTile) =>
      val localPath = s"/Users/.../Downloads/merge_$key.tif"
      MultibandGeoTiff(mTile.tile, mTile.extent, mTile.crs).write(localPath, optimizedOrder = true)
  }
}
If you want to keep the style of the code in the issue description (which is a bit inefficient):
it("merge rdd (behaves in fact like above, but manually and slower)") {
  val image1 = "/Users/.../Downloads/LC08_L2SP_126032_20220408_20220412_02_T1_SR_B2.TIF"
  val image2 = "/Users/.../Downloads/LC08_L2SP_126033_20220408_20220412_02_T1_SR_B2.TIF"
  val layoutScheme = FloatingLayoutScheme(512)
  // read raster1
  val source1: RDD[RasterSource] = sparkContext.parallelize(image1 :: Nil).map { RasterSource(_).reproject(WebMercator): RasterSource }
  // read raster2
  val source2: RDD[RasterSource] = sparkContext.parallelize(image2 :: Nil).map { RasterSource(_).reproject(WebMercator): RasterSource }
  // collect summaries
  val summary1 = RasterSummary.fromRDD(source1)
  val summary2 = RasterSummary.fromRDD(source2)
  // combine summaries
  val summary = summary1.combine(summary2)
  // get the unified layout
  val LayoutLevel(_, layout) = summary.levelFor(layoutScheme)
  // tile rdds to the same layout
  val rdd1 = RasterSourceRDD.tiledLayerRDD(source1, layout, KeyExtractor.spatialKeyExtractor, rasterSummary = Some(summary))
  val rdd2 = RasterSourceRDD.tiledLayerRDD(source2, layout, KeyExtractor.spatialKeyExtractor, rasterSummary = Some(summary))
  // merge
  val rdd = rdd1.merge(rdd2)
  // regrid
  rdd.regrid(1024).toGeoTiffs().foreach {
    case (key, mTile) =>
      val localPath = s"/Users/.../Downloads/merge_$key.tif"
      MultibandGeoTiff(mTile.tile, mTile.extent, mTile.crs).write(localPath, optimizedOrder = true)
  }
}
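To make the key collision concrete, here is a minimal, self-contained Scala sketch. It does not use GeoTrellis: Box, TileGrid and FloatingKeysDemo are hypothetical names, and the scene coordinates are made-up round numbers rather than the Landsat extents above. It only models the idea that a floating layout is anchored at its own data extent, so the top-left tile of every scene gets key (0, 0), while a grid built from the combined extent gives the two scenes distinct keys.

```scala
// Simplified, hypothetical model of layout keys; this is NOT GeoTrellis code.
final case class Box(xmin: Double, ymin: Double, xmax: Double, ymax: Double) {
  // Smallest box that contains both boxes (the role played by combining summaries).
  def combine(o: Box): Box =
    Box(math.min(xmin, o.xmin), math.min(ymin, o.ymin),
        math.max(xmax, o.xmax), math.max(ymax, o.ymax))
}

// A grid is an anchor box cut into square tiles of `tileSize` map units.
final case class TileGrid(extent: Box, tileSize: Double) {
  // (col, row) of the tile containing a point; rows count down from the top edge.
  def keyFor(x: Double, y: Double): (Int, Int) =
    (math.floor((x - extent.xmin) / tileSize).toInt,
     math.floor((extent.ymax - y) / tileSize).toInt)
}

object FloatingKeysDemo {
  // Two made-up scenes that sit at different positions and partly overlap.
  val sceneA = Box(0, 0, 100, 100)
  val sceneB = Box(50, -60, 150, 40)

  // Per-scene grids: each one is anchored at its own scene.
  val gridA = TileGrid(sceneA, 10)
  val gridB = TileGrid(sceneB, 10)

  // A point just inside the top-left corner of each scene.
  val keyA = gridA.keyFor(1, 99)
  val keyB = gridB.keyFor(51, 39)

  // One shared grid anchored at the combined extent of both scenes.
  val shared = TileGrid(sceneA.combine(sceneB), 10)
  val sharedKeyA = shared.keyFor(1, 99)
  val sharedKeyB = shared.keyFor(51, 39)
}
```

With the per-scene grids both points end up under the same key even though they are far apart, which is the situation a key-based merge cannot handle. With the shared grid the keys differ, so the merge only combines tiles that cover the same ground.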
@pomadchin Hi, the summary must be generated from an RDD[RasterSource]. What if the RDDs to be merged are intermediate calculation results rather than the original RasterSourceRDDs? The following code reads the two images, applies a division to one and an addition to the other, and then merges them.
How can I generate a summary from rdd1New and rdd2New and put them onto the same layout in this case? What I actually want is to merge RDDs produced by complex operations, which at that point are no longer backed by the original RasterSource.
it("merge rdd 2") {
  def genSourceRddWrapper(sc: SparkContext, files: Seq[String], targetCRS: CRS, layoutScheme: LayoutScheme) = {
    val sourceRDD: RDD[RasterSource] = sc.parallelize(files)
      .map(uri => {
        if (Objects.nonNull(targetCRS)) {
          GeoTiffRasterSource(uri).reproject(targetCRS): RasterSource
        } else {
          GeoTiffRasterSource(uri): RasterSource
        }
      })
    val summary = RasterSummary.fromRDD(sourceRDD)
    val LayoutLevel(zoom, layout2) = summary.levelFor(layoutScheme)
    RasterSourceRDD.tiledLayerRDD(sourceRDD, layout2, KeyExtractor.spatialKeyExtractor, rasterSummary = Some(summary))
  }
  val image1 = "/Users/weiximing/Downloads/LC08_L2SP_126032_20220408_20220412_02_T1_SR_B2.TIF"
  val image2 = "/Users/weiximing/Downloads/LC08_L2SP_126033_20220408_20220412_02_T1_SR_B2.TIF"
  val rdd1 = genSourceRddWrapper(sparkContext, Seq(image1), WebMercator, FloatingLayoutScheme(512))
  val rdd2 = genSourceRddWrapper(sparkContext, Seq(image2), WebMercator, FloatingLayoutScheme(512))
  ///// This may be a very complex calculation, or reprojection, resampling, crop extent may be performed /////
  val rdd1New = rdd1.withContext {
    rdd =>
      val result = rdd.map { case (key, tile) => (key, MultibandTile(tile.bands.map(_.localDivide(100)))) }
      result
  }
  val rdd2New = rdd2.withContext {
    rdd =>
      val result = rdd.map { case (key, tile) => (key, MultibandTile(tile.bands.map(_.localAdd(10)))) }
      result
  }
  ///////////////////////////
  val resultRdd = rdd1New.merge(rdd2New)
  resultRdd.regrid(1024).toGeoTiffs().foreach {
    case (key, mTile) =>
      val localPath = s"/Users/weiximing/code/temp_code/gcs-spark/src/test/scala/com/alibaba/aie/gcs/mean_$key.tif"
      GeoTiffWriter.write(MultibandGeoTiff(mTile.tile, mTile.extent, mTile.crs), localPath, optimizedOrder = true)
  }
}
hey @imperio-wxm read the example #2:
it("merge rdd (behaves in fact like above, but manually and slower)") {
  val image1 = "/Users/.../Downloads/LC08_L2SP_126032_20220408_20220412_02_T1_SR_B2.TIF"
  val image2 = "/Users/.../Downloads/LC08_L2SP_126033_20220408_20220412_02_T1_SR_B2.TIF"
  val layoutScheme = FloatingLayoutScheme(512)
  // read raster1
  val source1: RDD[RasterSource] = sparkContext.parallelize(image1 :: Nil).map { RasterSource(_).reproject(WebMercator): RasterSource }
  // read raster2
  val source2: RDD[RasterSource] = sparkContext.parallelize(image2 :: Nil).map { RasterSource(_).reproject(WebMercator): RasterSource }
  // collect summaries
  val summary1 = RasterSummary.fromRDD(source1)
  val summary2 = RasterSummary.fromRDD(source2)
  // combine summaries
  val summary = summary1.combine(summary2)
  // get the unified layout
  val LayoutLevel(_, layout) = summary.levelFor(layoutScheme)
  // tile rdds to the same layout
  val rdd1 = RasterSourceRDD.tiledLayerRDD(source1, layout, KeyExtractor.spatialKeyExtractor, rasterSummary = Some(summary))
  val rdd2 = RasterSourceRDD.tiledLayerRDD(source2, layout, KeyExtractor.spatialKeyExtractor, rasterSummary = Some(summary))
  // merge
  val rdd = rdd1.merge(rdd2)
  // regrid
  rdd.regrid(1024).toGeoTiffs().foreach {
    case (key, mTile) =>
      val localPath = s"/Users/.../Downloads/merge_$key.tif"
      MultibandGeoTiff(mTile.tile, mTile.extent, mTile.crs).write(localPath, optimizedOrder = true)
  }
}
I know, this case is very simple. What I want to ask is how to combine summaries for an RDD that is not an RDD[RasterSource]. RasterSummary.fromRDD requires RasterSource input, but I have, for example, an RDD like this: RDD[(K, MultibandTile)] with Metadata[TileLayerMetadata[K]]
@imperio-wxm layouts should match upfront. If they are not global and / or do not match in some other way, then there is nothing built in that you can use: the keys are totally different, and you would have to recompute them somehow to make the layers match.
If you have any cool ideas about how to handle / suggest some API around it - that is very welcome btw; but it doesn't sound like a very trivial or a cheap task.
An alternative that may help is to use the global ZoomedLayoutScheme for all layers. In that case you only need to worry about the resolution: the zoom levels of the layers should match.
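As a rough illustration of why a global scheme sidesteps the problem, here is a self-contained sketch that does not call GeoTrellis. It assumes a global layout that splits the Web Mercator world extent into 2^zoom by 2^zoom tiles with row 0 at the top, which is the usual convention for zoomed web map layouts; GlobalKeysDemo and keyFor are hypothetical names. The sample point is chosen to lie inside both scene extents quoted earlier in the thread.

```scala
// Hypothetical model of a global, zoom-based grid; NOT the GeoTrellis implementation.
object GlobalKeysDemo {
  // Half the width of the Web Mercator world extent, in metres.
  val HalfWorld = 20037508.342789244

  // Key of the tile containing (x, y) at a zoom level. The key depends only on
  // the absolute position and the zoom, never on which scene the point came from.
  def keyFor(zoom: Int, x: Double, y: Double): (Int, Int) = {
    val tilesPerSide = 1 << zoom
    val tileSize = 2 * HalfWorld / tilesPerSide
    val col = math.floor((x + HalfWorld) / tileSize).toInt
    val row = math.floor((HalfWorld - y) / tileSize).toInt
    (col, row)
  }

  // A point that lies inside both scene extents from the issue.
  val overlapPoint = (1.24e7, 4.8e6)
  val keyAtZoom2 = keyFor(2, overlapPoint._1, overlapPoint._2)
}
```

Because the grid is anchored at the world extent rather than at any one scene, two layers tiled at the same zoom level agree on the key for every location, so a key-based merge lines up.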
@pomadchin Hi, if I want to recalculate the keys and re-layout, is there any API or code I can refer to?
hey @imperio-wxm, I am afraid not. I think you may want something like tileToLayout, but for any TileLayerRDD, which is not implemented.
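To give a feel for what such a re-layout would involve, here is a small self-contained sketch of the geometric part only. It is not an existing GeoTrellis API: AnchoredGrid, tileBounds, keysCovering and RekeyDemo are hypothetical names, and the coordinates are made up. It takes a tile from a source grid and lists the keys of a target grid that the tile overlaps.

```scala
// Hypothetical sketch of re-keying geometry; NOT an existing GeoTrellis API.
// A grid anchored at its top-left corner (xmin, ymax) with square tiles.
final case class AnchoredGrid(xmin: Double, ymax: Double, tileSize: Double) {
  // Bounds (xmin, ymin, xmax, ymax) of the tile stored under (col, row).
  def tileBounds(col: Int, row: Int): (Double, Double, Double, Double) = {
    val x0 = xmin + col * tileSize
    val y1 = ymax - row * tileSize
    (x0, y1 - tileSize, x0 + tileSize, y1)
  }

  // All (col, row) keys of this grid whose tiles overlap the given bounds.
  def keysCovering(b: (Double, Double, Double, Double)): Seq[(Int, Int)] = {
    val (bxmin, bymin, bxmax, bymax) = b
    val colMin = math.floor((bxmin - xmin) / tileSize).toInt
    val colMax = math.ceil((bxmax - xmin) / tileSize).toInt - 1
    val rowMin = math.floor((ymax - bymax) / tileSize).toInt
    val rowMax = math.ceil((ymax - bymin) / tileSize).toInt - 1
    for (row <- rowMin to rowMax; col <- colMin to colMax) yield (col, row)
  }
}

object RekeyDemo {
  val target  = AnchoredGrid(0, 100, 10)
  val aligned = AnchoredGrid(50, 40, 10) // offset by whole tiles from the target
  val shifted = AnchoredGrid(55, 43, 10) // offset by a fraction of a tile

  // Target keys touched by the source tile stored under key (0, 0).
  val alignedKeys = target.keysCovering(aligned.tileBounds(0, 0))
  val shiftedKeys = target.keysCovering(shifted.tileBounds(0, 0))
}
```

When the two grids are aligned, each source tile maps to exactly one target key and re-keying is a plain key translation. When they are shifted against each other, one source tile spreads over four target tiles, so the pixels have to be cropped, resampled and stitched per target key, which is where the cost comes from.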
I'm closing this for now, feel free to reopen it / create a new issue!
Describe the bug
Running a regrid operation after merging two MultibandTileLayerRDD[K] layers produces a Row/col error message.
To Reproduce
it("merge rdd") {
  def genSourceRddWrapper(sc: SparkContext, files: Seq[String], targetCRS: CRS, layoutScheme: LayoutScheme) = {
    val sourceRDD: RDD[RasterSource] = sc.parallelize(files)
      .map(uri => {
        if (Objects.nonNull(targetCRS)) {
          GeoTiffRasterSource(uri).reproject(targetCRS): RasterSource
        } else {
          GeoTiffRasterSource(uri): RasterSource
        }
      })
    val summary = RasterSummary.fromRDD(sourceRDD)
    val LayoutLevel(zoom, layout2) = summary.levelFor(layoutScheme)
    RasterSourceRDD.tiledLayerRDD(sourceRDD, layout2, KeyExtractor.spatialKeyExtractor, rasterSummary = Some(summary))
  }
}
Screenshots
LC08_L2SP_126032_20220408_20220412_02_T1_SR_B2.TIF Metadata
LC08_L2SP_126033_20220408_20220412_02_T1_SR_B2.TIF Metadata
After the merge, the values of cols and rows of the layout are still the previous values, and have not changed, which is very strange.
Environment