locationtech / rasterframes

Geospatial Raster support for Spark DataFrames
http://rasterframes.io
Apache License 2.0

RasterSource fails if selected bands are of two different resolutions. #174

Open metasim opened 5 years ago

metasim commented 5 years ago

Test:

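// Imports assumed for RasterFrames 0.8.x (l8Catalog is in the experimental module)
import org.locationtech.rasterframes.datasource.raster._
import org.locationtech.rasterframes.experimental.datasource.awspds._
val catalog = spark.read.l8Catalog.load()
val df = spark.read.raster
  .fromCatalog(catalog, "B1", "B8")  // B1 and B8 differ in resolution
  .load()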

Error:

19/07/01 09:28:00 ERROR RasterSourceToRasterRefs: Error fetching data for one of: GDALRasterSource(https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/003/048/LC08_L1TP_003048_20180805_20180814_01_T1/LC08_L1TP_003048_20180805_20180814_01_T1_B1.TIF), GDALRasterSource(https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/003/048/LC08_L1TP_003048_20180805_20180814_01_T1/LC08_L1TP_003048_20180805_20180814_01_T1_B8.TIF)
java.lang.IllegalArgumentException: transpose requires all collections have the same size
    at scala.collection.generic.GenericTraversableTemplate$class.fail$1(GenericTraversableTemplate.scala:213)
    at scala.collection.generic.GenericTraversableTemplate$$anonfun$transpose$1$$anonfun$apply$1.apply(GenericTraversableTemplate.scala:220)
    at scala.collection.generic.GenericTraversableTemplate$$anonfun$transpose$1$$anonfun$apply$1.apply(GenericTraversableTemplate.scala:219)
    at scala.collection.immutable.Stream.foreach(Stream.scala:594)
    at scala.collection.generic.GenericTraversableTemplate$$anonfun$transpose$1.apply(GenericTraversableTemplate.scala:219)
    at scala.collection.generic.GenericTraversableTemplate$$anonfun$transpose$1.apply(GenericTraversableTemplate.scala:217)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.generic.GenericTraversableTemplate$class.transpose(GenericTraversableTemplate.scala:217)
    at scala.collection.AbstractTraversable.transpose(Traversable.scala:104)
    at org.locationtech.rasterframes.expressions.transformers.RasterSourceToRasterRefs.eval(RasterSourceToRasterRefs.scala:77)

metasim commented 5 years ago

@vpipkt Need to figure out the appropriate way of handling this: B1 is 30m, B8 is 15m.
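The size mismatch behind the `transpose` error can be reproduced without Spark. The sketch below assumes, as the stack trace suggests but the thread does not confirm, that each band is cut into fixed-size pixel tiles and the per-band lists are then transposed. The `tileCount` helper, the 256-pixel tile size, and the 230.4 km scene size are illustrative assumptions, not RasterFrames code.

```scala
import scala.util.Try

// Hypothetical helper: how many square tiles of `tileSizePx` pixels are
// needed to cover a square scene at a given ground resolution.
def tileCount(sceneSizeM: Double, cellSizeM: Double, tileSizePx: Int): Int = {
  val pixels = math.ceil(sceneSizeM / cellSizeM).toInt
  val tilesPerSide = math.ceil(pixels.toDouble / tileSizePx).toInt
  tilesPerSide * tilesPerSide
}

// Made-up scene size, chosen so the arithmetic divides evenly.
val sceneSizeM = 230400.0
val b1Tiles = tileCount(sceneSizeM, 30.0, 256) // 30m band
val b8Tiles = tileCount(sceneSizeM, 15.0, 256) // 15m band

// One list of tile labels per band; the lists have different lengths.
val perBand = Seq(Seq.fill(b1Tiles)("B1"), Seq.fill(b8Tiles)("B8"))

// Scala's transpose rejects ragged input with IllegalArgumentException.
val result = Try(perBand.transpose)
```

Under these assumptions the 15m band produces four times as many tiles as the 30m band, so `result` is a `Failure` wrapping the same kind of exception as in the log above.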

vpipkt commented 4 years ago

Excellent but yikes. I think the desired result here may vary by use case. Need to think that through...

vpipkt commented 4 years ago

Hmm, I just duplicated this in #326.

But I also thought about some solutions.

vpipkt commented 4 years ago

From #326 dupe, some thoughts on how to address this.

A current workaround would be to make individual catalogs for each resolution, making careful use of the tileDimensions param, then try to join on extents.
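The idea behind picking tile dimensions carefully can be sketched in plain Scala: if the 15m band uses tiles with twice as many pixels per side as the 30m band, both cover the same ground footprint and can be matched by extent. `Extent`, `tileExtents`, and the scene size here are hypothetical stand-ins, not the RasterFrames types, and real scenes may not align this exactly.

```scala
// Hypothetical extent type in projected metres.
case class Extent(xmin: Double, ymin: Double, xmax: Double, ymax: Double)

// Enumerate tile extents for a square scene anchored at the origin.
def tileExtents(sceneSizeM: Double, cellSizeM: Double, tileSizePx: Int): Seq[Extent] = {
  val tileSizeM = cellSizeM * tileSizePx
  val n = math.ceil(sceneSizeM / tileSizeM).toInt
  for (row <- 0 until n; col <- 0 until n) yield Extent(
    col * tileSizeM,
    row * tileSizeM,
    math.min((col + 1) * tileSizeM, sceneSizeM),
    math.min((row + 1) * tileSizeM, sceneSizeM)
  )
}

val sceneSizeM = 230400.0                      // made-up scene size
val b1 = tileExtents(sceneSizeM, 30.0, 256)    // 256 px * 30m = 7680m per tile
val b8 = tileExtents(sceneSizeM, 15.0, 512)    // 512 px * 15m = 7680m per tile

// "Join on extents": pair each B1 tile with the B8 tile of identical extent.
val b8ByExtent = b8.map(e => e -> s"B8 tile at $e").toMap
val joined = b1.flatMap(e => b8ByExtent.get(e).map(t => (e, t)))
```

An exact-equality join like this only works when the footprints line up; with real data a tolerance or spatial intersection test would probably be needed.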

A proposed solution:

In the RasterSource reader, use the raster with the finest resolution to set the gridding and extents to read for all the other columns. This will result in the same number of rows generated from each catalog row.
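A minimal sketch of that proposal, assuming a square scene and hypothetical `Band`, `Extent`, and `gridFor` definitions (none of these are RasterFrames APIs): the grid is computed once from the finest band and reused for every column, so each band yields the same number of entries and the transpose succeeds.

```scala
case class Band(name: String, cellSizeM: Double)
case class Extent(xmin: Double, ymin: Double, xmax: Double, ymax: Double)

// Hypothetical gridding: tile extents for a square scene at one resolution.
def gridFor(sceneSizeM: Double, cellSizeM: Double, tileSizePx: Int): Seq[Extent] = {
  val tileSizeM = cellSizeM * tileSizePx
  val n = math.ceil(sceneSizeM / tileSizeM).toInt
  for (row <- 0 until n; col <- 0 until n) yield Extent(
    col * tileSizeM,
    row * tileSizeM,
    math.min((col + 1) * tileSizeM, sceneSizeM),
    math.min((row + 1) * tileSizeM, sceneSizeM)
  )
}

val bands = Seq(Band("B1", 30.0), Band("B8", 15.0))

// The band with the smallest cell size sets the grid for all columns.
val finest = bands.minBy(_.cellSizeM)
val grid = gridFor(230400.0, finest.cellSizeM, 256)

// Every band references the same extents, so the lists are equal in length.
val refs = bands.map(b => grid.map(e => (b.name, e)))
val rows = refs.transpose // one row per extent, one entry per band
```

Coarser bands would then be read over each shared extent at whatever pixel count that extent implies for them, which is where the resampling question comes in.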

A further nice-to-have would be a way for the user to express whether and how they want resampling, so that all tile columns end up at the same resolution. A sensible default might be nearest-neighbor upsampling to the finest-resolution column.
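For reference, nearest-neighbor upsampling by an integer factor just repeats each source cell. The sketch below works on a plain 2D array and is only an illustration; `upsampleNearest` is a hypothetical helper, not the tile resampling that RasterFrames or GeoTrellis would use.

```scala
// Nearest-neighbor upsampling: each output cell copies the source cell
// that its coordinates fall into after integer division by `factor`.
def upsampleNearest(tile: Array[Array[Int]], factor: Int): Array[Array[Int]] = {
  require(factor >= 1, "factor must be positive")
  require(tile.nonEmpty && tile.head.nonEmpty, "tile must not be empty")
  Array.tabulate(tile.length * factor, tile.head.length * factor) { (r, c) =>
    tile(r / factor)(c / factor)
  }
}

// A 2x2 "30m" tile upsampled by 2 to match a 4x4 "15m" tile.
val coarse = Array(Array(1, 2), Array(3, 4))
val fine = upsampleNearest(coarse, 2)
```

Each source value ends up as a 2x2 block in `fine`, so no new values are introduced, which is why nearest neighbor is a conservative default for categorical or already-quantized bands.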

vpipkt commented 4 years ago

A further point related to this.

If a value in one of the catalogColumns is null, the same error occurs. Shouldn't it just return a null value for the projected raster instead?
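One way the suggested null handling could look, sketched with `Option` in plain Scala. `refsFor` and the reference strings are hypothetical; the point is only that a missing source is padded to the same length as the others, so the transpose still lines up and the missing band shows as empty in each row.

```scala
// Hypothetical: build one reference per grid cell, or one None per grid
// cell when the catalog value is missing, instead of a shorter list.
def refsFor(path: Option[String], gridSize: Int): Seq[Option[String]] = path match {
  case Some(p) => Seq.tabulate(gridSize)(i => Some(s"$p#tile$i"))
  case None    => Seq.fill(gridSize)(None)
}

val gridSize = 4
val b1Refs = refsFor(Some("B1.TIF"), gridSize)
val b8Refs = refsFor(None, gridSize) // null catalog entry

// Equal lengths, so transpose succeeds; the missing band is None per row.
val rows = Seq(b1Refs, b8Refs).transpose
```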

metasim commented 4 years ago

In the RasterSource reader, use the raster with the finest resolution to set the gridding and extents to read for all the other columns. This will result in the same number of rows generated from each catalog row.

👍

vpipkt commented 3 years ago

@JenniferYingyiWu2020 I believe your comment is not relevant to this particular issue. Please take a look at this response.