locationtech / rasterframes

Geospatial Raster support for Spark DataFrames
http://rasterframes.io
Apache License 2.0

How to generate a ".tiff" image from unsupervised machine learning results? #511

Open JenniferYingyiWu2020 opened 4 years ago

JenniferYingyiWu2020 commented 4 years ago

Dear All, I have written a Scala function that first reads "geotrellis catalog" data using RasterFrames and then does unsupervised machine learning with GeoTrellis. After that, the classification result needs to be output, but the generated .png image is not what I expected. Getting the "unsupervised machine learning.py" from the https://rasterframes.io/unsupervised-learning.html website is easy, but do you have a .scala file that implements the same thing? Could you give me some suggestions? Thanks!
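For reference, here is a hedged sketch of how the Python unsupervised-learning page might translate to Scala. It assumes RasterFrames' Scala ML helpers (`TileExploder`, `NoDataFilter`) and `rf_assemble_tile` behave like the Python counterparts used on that page, and that the Spark session has already been initialized with `withRasterFrames`. The object name `KMeansSketch`, the `bandCols` parameter and the `prediction` column name are illustrative assumptions, not an official RasterFrames example. The input is expected to be a `RasterFrameLayer` with a `spatial_key` column and one tile column per band.

```scala
import geotrellis.raster.IntConstantNoDataCellType
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.functions.col
import org.locationtech.rasterframes._
import org.locationtech.rasterframes.ml.{NoDataFilter, TileExploder}

// Hypothetical helper: a sketch, not an official RasterFrames API.
object KMeansSketch {
  /** Clusters the pixels of `layer` into `k` groups and returns a layer whose
    * `prediction` tile column holds the cluster label of each pixel. */
  def cluster(layer: RasterFrameLayer, bandCols: Seq[String], k: Int): RasterFrameLayer = {
    val tlm = layer.tileLayerMetadata.left.get

    // One row per pixel; adds `column_index` and `row_index` columns
    val exploder = new TileExploder()
    // Drop pixels where any band is NoData, since k-means cannot handle NaN
    val noDataFilter = new NoDataFilter().setInputCols(bandCols.toArray)
    // Pack the band values of each pixel into a single feature vector
    val assembler = new VectorAssembler()
      .setInputCols(bandCols.toArray)
      .setOutputCol("features")
    val kmeans = new KMeans().setK(k).setFeaturesCol("features").setPredictionCol("prediction")

    val model = new Pipeline().setStages(Array(exploder, noDataFilter, assembler, kmeans)).fit(layer)
    val clustered = model.transform(layer)

    // Put the per-pixel labels back into tiles; filtered-out pixels become NoData
    val retiled = clustered.groupBy(col("spatial_key")).agg(
      rf_assemble_tile(col("column_index"), col("row_index"), col("prediction"),
        tlm.tileCols, tlm.tileRows, IntConstantNoDataCellType).as("prediction")
    )
    retiled.asLayer(col("spatial_key"), tlm)
  }
}
```

The labels are assembled with an integer cell type so that cluster IDs stay whole numbers. Check the band column names with `printSchema()` before passing them as `bandCols`.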

vpipkt commented 4 years ago

It looks like you are on the right path. I think that you will have to define a path for each element in the geoTiffRDD and then perhaps something like:

geoTiffRDD.foreach { gt =>
  // Hypothetical: choose a unique output path for each GeoTIFF
  val fname: String = ???
  gt.write(fname)
}

Depending on your use case, you could just use arbitrary unique IDs, such as UUIDs, for the filenames. Or you could try using SinglebandGeoTiff.extent to build the filenames.
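Both naming options can be sketched in plain Scala. `TiffNames` and its methods are hypothetical helpers. The extent-based variant assumes that no two GeoTIFFs in the RDD share a lower-left corner, which should hold for the non-overlapping tiles of a single layer.

```scala
import java.util.UUID

// Hypothetical helpers for building one output filename per GeoTIFF.
object TiffNames {
  // Render a coordinate without trailing zeros or scientific notation.
  private def coord(d: Double): String =
    BigDecimal(d).bigDecimal.stripTrailingZeros.toPlainString

  // Name a GeoTIFF after the lower-left corner (xmin, ymin) of its extent.
  def fromExtent(prefix: String, xmin: Double, ymin: Double): String =
    s"${prefix}_${coord(xmin)}_${coord(ymin)}.tif"

  // Name a GeoTIFF with an arbitrary but unique identifier.
  def fromUuid(prefix: String): String =
    s"${prefix}_${UUID.randomUUID()}.tif"
}
```

For example, `TiffNames.fromExtent("tile", 630000.0, -12.5)` gives `tile_630000_-12.5.tif`. Inside the `foreach`, the corner would come from `gt.extent.xmin` and `gt.extent.ymin`, prefixed with an output directory that every executor can write to.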

This may be a good question for the GeoTrellis gitter chat as well.

metasim commented 4 years ago

@JenniferYingyiWu2020 So so close! If you already have a raster, you can just use GeoTrellis functions to write it out as a GeoTIFF.
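For the case where the result is already an in-memory GeoTrellis tile, here is a hedged sketch. It wraps the tile, its extent and its CRS in a `SinglebandGeoTiff` and writes that to disk. `WriteGeoTiffSketch` is a hypothetical name, and the extent and CRS must be the ones the result actually lives in. For a multiband result, `MultibandGeoTiff` plays the same role.

```scala
import geotrellis.proj4.CRS
import geotrellis.raster.Tile
import geotrellis.raster.io.geotiff.SinglebandGeoTiff
import geotrellis.vector.Extent

// Hypothetical helper, not part of RasterFrames or GeoTrellis.
object WriteGeoTiffSketch {
  // Georeference an in-memory tile with its extent and CRS, then write it as a GeoTIFF file.
  def write(tile: Tile, extent: Extent, crs: CRS, path: String): Unit =
    SinglebandGeoTiff(tile, extent, crs).write(path)
}
```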

Perhaps even more convenient: if you don't already have a "RasterLayer", you can instead use the rf.write.geotiff.save(...) function (you have to import the ...datasource.geotiff._ package).

example: https://github.com/locationtech/rasterframes/blob/8bbed00ab7970a4003533cadc99173e2a5ab7838/datasource/src/test/scala/org/locationtech/rasterframes/datasource/geotiff/GeoTiffDataSourceSpec.scala#L281-L284

The available withXXX convenience methods are defined here:

https://github.com/locationtech/rasterframes/blob/8bbed00ab7970a4003533cadc99173e2a5ab7838/datasource/src/main/scala/org/locationtech/rasterframes/datasource/geotiff/package.scala#L57-L75

vpipkt commented 4 years ago

@JenniferYingyiWu2020 let us know if this resolved your question.

JenniferYingyiWu2020 commented 4 years ago

Hi, I read a "geotrellis catalog" into a RasterFrame using Scala, but the generated ".tiff" image is empty. First, a "multiband-geotiff" .tiff image was read, and "Geotrellis Catalog" data was generated under "/home/jenniferwu/Documents/Java_projects/My-Draft-Projects/SBT-Projects/tmp/catalog/". However, when I run the "unsupervised machine learning" code, which is written in Scala, the generated output ".tiff" image is abnormal! Could you please give me some suggestions? Thanks!

Step 1 - object ImageIngest:

Run ImageIngest.scala with the params: " --input input.json --output output.json --backend-profiles backend-profiles.json "

input.json:

output.json:

backend-profiles.json:

Step 2 - def test_read_Geotrellis_catalog:

Moreover, the abnormal generated result ("test_geotrellis_catalog.tiff") is: output

Finally, the log file from running "def test_read_Geotrellis_catalog": test_read_Geotrellis_catalog_log.txt

metasim commented 4 years ago

@JenniferYingyiWu2020 I'd try the rf.toLayer(...).toMultibandRaster(...) and programmatically inspect the result to see if it's all NoData, or something else. Also, what are you viewing the image in? Not all image viewers understand GeoTIFFs with "exotic" cell types.
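One way to do that programmatic check is a hedged GeoTrellis sketch that counts data and NoData cells in each band. It assumes the input is the `MultibandTile` of the raster returned by `toMultibandRaster`, and `NoDataCheck` is a hypothetical name. A band with zero data cells means the output really is empty. If every band has data, the viewer is the more likely problem.

```scala
import geotrellis.raster._

// Hypothetical diagnostic helper.
object NoDataCheck {
  // For each band, return (data cells, NoData cells).
  // foreachDouble maps each cell type's NoData value to NaN, so isData works for
  // integer and floating-point bands alike.
  def dataCounts(tile: MultibandTile): Seq[(Int, Int)] =
    tile.bands.map { band =>
      var data = 0
      band.foreachDouble { v => if (isData(v)) data += 1 }
      (data, band.cols * band.rows - data)
    }
}
```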

vpipkt commented 4 years ago

@JenniferYingyiWu2020 the code seems syntactically correct etc.

Can you give more detail about what you expect "test_geotrellis_catalog.tiff" to look like?

One thing to consider is that you may not want to use write.geotiff to write out such a result (cluster membership), because internally it will use bilinear interpolation on your values. The relevant code is here and here for reference.
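To show why interpolation is a problem for cluster labels, here is a minimal pure-Scala sketch. `ResamplingSketch` is a hypothetical illustration and not the RasterFrames writer code. At the centre of four cells labelled 0, 0, 4 and 4, bilinear interpolation produces 2.0. That value is either a label that does not exist or an unrelated cluster. Nearest-neighbour sampling always returns one of the input labels.

```scala
// Hypothetical illustration of two resampling rules on a 2 x 2 neighbourhood.
object ResamplingSketch {
  // Bilinear interpolation; (x, y) in [0, 1] are fractional offsets from the
  // q00 corner toward q10 (along x) and q01 (along y).
  def bilinear(q00: Double, q10: Double, q01: Double, q11: Double, x: Double, y: Double): Double =
    q00 * (1 - x) * (1 - y) + q10 * x * (1 - y) + q01 * (1 - x) * y + q11 * x * y

  // Nearest neighbour: snap (x, y) to the closest corner (ties go to the far corner).
  def nearest(q00: Double, q10: Double, q01: Double, q11: Double, x: Double, y: Double): Double =
    (x >= 0.5, y >= 0.5) match {
      case (false, false) => q00
      case (true, false)  => q10
      case (false, true)  => q01
      case (true, true)   => q11
    }
}
```

`ResamplingSketch.bilinear(0, 0, 4, 4, 0.5, 0.5)` returns 2.0, while `ResamplingSketch.nearest(0, 0, 4, 4, 0.5, 0.5)` returns 4.0.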

The size of the resulting raster is unclear. Your joinedRF.show only has 9 rows of 256x256 tiles, which should be reasonable to write out at full resolution in the native CRS. However, I am not sure why tlm.layoutCols and tlm.layoutRows are 8192. What do you see for rf.tileLayerMetadata.left.get.layoutCols and layoutRows?

Here is a guess at something to try. This will basically write in the native CRS at full resolution. I expect this is okay if rfTlm.totalRows and totalCols are on the order of 9 * 256 = 2304:

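import org.locationtech.rasterframes.datasource.geotiff._

// Metadata of the tile layer being written
val rfTlm = rf.tileLayerMetadata.left.get
rf.write.geotiff
  .withCRS(rfTlm.crs)
  // Full extent of the layer at native resolution
  .withDimensions(rfTlm.totalCols, rfTlm.totalRows)
  .save(path + filename)
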
JenniferYingyiWu2020 commented 4 years ago

Hi, the image I used in the "input.json" file is "file1.tif": file1.zip. First, ImageIngest.scala was run with the following params (the screenshot has been uploaded above): " --input input.json --output output.json --backend-profiles backend-profiles.json ". After that, the "catalog/GeotrellisCatalogOriginal" folder is generated successfully. The function "def test_read_Geotrellis_catalog" then reads the image data from the "catalog/GeotrellisCatalogOriginal" directory. I set "val zoom = 13" in that function, so the images under the "catalog/GeotrellisCatalogOriginal/13" folder are the ones it uses. The screenshot of the "catalog/GeotrellisCatalogOriginal/13" folder has been uploaded above. To sum up, when "def test_read_Geotrellis_catalog" is executed on the above image dataset, it produces the log file above and the abnormal "test_geotrellis_catalog.tiff" image.

JenniferYingyiWu2020 commented 4 years ago

Hi, if the image file "file1.tif" is read using "spark.read.geotiff.load(geoTiffPath).asLayer", then the generated result image is normal:

val geoTiffPath = dataRootPath + "file1.tif"
val joinedRF = spark.read.geotiff.load(geoTiffPath).asLayer

Furthermore, the output .png image of the function "def test_read_MultibandGeoTiff" is below; it is our expected result. Finally, the related log file is: tif_log.txt

However, our task is to read a GeoTrellis catalog (as I said from the beginning), and this issue still cannot be resolved. Could you please give me some suggestions? Thanks! (Note: in short, if the Scala program uses RasterFrames to read a multiband GeoTIFF image, the generated result is normal; if it reads the GeoTrellis catalog, the output is abnormal.)

JenniferYingyiWu2020 commented 4 years ago

Hi, I printed the values of "rf.tileLayerMetadata.left.get.layoutCols" and "rf.tileLayerMetadata.left.get.layoutRows" in the function "def test_read_Geotrellis_catalog":

rf.tileLayerMetadata.left.get.layoutCols: 8192
rf.tileLayerMetadata.left.get.layoutRows: 8192

I have also taken your suggestion seriously:

val rfTlm = rf.tileLayerMetadata.left.get
rf.write.geotiff
  .withCRS(rfTlm.crs)
  .withDimensions(rfTlm.tileCols, rfTlm.tileRows)
  .save(rootPath + "output/test_geotrellis_catalog.tiff")

However, the output is still abnormal:
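For a sense of scale, here is a small sketch of the layout arithmetic. It assumes the catalog was ingested with a power-of-two zoomed layout (such as GeoTrellis's ZoomedLayoutScheme). In that kind of layout the grid at zoom z is 2^z tiles on each side, which gives 8192 at zoom 13 and matches the printed layoutCols and layoutRows. Multiplying by 256-pixel tiles gives the pixel size of the whole layout grid. The 3 x 3 arrangement of the nine data tiles is also an assumption, and `LayoutArithmetic` is a hypothetical name.

```scala
// Hypothetical arithmetic helper, assuming a power-of-two zoomed layout.
object LayoutArithmetic {
  // Tiles per side of the layout grid at the given zoom level.
  def layoutTiles(zoom: Int): Long = 1L << zoom

  // Pixels per side for `tiles` tiles, each `tilePixels` wide.
  def pixels(tiles: Long, tilePixels: Long): Long = tiles * tilePixels

  def main(args: Array[String]): Unit = {
    val gridPixels = pixels(layoutTiles(13), 256)
    println(s"full zoom-13 layout grid: $gridPixels x $gridPixels pixels")
    println(s"3 x 3 block of data tiles: ${pixels(3, 256)} x ${pixels(3, 256)} pixels")
  }
}
```

Under these assumptions, the full grid is 2,097,152 pixels on a side, while the data itself covers only 768 x 768. Sizing the output from the layout grid is therefore very different from sizing it from the data.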

vpipkt commented 4 years ago

@metasim do you think there could be some unexpected behavior in the geotrellis reader? Or maybe something to do with cell types here?
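Here is one concrete way a cell type can make valid results disappear, shown as a hypothetical GeoTrellis sketch. Suppose a tile's cell type declares 0 as its NoData value. Then a k-means cluster label of 0 is silently treated as missing. Whether this applies here is an assumption to check against the actual cell type of the catalog layer and of the written output. `CellTypeSketch` and its values are illustrative.

```scala
import geotrellis.raster._

// Hypothetical illustration of a NoData value colliding with a cluster label.
object CellTypeSketch {
  // Count cells that are not NoData under the tile's current cell type.
  def dataCells(tile: Tile): Int = {
    var n = 0
    tile.foreachDouble { v => if (isData(v)) n += 1 }
    n
  }

  // 2 x 2 tile of cluster labels 0..3; the default Int cell type uses Int.MinValue as NoData.
  val labels: Tile = IntArrayTile(Array(0, 1, 2, 3), 2, 2)

  // The same cells reinterpreted with 0 as the NoData value: label 0 now reads as NoData.
  val zeroIsNoData: Tile = labels.interpretAs(IntUserDefinedNoDataCellType(0))
}
```

Here `dataCells(labels)` is 4, but `dataCells(zeroIsNoData)` is 3, even though the stored values are identical.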