holoviz-topics / EarthML

Tools for working with machine learning in earth science
https://earthml.holoviz.org
BSD 3-Clause "New" or "Revised" License

Finding examples of openly accessible images #3

Open jlstevens opened 6 years ago

jlstevens commented 6 years ago

The original version of landsat_spectral_clustering.ipynb used redding.tif which was obtained originally from planet.com. As I wasn't sure whether this image could be made available, I updated the notebook to use a landsat example taken from a datashader example.

@ebo then informed me by e-mail that this image is not really suitable as it only has two bands. We would like some example images that can be made public that also have a decent number of bands. This is important as we will then be able to compute the various indices that we might then want to learn on.

One suggestion by @ebo was to use these images of a disappearing lake although I only see a link to download them in JPG format?

Lastly, a new notebook has been committed referencing 'Midwest_Mosaic.tif' which I don't think we have discussed yet. Is this something we could slice down and add to the repo as an example?

jbednar commented 6 years ago

I'm not sure what you mean about this image having only two bands; it has 11 bands if you count panchromatic:

http://datashader.org/topics/landsat.html

jbednar commented 6 years ago

Oh, I think I see the confusion: The information from the different bands is in different files. You need all the files, if you want all the bands. So I don't think we actually need a new example here, you just have to read all the different bands in, from different files.
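A minimal sketch of combining per-file bands into one array, assuming each band has already been read into a 2D array of identical shape. The band names and the synthetic arrays below are placeholders for illustration, not the actual files from the datashader example.

```python
import numpy as np

# Hypothetical stand-ins for per-band rasters; in practice each array would
# be read from its own single-band GeoTIFF file.
height, width = 4, 5
band_arrays = {
    "B4": np.full((height, width), 10, dtype=np.uint16),  # red
    "B5": np.full((height, width), 20, dtype=np.uint16),  # near infrared
    "B6": np.full((height, width), 30, dtype=np.uint16),  # shortwave infrared 1
}

# Stack along a new leading axis to get (bands, height, width).
band_names = sorted(band_arrays)
stacked = np.stack([band_arrays[name] for name in band_names], axis=0)

print(band_names, stacked.shape)
```

Keeping `band_names` next to the stacked array records which index along the first axis corresponds to which band.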

jlstevens commented 6 years ago

As the current example does have sufficient bands (across the relevant files) I've renamed this issue to reflect that we would still like a variety of good, publicly accessible image examples regardless.

ebo commented 6 years ago

The JPEG-formatted images are not suitable for calculating things like NDVI (vegetation). There are a couple of sites to download the images from, like EarthExplorer and GloVis, but the how and why of that is, I hope, beyond the scope of the examples other than providing a suitable download location. I went on to download the same two LANDSAT images from NASA's EarthExplorer and have started a notebook with the intent to go through the entire exercise step-by-step (which includes things like stacking the individual bands, generating masks from NoData and non-overlapping images, calculating NDVI, visualizing, etc.). The images are 7961x7241 and 7821x7941 for the 1988/10/22 and 2017/10/22 images respectively. These images also do not exactly overlap, so they have to be clipped, etc., if you are going to use the conventional "open as an xarray" approach. This is a very common workflow for people doing geospatial analysis, and it is just what it is. While we could slice them down as Jean-Luc suggests, doing so sidesteps a lot of issues that users have to deal with and hides them. This has contributed to my own confusion and difficulty learning your tools -- because there are enough differences between xarray, dask, rasterio, and numpy that none of my intuition on how they work together actually holds.

One other problem I stumbled on to is the ordering of the data -- is this (bands, height, width) or (height, width, bands) or (height, bands, width)... I think it is rasterio that has a helper function which reorders the bands from the stacked 2D images to 3D image ordering.
Thinking about this a little I realized that I rarely have control over how the data come to me, and this would suggest that a general reordering/mapping method may be in order.
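For reference, the NoData masking and NDVI steps mentioned above could look roughly like this. It is a sketch under assumptions: the `ndvi` helper is hypothetical, the red and near-infrared bands are already loaded as arrays of the same shape, and a pixel value of 0 marks NoData (check the metadata of the actual files).

```python
import numpy as np

def ndvi(red, nir, nodata=0):
    """Return NDVI as float32, with NaN wherever either band is NoData."""
    red = red.astype("float32")
    nir = nir.astype("float32")
    valid = (red != nodata) & (nir != nodata)
    total = nir + red
    out = np.full(red.shape, np.nan, dtype="float32")
    # Only divide where both bands hold data and the denominator is nonzero;
    # every other pixel keeps its initial NaN.
    np.divide(nir - red, total, out=out, where=valid & (total != 0))
    return out

# Tiny synthetic bands; the zeros stand for NoData pixels.
red = np.array([[0, 100], [200, 50]], dtype=np.uint16)
nir = np.array([[300, 300], [200, 0]], dtype=np.uint16)
result = ndvi(red, nir)
print(result)
```

Converting to float before subtracting matters here, because unsigned integer bands would wrap around whenever red exceeds near infrared.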

On Apr 25 2018 8:27 AM, Jean-Luc Stevens wrote:

The original version of landsat_spectral_clustering.ipynb used redding.tif which was obtained originally from planet.com. As I wasn't sure whether this image could be made available, I updated the notebook to use a landsat example taken from a datashader example.

@ebo then informed me by e-mail that this image is not really suitable as it only has two bands. We would like some example images that can be made public that also have a decent number of bands. This is important as we will then be able to compute the various indices that we might then want to learn on.

One suggestion by @ebo was to use these images of a disappearing lake although I only see a link to download them in JPG format?

Lastly, a [new notebook](https://github.com/pyviz-topics/EarthML/commit/1545c61b42518fc37ff1284cc2ef8532d1405b49) has been committed referencing 'Midwest_Mosaic.tif' which I don't think we have discussed yet. Is this something we could slice down and add to the repo as an example?

ebo commented 6 years ago

On Apr 25 2018 8:33 AM, James A. Bednar wrote:

I'm not sure what you mean about this image having only two bands; it has 11 bands if you count panchromatic:

http://datashader.org/topics/landsat.html

that one might, but the image https://github.com/pyviz-topics/EarthML/blob/master/examples/landsat-sample.tiff added to pyviz-topics/EarthML repository only has two:

=================

gdalinfo ~/Downloads/landsat-sample.tiff

Driver: GTiff/GeoTIFF
Files: /home/jldavid3/Downloads/landsat-sample.tiff
Size is 2500, 2500
Coordinate System is `'
Metadata:
  TIFFTAG_DOCUMENTNAME=/Users/jstevens/Desktop/development/EarthML/examples/cropped.tiff
  TIFFTAG_IMAGEDESCRIPTION=Created with GIMP
  TIFFTAG_RESOLUTIONUNIT=2 (pixels/inch)
  TIFFTAG_XRESOLUTION=72
  TIFFTAG_YRESOLUTION=72
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0, 2500.0)
Upper Right ( 2500.0,    0.0)
Lower Right ( 2500.0, 2500.0)
Center      ( 1250.0, 1250.0)
Band 1 Block=2500x64 Type=Byte, ColorInterp=Gray
  Mask Flags: PER_DATASET ALPHA
Band 2 Block=2500x64 Type=Byte, ColorInterp=Alpha

jlstevens commented 6 years ago

TIFFTAG_IMAGEDESCRIPTION=Created with GIMP

I wonder if that is what is responsible. It seemed to be the quickest way to slice the image but I guess it might not have preserved the bands.

jbednar commented 6 years ago

Right. @jlstevens, please rename the example file to indicate which original image you started with; the filename indicates which bands it was. And then we could consider stacking the various images into a single merged image, but the separate images are how the files were provided from LANDSAT.

ebo commented 6 years ago

On Apr 25 2018 8:40 AM, James A. Bednar wrote:

Oh, I think I see the confusion: The information from the different bands is in different files. You need all the files, if you want all the bands. So I don't think we actually need a new example here, you just have to read all the different bands in, from different files.

The reason I was advocating a different image/example is that if you are doing change detection you have to compare two images which are not guaranteed to exactly overlap, which have different spectral characteristics (LANDSAT-5 vs 8), and which raise other such geospatial issues. Whatever we choose, if we pick an example and use it again and again, then we can leverage it for a half-dozen lessons. The 90% loss of Walker Lake water volume is not only timely, but people care about the background story.
If it is used for a tutorial on end-to-end how you do this stuff, then we can demonstrate what actually happens when you work with these images for real.
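To make the overlap issue concrete, here is a sketch of clipping two rasters to their common extent. The `overlap_slices` helper is hypothetical, and it assumes the simplest case: north-up rasters in the same projection, with the same pixel size and upper-left corners that differ by a whole number of pixels. Real scene pairs may need reprojection or resampling first.

```python
import numpy as np

def overlap_slices(origin_a, shape_a, origin_b, shape_b, pixel):
    """Return index windows selecting the overlapping area of two rasters.

    origin_* is the (x, y) map coordinate of the upper-left corner and
    shape_* is (height, width). Both rasters are assumed to share one grid.
    """
    (xa, ya), (ha, wa) = origin_a, shape_a
    (xb, yb), (hb, wb) = origin_b, shape_b
    left = max(xa, xb)
    right = min(xa + wa * pixel, xb + wb * pixel)
    top = min(ya, yb)
    bottom = max(ya - ha * pixel, yb - hb * pixel)
    if right <= left or top <= bottom:
        raise ValueError("rasters do not overlap")
    rows = round((top - bottom) / pixel)
    cols = round((right - left) / pixel)

    def window(x0, y0):
        # Offset of the shared area from this raster's upper-left corner.
        r0 = round((y0 - top) / pixel)
        c0 = round((left - x0) / pixel)
        return (slice(r0, r0 + rows), slice(c0, c0 + cols))

    return window(xa, ya), window(xb, yb)

# Two synthetic 10x10 rasters with 30 m pixels, offset from each other.
a = np.arange(100).reshape(10, 10)
b = np.zeros((10, 10))
win_a, win_b = overlap_slices((0, 300), a.shape, (60, 270), b.shape, pixel=30)
print(a[win_a].shape, b[win_b].shape)
```

Once both rasters are cut to the same window, per-pixel comparisons between the two dates line up.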

jlstevens commented 6 years ago

The 90% loss of Walker Lake water volume is not only timely, but people care about the background story. If it is used for a tutorial on end-to-end how you do this stuff, then we can demonstrate what actually happens when you work with these images for real.

I agree that if we can tell a compelling story with the data we have then we should.

jbednar commented 6 years ago

Sounds good. It will be great to have additional examples showing other topics, and it will be great to keep the overall number of data files that people have to download and that we have to document low. But it's not the number of bands that would invalidate this example (as in Jean-Luc's original title), just that some other example may subsume it for other reasons. When that happens, great!

ebo commented 6 years ago

On Apr 25 2018 9:18 AM, Jean-Luc Stevens wrote:

TIFFTAG_IMAGEDESCRIPTION=Created with GIMP

I wonder if that is what is responsible. It seemed to be the quickest way to slice the image but I guess it might not have preserved the bands.

While that is the offending tool, the real problem is that the image was processed in a way that lost most or all of the projection information and converted a multispectral image into a gray-scale (panchromatic) one.

jbednar commented 6 years ago

There are some high-resolution CC BY-SA-licensed examples at https://info.planet.com/download-free-high-resolution-skysat-image-samples/, though none of them cover the same region of the earth at different times that I can see.

jbednar commented 6 years ago

In case it helps, the original files from the current example are available from: https://github.com/bokeh/datashader/blob/master/examples/datasets.yml#L36 and shouldn't have any of the problems from the GIMP-processed version. But if there are better examples that can be used to tell more stories, then bring them on! :-)

ebo commented 6 years ago

On Apr 25 2018 9:22 AM, Jean-Luc Stevens wrote:

I agree that if we can tell a compelling story with the data we have then we should.

I'm working on that... Not only was the study highlighted by NASA's Earth Observatory https://earthobservatory.nasa.gov/IOTD/view.php?id=91921, it was also published in Nature Geoscience https://www.nature.com/articles/ngeo3052. That is a huge deal.

What I am proposing is to replace the original example image with these two, unless there is a compelling reason to keep that image or a similarly compelling story about it. There may be one that I have simply missed...

Oh, another interesting thing is that we might be able to get access to the salinity data to replicate the graphs https://www.nature.com/articles/ngeo3052/figures/2 in the Nature paper.

jbednar commented 6 years ago

Sounds good! No, there wasn't a compelling story about the original image, though there was intended to be. :-) It was originally chosen to try to show the differences between the actual coastline of Southern Louisiana around New Orleans and what standard maps show as the outline of Louisiana, but the wrong image was selected, and so it ended up being the wrong bit of coastline, not telling that story at all. So there's no problem replacing it with something that has a better story.

mrocklin commented 6 years ago

One other problem I stumbled on to is the ordering of the data -- is this (bands, height, width) or (height, width, bands) or (height, bands, width)... I think it is rasterio that has a helper function which reorders the bands from the stacked 2D images to 3D image ordering. Thinking about this a little I realized that I rarely have control over how the data come to me, and this would suggest that a general reordering/mapping method may be in order.

Can I ask you to expand on this a bit further? I suspect that Numpy and XArray both have mechanisms to help you here. In particular you might want to look into parts of their respective APIs that include stack and transpose functions depending on what you want.
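A short illustration of the NumPy side of this, using a small synthetic array. `np.transpose` and `np.moveaxis` both reorder axes without copying the data, and the shapes here are arbitrary.

```python
import numpy as np

bands_first = np.zeros((3, 4, 5))  # (bands, height, width)

# Two equivalent ways to move the band axis to the end.
bands_last = np.transpose(bands_first, (1, 2, 0))  # (height, width, bands)
also_last = np.moveaxis(bands_first, 0, -1)

# And back again to (bands, height, width).
round_trip = np.moveaxis(bands_last, -1, 0)

print(bands_last.shape, also_last.shape, round_trip.shape)
```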

ebo commented 6 years ago

On Apr 25 2018 9:35 AM, James A. Bednar wrote:

Sounds good! No, there wasn't a compelling story about the original image, though there was intended to be. :-) It was originally chosen to try to show the differences between the actual coastline of Southern Louisiana around New Orleans and what standard maps show as the outline of Louisiana, but the wrong image was selected, and so it ended up being the wrong bit of coastline, not telling that story at all. So there's no problem replacing it with something that has a better story.

I would have to check on some things, but my wife is a senior research ecologist with the USGS, and has permanently monitored sites in the cypress swamps. The water extraction in Texas is causing the cypress swamps to die in the area around Beaumont TX. If we had an example that focused on that area I may be able to get you some really compelling stories (I hesitate to call them good stories). I would have to check if that would cause a conflict of interest, etc. She also has sites in Jean Lafitte National Park, which is around where you were talking about, and we can engage quite a number of folks there as well.

ebo commented 6 years ago

On Apr 25 2018 9:49 AM, Matthew Rocklin wrote:

One other problem I stumbled on to is the ordering of the data -- is this (bands, height, width) or (height, width, bands) or (height, bands, width)... I think it is rasterio that has a helper function which reorders the bands from the stacked 2D images to 3D image ordering. Thinking about this a little I realized that I rarely have control over how the data come to me, and this would suggest that a general reordering/mapping method may be in order.

Can I ask you to expand on this a bit further? I suspect that Numpy and XArray both have mechanisms to help you here. In particular you might want to look into parts of their respective APIs that include stack and transpose functions depending on what you want.

In the last example I sent to Jean-Luc I used transpose and stack. I am not sure I did it right, but yes I am marginally aware of the base functionality.

What I had meant by the mapping function is, it is intuitive what (band, y, x) means, as well as (x, y, z) in 3D data. Either documenting a couple of examples using stack and transpose to show how it is done, or developing a helper function that does an intuitive mapping, would be useful I think. There is a precedent with rasterio's reshape_as_image which will take an image stack (3, 718, 791) and remap it to (718, 791, 3) or in my example above (orient={'bands':'z'}) or some such. (see https://github.com/mapbox/rasterio/blob/master/docs/topics/image_processing.rst)
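One possible shape for such a helper, as a sketch only: `reorder_axes` is a hypothetical name and not part of rasterio, NumPy, or xarray. It takes the axis names the data arrived with and the order that is wanted, and derives the transpose from them.

```python
import numpy as np

def reorder_axes(array, current, wanted):
    """Transpose `array` from the `current` axis names to the `wanted` order."""
    current, wanted = list(current), list(wanted)
    if len(set(current)) != len(current) or sorted(current) != sorted(wanted):
        raise ValueError("current and wanted must contain the same unique names")
    if array.ndim != len(current):
        raise ValueError("need exactly one name per array axis")
    # For each wanted name, look up which source axis carries it.
    return np.transpose(array, [current.index(name) for name in wanted])

stack = np.zeros((3, 718, 791))  # (band, y, x), as in the example above
image = reorder_axes(stack, ("band", "y", "x"), ("y", "x", "band"))
print(image.shape)
```

With xarray the axis names travel with the data, so `DataArray.transpose` can be called with dimension names directly and a separate helper may not be needed there.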

ebo commented 5 years ago

BTW, the author of the Nature Geoscience article sent me the data to replicate several of the published graphs. I have a number of other things on my plate at the moment, but if we extend the Walker Lake example, either to include the Great Salt Lake or as a second notebook, we should be able to replicate the work in https://www.fs.fed.us/rm/pubs_journals/2017/rmrs_2017_wurtsbaugh_w001.pdf. Also, I have permission to publicly release the data with the agreement to properly cite and credit the work. I would also want to have them review this before formally releasing it, if possible, to make sure I/we do not make a mistake that would offend.

jbednar commented 4 years ago

A recently announced source of freely available hi-res imagery:

https://twitter.com/planetlabs/status/1308768058098450437