pangeo-data / pangeo-rema

Project with Jonny analyzing REMA data
Apache License 2.0
0 stars 1 forks source link

convert REMA data to cloud optimized geotiff #1

Open rabernat opened 5 years ago

rabernat commented 5 years ago

Cloud optimized geotiff is an image format optimized for high-throughput reading via the web: http://www.cogeo.org/

The REMA data is not in COG format. A first step towards making it work efficiently in pangeo is to convert it.

I played around with this a bit, starting by downloading this sample data

wget http://data.pgc.umn.edu/elev/dem/setsm/REMA/mosaic/v1.1/8m/09_41/09_41_8m.tar.gz
tar -xvzf 09_41_8m.tar.gz

To create a cogs and stuff, you need to install the rasterio cog extension. One you have it, you can examine the file metadata.

$ rio info 09_41_8m_dem.tif
{"bounds": [1000000.0, -2200000.0, 1100000.0, -2100000.0], "colorinterp": ["grey"], "compress": "lzw", "count": 1, "crs": "EPSG:3031", "descriptions": [null], "driver": "GTiff", "dtype": "float32", "height": 12500, "indexes": [1], "interleave": "band", "lnglat": [153.9704078084865, -68.23346907615766], "mask_flags": [["nodata"]], "nodata": -9999.0, "res": [8.0, 8.0], "shape": [12500, 12500], "tiled": false, "transform": [8.0, 0.0, 1000000.0, 0.0, -8.0, -2100000.0, 0.0, 0.0, 1.0], "units": [null], "width": 12500} 

This is not a cog. We can make one like this

$ rio cogeo 09_41_8m_dem.tif 09_41_8m_dem_COG_RAW.tif --cog-profile raw
$ rio info 09_41_8m_dem_COG_RAW.tif
{"blockxsize": 512, "blockysize": 512, "bounds": [1000000.0, -2200000.0, 1100000.0, -2100000.0], "colorinterp": ["grey"], "count": 1, "crs": "EPSG:3031", "descriptions": [null], "driver": "GTiff", "dtype": "float32", "height": 12500, "indexes": [1], "interleave": "band","lnglat": [153.9704078084865, -68.23346907615766], "mask_flags": [["per_dataset"]], "nodata": null, "res": [8.0, 8.0], "shape": [12500,12500], "tiled": true, "transform": [8.0, 0.0, 1000000.0, 0.0, -8.0, -2100000.0, 0.0, 0.0, 1.0], "units": [null], "width": 12500}

This is tiled (see blocksize) but with no compression. It's better to use compression:

$ rio cogeo 09_41_8m_dem.tif 09_41_8m_dem_COG_LZW.tif --cog-profile lzw
$ rio info 09_41_8m_dem_COG_LZW.tif
{"blockxsize": 512, "blockysize": 512, "bounds": [1000000.0, -2200000.0, 1100000.0, -2100000.0], "colorinterp": ["grey"], "compress": "lzw", "count": 1, "crs": "EPSG:3031", "descriptions": [null], "driver": "GTiff", "dtype": "float32", "height": 12500, "indexes": [1], "interleave": "band", "lnglat": [153.9704078084865, -68.23346907615766], "mask_flags": [["per_dataset"]], "nodata": null, "res": [8.0, 8.0], "shape": [12500, 12500], "tiled": true, "transform": [8.0, 0.0, 1000000.0, 0.0, -8.0, -2100000.0, 0.0, 0.0, 1.0], "units": [null], "width": 12500}

Let's compare the sizes of the files

$ ls -lh 09_41_8m_dem*.tif
-rw-rw-r-- 1 jovyan users 290M Feb 13 03:20 09_41_8m_dem_COG_LZW.tif
-rw-rw-r-- 1 jovyan users 865M Feb 21 21:38 09_41_8m_dem_COG_RAW.tif
-rw-rw---- 1 jovyan users 228M Oct 26 15:09 09_41_8m_dem.tif

We see that the LSW compressed COG has the same size as the original file, but it is MUCH more useful!

jjspergel commented 5 years ago

Is the code missing the command "create"?

rabernat commented 5 years ago

Hi julian? You create a cog as follows:

$ rio cogeo 09_41_8m_dem.tif 09_41_8m_dem_COG_LZW.tif --cog-profile lzw
jjspergel commented 5 years ago

Hi Ryan, Is it possible that this could be a Mac/PC issue? My command prompt doesn't recognize what you've written as is as a valid command because the filename is not considered a command. I'm able to create a cog with: rio cogeo create 38_49_8m_dem.tif 38_49_8m_dem_COG_LZW.tif --cog-profile lzw Julian

rabernat commented 5 years ago

Did you install the cog extension?

https://github.com/cogeotiff/rio-cogeo

jjspergel commented 5 years ago

Yup! I installed it with pip as they said on their website. It seems to be working correctly.

On Wed, Mar 20, 2019 at 10:34 AM Ryan Abernathey notifications@github.com wrote:

Did you install the cog extension?

https://github.com/cogeotiff/rio-cogeo

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rabernat/pangeo-rema/issues/1#issuecomment-474857867, or mute the thread https://github.com/notifications/unsubscribe-auth/Ao-5EWRhtfzcsC3T-o-hKRdjDi64vOdWks5vYkb1gaJpZM4b8NKC .

-- Julian J. Spergel Lamont Doherty Earth Observatory University of Chicago '16 Geophysical Sciences

rabernat commented 5 years ago

Ok I think I see what is going on. The syntax has actually changed since I first installed and ran these commands:

https://github.com/cogeotiff/rio-cogeo/blob/master/CHANGES.txt

The following changes were introduced in the version released on March 15

1.0b0 (2019-03-15)
------------------
- add more logging and `--quiet` option (#46)
- add `--overview-blocksize` to set overview's internal tile size (#60)

Bug fixes:

- copy tags and description from input to output (#19)
- copy input mask band to output mask

Breacking Changes:

- rio cogeo now has subcommands: 'create' and 'validate' (#6).
- internal mask creation is now optional (--add-mask).
- internal nodata or alpha channel can be forwarded to the output dataset.
- removed default overview blocksize to be equal to the raw data blocksize (#60)
jjspergel commented 5 years ago

OK! Thank you for checking in on that. I'm reading through these pages before starting on implementing it. Thank you again for your help

On Wed, Mar 20, 2019 at 10:42 AM Ryan Abernathey notifications@github.com wrote:

Ok I think I see what is going on. The syntax has actually changed since I first installed and ran these commands:

https://github.com/cogeotiff/rio-cogeo/blob/master/CHANGES.txt

The following changes were introduced in the version released on March 15

1.0b0 (2019-03-15)

  • add more logging and --quiet option (#46)
  • add --overview-blocksize to set overview's internal tile size (#60)

Bug fixes:

  • copy tags and description from input to output (#19)
  • copy input mask band to output mask

Breacking Changes:

  • rio cogeo now has subcommands: 'create' and 'validate' (#6).
  • internal mask creation is now optional (--add-mask).
  • internal nodata or alpha channel can be forwarded to the output dataset.
  • removed default overview blocksize to be equal to the raw data blocksize (#60)

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rabernat/pangeo-rema/issues/1#issuecomment-474861676, or mute the thread https://github.com/notifications/unsubscribe-auth/Ao-5EdyxC-NTVL10atM1Rh8k_8J18Cmaks5vYkjagaJpZM4b8NKC .

-- Julian J. Spergel Lamont Doherty Earth Observatory University of Chicago '16 Geophysical Sciences

jkingslake commented 5 years ago

Hi, I got the geo-scipy environment installed thanks to your online course materials Ryan. I am now running the code that Julian sent me: forfiles /s /m *_dem.tif /c "cmd /c rio cogeo create @file @fname_COG_LZW.tif --cog-profile lzw and it seems to be working very nicely and producing COGs for each DEM tif. I should also remember to do this for all the DAY tiffs (the ones showing which day the each pixel corresponds to). (i know i could have done this in one step, but I didnt think about it and now its running!) Jonny

jkingslake commented 5 years ago

One question: I am getting the following warning. Does this matter do you think? WARNING:rasterio._env:CPLE_NotSupported in driver GTiff does not support creation option BIGTIFF

The COGs open in arcmap OK and everything looks good.

rabernat commented 5 years ago

To double check that the conversion is working, try running:

rio info <filename>

and examine the output

jkingslake commented 5 years ago

Looks good to me: J:\REMA_Mosaic_8m_v1.1\8m\09_38>rio info 09_38_8m_dem_COG_LZW.tif {"blockxsize": 512, "blockysize": 512, "bounds": [700000.0, -2200000.0, 800000.0, -2100000.0], "colorinterp": ["gray"], "compress": "lzw", "count": 1, "crs": "EPSG:3031", "descriptions": [null], "driver": "GTiff", "dtype": "float32", "height": 12500, "indexes": [1], "interleave": "band", "lnglat": [160.76932762433867, -69.26294588007902], "mask_flags": [["nodata"]], "nodata": -9999.0, "res": [8.0, 8.0], "shape": [12500, 12500], "tiled": true, "transform": [8.0, 0.0, 700000.0, 0.0, -8.0, -2100000.0, 0.0, 0.0, 1.0], "units": [null], "width": 12500}

Strange thing is that when I get info on many of the original files they also apparently have a block size, e.g.: (geo_scipy) J:\12_49_8m>rio info 12_49_8m_dem.tif {"blockxsize": 256, "blockysize": 256, "bounds": [1800000.0, -1900000.0, 1900000.0, -1800000.0], "colorinterp": ["gray"], "compress": "lzw", "count": 1, "crs": "EPSG:3031", "descriptions": [null], "driver": "GTiff", "dtype": "float32", "height": 12500, "indexes": [1], "interleave": "band", "lnglat": [134.99999999999997, -66.2526713074506], "mask_flags": [["nodata"]], "nodata": -9999.0, "res": [8.0, 8.0], "shape": [12500, 12500], "tiled": true, "transform": [8.0, 0.0, 1800000.0, 0.0, -8.0, -1800000.0, 0.0, 0.0, 1.0], "units": [null], "width": 12500}

I don't see this in the one you originally looked at though: (geo_scipy) J:\REMA_Mosaic_8m_v1.1\8m\09_41>rio info 09_41_8m_dem.tif {"bounds": [1000000.0, -2200000.0, 1100000.0, -2100000.0], "colorinterp": ["gray"], "compress": "lzw", "count": 1, "crs": "EPSG:3031", "descriptions": [null], "driver": "GTiff", "dtype": "float32", "height": 12500, "indexes": [1], "interleave": "band", "lnglat": [153.9704078084865, -68.23346907615766], "mask_flags": [["nodata"]], "nodata": -9999.0, "res": [8.0, 8.0], "shape": [12500, 12500], "tiled": false, "transform": [8.0, 0.0, 1000000.0, 0.0, -8.0, -2100000.0, 0.0, 0.0, 1.0], "units": [null], "width": 12500}

rabernat commented 5 years ago

ok that's an important catch! Some of these may already be COGs!

Reading the cogeo docs (https://github.com/cogeotiff/rio-cogeo) it looks like there is also a validate command

rio cogeo validate <filename>

That is probably better than rio info.

jkingslake commented 5 years ago

OK, yes validate is the better one to use. Some files don't return a blocksize when using "rio info"--> not COGs Some files DO return a blocksize when using "rio info", but isnt recognized as a COG by 'validate" Some files DO return a blocksize when using "rio info", ARE recognized as a COG by 'validate", but don't have internal overviews, which is recommended apparently.

So overall, going through them all and running this conversion seems to be the safest things to do.

Thanks for your help!