tlnagy / TiffImages.jl

💎 Pure-Julia TIFF I/O with a focus on correctness 🧐
http://tamasnagy.com/TiffImages.jl/
MIT License
51 stars 13 forks source link

Add support for GeoTIFFs #100

Open Alexander-Barth opened 1 year ago

Alexander-Barth commented 1 year ago

Thank you very much for this great package. I don't know much about the Tiff format, but I am sure that it represents a lot of work to implement such a complex format :-)

My use case is to read satellite data in the GeoTIFF format. The data I am working with has 11 (spectral) bands. I am currently using ArchGDAL to read this data. It seem that TiffImages.jl currently reads only the first band:

using TiffImages
tiff = TiffImages.load(download("https://data-assimilation.net/upload/Alex/TIFF/S2_1-12-19_48MYU_0.tif"));
eltype(tiff)
# output: ColorTypes.Gray{Float32}
size(tiff)
# output (256, 256)

Using the command line tool gdalinfo S2_1-12-19_48MYU_0.tif, I get these 11 bands

[...]
Band 1 Block=256x1 Type=Float32, ColorInterp=Gray
Band 2 Block=256x1 Type=Float32, ColorInterp=Undefined
Band 3 Block=256x1 Type=Float32, ColorInterp=Undefined
Band 4 Block=256x1 Type=Float32, ColorInterp=Undefined
Band 5 Block=256x1 Type=Float32, ColorInterp=Undefined
Band 6 Block=256x1 Type=Float32, ColorInterp=Undefined
Band 7 Block=256x1 Type=Float32, ColorInterp=Undefined
Band 8 Block=256x1 Type=Float32, ColorInterp=Undefined
Band 9 Block=256x1 Type=Float32, ColorInterp=Undefined
Band 10 Block=256x1 Type=Float32, ColorInterp=Undefined
Band 11 Block=256x1 Type=Float32, ColorInterp=Undefined

Is there a way to load also the other bands using TiffImages.jl ? Is this with-in scope of the package?

Ref: https://github.com/Alexander-Barth/TIFFDatasets.jl/issues/1 I am using TiffImages v0.6.4, julia 1.8.3 on 64-bit Linux

tlnagy commented 1 year ago

Thanks for the kind words and for the MWE!

julia> tiff = TiffImages.load(download("https://data-assimilation.net/upload/Alex/TIFF/S2_1-12-19_48MYU_0.tif"));

julia> size(tiff)
(256, 256)

julia> tiff.ifds
1-element Vector{TiffImages.IFD{UInt32}}:
 IFD, with tags: 
    Tag(IMAGEWIDTH, 256)
    Tag(IMAGELENGTH, 256)
    Tag(BITSPERSAMPLE, 32)
    Tag(COMPRESSION, COMPRESSION_NONE)
    Tag(PHOTOMETRIC, 1)
    Tag(STRIPOFFSETS, UInt32[1972, 13236, 24500, 35764, 47028, ...])
    Tag(SAMPLESPERPIXEL, 1)
    Tag(STRIPBYTECOUNTS, UInt16[11264, 11264, 11264, 11264, 11264, ...])
    Tag(SAMPLEFORMAT, 3)
    Tag(UNKNOWN(33550), Float64[10.0, 10.0, 0.0])
    Tag(UNKNOWN(33922), Float64[0.0, 0.0, 0.0, 706740.0, 9.34096e6, ...])
    Tag(UNKNOWN(34735), UInt16[1, 1, 0, 7, 1024, ...])
    Tag(UNKNOWN(34737), "WGS 84 / UTM zone 48...")

From the tags in the TIFF file it looks like TiffImages.jl did what it was supposed to. It is a 256x256 single plane image, BUT it looks like the GeoTIFF spec has its own set of tags that store the other "slices" which I think are the spectral bands you are missing.

A brief read over the GeoTIFF spec requirements shows that tag 34735 (which you can see in the above list as UNKNOWN(34735)) aka GeoKeyDirectoryTag should contain all the necessary info about the GeoTIFF bands that are missing.

Now, adding support for GeoTIFFs directly is likely outside of the scope of TiffImages.jl since it's meant to be a base package that can be extended by other packages (e.g. https://github.com/tlnagy/OMETIFF.jl) to add support for special kinds of TIFFs. I believe that the GeoTIFF-specific code would be better suited in such a package. I don't think it would be too much work to hook into TiffImages.jl's current loading logic and just add some extra code to handle the GeoTIFF specific tags.

I would be happy to provide pointers if you wanted to try build something like this, see also the "Extending TiffImages" for some details on the internal types.

Alexander-Barth commented 1 year ago

Thanks a lot for your time looking at this. I agree that this would be better in a separate package. Unfortunately, I am not enough "invested" in the GeoTIFF ecosystem (mostly using NetCDF myself) that I could do this myself.

tlnagy commented 1 year ago

I re-opened this in case someone finds it and wants to take this on!

chrstphrbrns commented 6 months ago

This issue seems to be more about multispectral support, which is not specific to GeoTIFFs

@tlnagy Any thoughts on what to use for the underlying storage (ie, "pixel" type)? Colorant seems insufficient or innapopriate for the general case of arbitrarily many bands of data. NTuple?

Presumably, any image that has "unspecified" extra samples would use this alternative representation

tlnagy commented 6 months ago

@tlnagy Any thoughts on what to use for the underlying storage (ie, "pixel" type)? Colorant seems insufficient or innapopriate for the general case of arbitrarily many bands of data. NTuple?

I'm not the best to judge as I'm not sure how GeoTIFF data is normally accessed and whether it makes sense to load it that way. A potential approach is to design a new lazy or mmapped wrapper type that wraps the standard TiffImages.jl output, reinterprets the Colorants as UInt8s or whatever and lazily collapses the multi-spectral data along the Z axis and reinterprets it to a new type and supports unique access patterns that are best suited for GeoTIFF analysis.

What about something like this where a 4x4 image with 7 spectral bands is read by TiffImages

EDIT: I do realize that currently TiffImages simply doesn't load the GeoTIFF spectral slices, but a hypothetical GeoTIFF.jl package could hook in and force-loading of these slices.

julia> using Colors, FixedPointNumbers

julia> struct MultiSpec{N} # some custom spectral type for a new GeoTIFF.jl package
           slices::NTuple{N, UInt8}
       end

julia> data = rand(Gray{N0f8}, 4, 4, 7); # data from TiffImages.jl's load

julia> perm = PermutedDimsArray(data, (3, 1, 2));

julia> sbands = reinterpret(reshape, MultiSpec{7}, perm)
4×4 reinterpret(reshape, MultiSpec{7}, PermutedDimsArray(::Array{Gray{N0f8},3}, (3, 1, 2))) with eltype MultiSpec{7}:
 MultiSpec{7}((0x48, 0x3e, 0x0b, 0x35, 0x62, 0xbe, 0x9f))  …  MultiSpec{7}((0x13, 0xf3, 0xe9, 0xa0, 0x5e, 0x7d, 0x5d))
 MultiSpec{7}((0x85, 0x3a, 0x16, 0x56, 0xfd, 0x07, 0xa7))     MultiSpec{7}((0x7e, 0x80, 0x39, 0x73, 0x91, 0xdb, 0x8f))
 MultiSpec{7}((0x80, 0x6f, 0x33, 0x6a, 0xda, 0xfb, 0x29))     MultiSpec{7}((0x4b, 0xc7, 0x8b, 0x08, 0xfe, 0xb0, 0x9b))
 MultiSpec{7}((0xc8, 0x86, 0x4a, 0x47, 0x8d, 0xd0, 0xd0))     MultiSpec{7}((0xd0, 0x45, 0xb7, 0x1a, 0xb8, 0xbb, 0x7f))

julia> size(sbands)
(4, 4)

Now, adding support for GeoTIFFs directly is likely outside of the scope of TiffImages.jl since it's meant to be a base package that can be extended by other packages (e.g. https://github.com/tlnagy/OMETIFF.jl) to add support for special kinds of TIFFs. I believe that the GeoTIFF-specific code would be better suited in such a package. I don't think it would be too much work to hook into TiffImages.jl's current loading logic and just add some extra code to handle the GeoTIFF specific tags.

I stand by my thoughts here, but I would be happy to review any changes needed in base TiffImages.jl (a la #139) that are needed for a GeoTIFF.jl package. But I think anything that involves the GeoTIFF specific tags should go into a separate package, but anything that's general should go here.

chrstphrbrns commented 6 months ago

Again, I think the title of this issue is misleading--these extra bands of data are supported by basic TIFF functionality, not something specific to GeoTIFFs. The samples are all packed together in strips as with any other image

TiffImages.jl should be able to load all of the raw data by default, regardless of whether other packages might transform or reinterpret it

Not sure what you mean by lazily collapsing along the z-axis, or about the advantage of wrapping NTuple in a new type (as opposed to, say, a const alias)

tlnagy commented 6 months ago

My original understanding based on this comment https://github.com/tlnagy/TiffImages.jl/issues/100#issuecomment-1489218133 is that GeoTIFF stores the spectral bands using its own set of tags (e.g. 33550, 33922, 34735, etc) which while similar in design to the standard TIFF tags, are actually an orthogonal means of storing data within the TIFF type.

My thinking right now is that loading these would be outside of the scope of the base TiffImages.jl package, but I could be convinced otherwise.

Not sure what you mean by lazily collapsing along the z-axis, or about the advantage of wrapping NTuple in a new type (as opposed to, say, a const alias)

The rational for that design is that you could load data using the standard TiffImages.jl infrastructure that returns a Colorant and present it to the user in a way that is most useful for GeoTIFF users. The MultiSpec type from above was a toy example, but wrapping a NTuple in that manner (similar to how Colorants wrap FixedPointNumbers) allows you to add nice features without committing type piracy.

I do a limited version of this in the OMETIFF.jl package that wraps the standard TiffImages.jl types to have Z, channel, and time dimensions as well as the associated units for each.

chrstphrbrns commented 3 months ago

TiffImages is not reading the data correctly, despite the lack of an error

Here's what I've prototyped:

julia> im=TiffImages.load("S2_1-12-19_48MYU_0.tif");

julia> dump(im[1])
TiffImages.WidePixel{ColorTypes.Gray{Float32}, NTuple{10, Float32}}
  color: ColorTypes.Gray{Float32}
    val: Float32 0.09607481f0
  extra: NTuple{10, Float32}
    1: Float32 0.085751705f0
    2: Float32 0.06403932f0
    3: Float32 0.04786557f0
    4: Float32 0.042687308f0
    5: Float32 0.04126585f0
    6: Float32 0.04205676f0
    7: Float32 0.035995904f0
    8: Float32 0.04031173f0
    9: Float32 0.024570953f0
    10: Float32 0.017973172f0

julia> color(im)    # extract the color from each pixel
256×256 Array{Gray{Float32},2} with eltype ColorTypes.Gray{Float32}:
 Gray{Float32}(0.0960748)  Gray{Float32}(0.0960748) ...
 Gray{Float32}(0.0960748)  Gray{Float32}(0.0960748) ...
 Gray{Float32}(0.0960748)  Gray{Float32}(0.0960748) ...
...

julia> nchannels(im)   # total number of channels (where each color channel is treated separately)
11

julia> channel(im,3)   # extract values from a single channel
256×256 Matrix{Float32}:
0.0640393  0.0659089  0.065689 ...
0.0653591  0.065579   0.0646992 ...
0.0667888  0.0652491  0.0653591 ...
...

WidePixel respects the distinction made by the TIFF spec between the per se color data (as indicated by the PHOTOMETRIC tag) and the "extra" samples

tlnagy commented 3 months ago

Ah, I understand. So the issue is that the Colorant backing type isn't flexible enough to handle 5+ ExtraSamples that GeoTIFF uses? AFAIK Colorant caps out at ~4 with RGBX etc. I would be fine with WidePixel as an intermediate/alternative if TiffImages can't pack in everything into a Colorant. As long as Gray{N0f8} etc images keep their current performance, I would be happy to merge such a PR.