tlnagy / TiffImages.jl

💎 Pure-Julia TIFF I/O with a focus on correctness 🧐
http://tamasnagy.com/TiffImages.jl/
MIT License
54 stars 13 forks

Speed Issue loading large .tiff images #167

Open Gioosu opened 4 months ago

Gioosu commented 4 months ago

Hi,

I need to load large images, each one larger than 2GB. The normal loading time is approximately 9 minutes, but I suppose I could achieve better performance. Reading some other posts, I found this: #79.

I also read something about Garbage Collector overhead, but deactivating it while loading images didn't improve the performance.
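For reference, this is roughly how the garbage collector can be switched off around a single call. It is only a sketch: `without_gc` is a hypothetical helper name, not part of Julia or TiffImages, and it relies only on `GC.enable` from Base, which returns the previous GC state.

```julia
# Hypothetical helper: run `f` with the garbage collector disabled,
# then restore whatever GC state was active before the call.
function without_gc(f)
    was_enabled = GC.enable(false)   # returns the previous state
    try
        return f()
    finally
        GC.enable(was_enabled)       # restore even if `f` throws
    end
end

# Usage with a stand-in workload; for the case in this issue the closure
# would wrap the load call instead, e.g. () -> load(tiff_path)
result = without_gc(() -> sum(rand(1000)))
```

With the GC off, memory use can only grow until it is re-enabled, so this is only safe when the whole allocation fits in RAM.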

So, I tried using load(tiff_path; lazyio = true), but I encountered this error:

ERROR: Unable to mutate inplace since this array is on disk. Convert to a mutable in-memory version by running `copy(arr)`.

𝗡𝗼𝘁𝗲: For large files this can be quite expensive. A future PR will add support for writing inplace to disk. See `push!` for appending to an array.

Here is a minimal code example to reproduce the issue:


using TiffImages

# Path to the large image
tiff_path = "path/to/large_image.tiff"

# Attempt to load the image with lazy IO
try
    img = TiffImages.load(tiff_path; lazyio = true)
catch e
    println("Error: ", e)
end

Am I getting something wrong? Did I misunderstand how to use lazyio properly?

Any advice or suggestions would be greatly appreciated.

tlnagy commented 4 months ago

9 minutes is pretty damn slow, especially if you're using lazy loading. Could you give me more information about the TIFF you are loading? I wonder if you're hitting some weird edge case. Could you give me the output of

julia> first(ifds(img))

Comparison

For example, this is loading a 4 GB TIFF right after updating TiffImages in the Julia REPL (i.e. the worst-case scenario for precompilation):

julia> @time img = TiffImages.load("dish1_1_MMStack.ome.tif")
Loading: 100%|████████████████████████████████████████████████| Time: 0:00:14
 30.511296 seconds (2.06 M allocations: 4.166 GiB, 0.86% gc time, 5.60% compilation time)
1024×1024×2026 TiffImages.DenseTaggedImage{ColorTypes.Gray{FixedPointNumbers.N0f16}, 3, UInt32, Array{ColorTypes.Gray{FixedPointNumbers.N0f16}, 3}}:
...

The actual loading time is 14 seconds. The full time, including precompilation and everything else, is 30 seconds.

For mmap and lazyio:

julia> @time img = TiffImages.load("dish1_1_MMStack.ome.tif"; mmap = true)
  0.402482 seconds (345.09 k allocations: 95.885 MiB, 36.08% gc time, 10.73% compilation time)
1024×1024×2026 TiffImages.MmappedTIFF{ColorTypes.Gray{FixedPointNumbers.N0f16}, 3} with 2026 slice planes:
...

julia> @time img = TiffImages.load("dish1_1_MMStack.ome.tif"; lazyio = true)
  0.920757 seconds (328.37 k allocations: 101.070 MiB, 51.71% gc time)
1024×1024×2026 TiffImages.DenseTaggedImage{ColorTypes.Gray{FixedPointNumbers.N0f16}, 3, UInt32, LazyBufferedTIFF{ColorTypes.Gray{FixedPointNumbers.N0f16}, UInt32, Matrix{ColorTypes.Gray{FixedPointNumbers.N0f16}}}}:
...

Both of these take under 1 second.
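To try the same three-way comparison without a multi-gigabyte file, something like the following should work. It is a sketch under a few assumptions: the small random stack, the temporary path, and the variable names are made up for illustration, and the file is written uncompressed (as far as I know `mmap = true` needs uncompressed data, so it would not apply to a Deflate-compressed file).

```julia
using TiffImages, ColorTypes, FixedPointNumbers

# Hypothetical stand-in for a real stack: a small 3D 16-bit grayscale array
data = rand(Gray{N0f16}, 64, 64, 4)
path = joinpath(mktempdir(), "example.tif")
TiffImages.save(path, TiffImages.DenseTaggedImage(data))

# @time returns the value of the expression it wraps
dense  = @time TiffImages.load(path)                 # reads everything into memory
mapped = @time TiffImages.load(path; mmap = true)    # memory-maps the file
lazy   = @time TiffImages.load(path; lazyio = true)  # reads slices on demand

println(typeof(dense))
println(typeof(mapped))
println(typeof(lazy))
```

Run each `load` twice if you want timings that exclude compilation; the first call in a fresh session includes it, as in the 30-second figure above.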

Gioosu commented 4 months ago

Hello,

I understood what was happening by using @time. You're actually right because my .tiff image is 2GB compressed but 42GB uncompressed, so the load time MUST obviously be slower. I didn't realize it could be THAT much larger. Thanks for your time.

tlnagy commented 4 months ago

It's still slower than I would like. What's the output of

julia> first(ifds(img))

and

julia> typeof(img)

Gioosu commented 4 months ago

julia> @time img = TiffImages.load("/Users/gioosu/Documents/SOPHYSM-Workspace/test.tiff")
Loading: 100%|████████████████████████████████████████████████| Time: 0:06:29
[ Info: Array too large to fit in standard TIFF container, switching to BigTIFF
423.952919 seconds (14.23 M allocations: 45.698 GiB, 10.50% gc time, 2.19% compilation time: 1% of which was recompilation)
9515×10004×54 TiffImages.DenseTaggedImage{ColorTypes.Gray{FixedPointNumbers.N0f16}, 3, UInt64, Array{ColorTypes.Gray{FixedPointNumbers.N0f16}, 3}}:

[...] I removed the image matrix to improve readability.

julia> first(ifds(img))
IFD, with tags: 
    Tag(IMAGEWIDTH, 10004)
    Tag(IMAGELENGTH, 9515)
    Tag(BITSPERSAMPLE, 16)
    Tag(COMPRESSION, COMPRESSION_ADOBE_DEFLATE)
    Tag(PHOTOMETRIC, 1)
    Tag(IMAGEDESCRIPTION, "<?xml version="1.0" ...")
    Tag(SAMPLESPERPIXEL, 1)
    Tag(SOFTWARE, "OME Bio-Formats 6.4....")
    Tag(TILEWIDTH, 512)
    Tag(TILELENGTH, 512)
    Tag(TILEOFFSETS, UInt64[16, 320683, 968212, 642922, 1290341, ...])
    Tag(TILEBYTECOUNTS, UInt64[320667, 322239, 322129, 325290, 360927, ...])
    Tag(SUBIFD, UInt64[2390962148, 2390964066, 2390964784, 2390965246, 2390965628, ...])
    Tag(SAMPLEFORMAT, 1)

julia> typeof(img)
TiffImages.DenseTaggedImage{ColorTypes.Gray{FixedPointNumbers.N0f16}, 3, UInt64, Array{ColorTypes.Gray{FixedPointNumbers.N0f16}, 3}}

I downloaded this .tiff from an online source as a test, but I need to make it work with HubMaps.
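As a rough sanity check on the numbers above, the size of the dense array and the number of tiles to decompress can be estimated from the printed dimensions and tile sizes alone. This is only back-of-the-envelope arithmetic: it counts the full-resolution planes, ignores anything stored under the SUBIFD tag and any temporary buffers, and the variable names are just for illustration.

```julia
# Dimensions and tile size copied from the output in this thread
height, width, nplanes = 9515, 10004, 54
bytes_per_pixel = sizeof(UInt16)   # Gray{N0f16} holds one 16-bit sample per pixel
tile = 512                         # TILEWIDTH and TILELENGTH

# Bytes needed for the dense in-memory array
array_bytes = height * width * nplanes * bytes_per_pixel

# Tiles are laid out on a grid; partial tiles at the edges still count
tiles_per_plane = cld(width, tile) * cld(height, tile)
total_tiles = tiles_per_plane * nplanes

println("dense array: ", round(array_bytes / 2^30; digits = 2), " GiB")
println("tiles per plane: ", tiles_per_plane, ", total tiles: ", total_tiles)
```

Every one of those tiles is Deflate-compressed (COMPRESSION_ADOBE_DEFLATE), so each has to be inflated separately during a full load.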

Gioosu commented 3 months ago

Any news? @tlnagy @marcoxa