libvips / pyvips

python binding for libvips using cffi
MIT License

`vips_tracked: out of memory -- size == 59546MB` When Converting MRXS to DZI #482

Closed MasiCal354 closed 3 weeks ago

MasiCal354 commented 3 weeks ago

I wanted to convert a big MRXS file to DZI

import pyvips

image = pyvips.Image.new_from_file("path/to/mirax.mrxs", access="sequential", memory=True)
image.dzsave("path/for/deepzoom", tile_size=1024, suffix=".png")

The code above raises the error `vips_tracked: out of memory -- size == 59546MB`. I was running it on AWS ECS with Fargate compute configured with 16 vCPU and 120 GB of RAM.

I want to keep memory=True for performance. The MRXS file is only 255 MB, but the image dimensions are huge. I was expecting libvips to tile the image slice by slice rather than load the whole image into memory.

jcupitt commented 3 weeks ago

Hi @MasiCal354,

Try:

image = pyvips.Image.new_from_file("path/to/mirax.mrxs", rgb=True, autocrop=True)
image.dzsave("path/for/deepzoom", tile_size=1024, suffix=".png")

It should be fast and need little memory.

The rgb tag makes it discard the alpha channel on read, which saves quite a lot of time. I don't think the openslide alpha channel carries much information.

autocrop makes it discard pixels outside the specimen area. By default, MRXS images include the entire slide area, which can be mostly empty.

Are you certain you need PNG? It will be very slow, need a HUGE amount of disc, and will not improve quality. JPEG tiles with Q=85 should be much faster, smaller, and have equivalent quality.

1024x1024 tiles are very large and will usually give poor interactive performance. I would use 512x512 at most.
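
A minimal sketch of the settings suggested above, assuming pyvips is installed against a libvips build with OpenSlide support and that the `rgb` load option is available in that build. `dzsave_options` and `convert_slide` are hypothetical helper names introduced here for illustration, not part of pyvips.

```python
def dzsave_options(tile_size=512, quality=85):
    """Build dzsave keyword arguments for JPEG tiles (hypothetical helper)."""
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    if not 1 <= quality <= 100:
        raise ValueError("quality must be between 1 and 100")
    # Encoder options for each tile go in square brackets after the suffix.
    return {"tile_size": tile_size, "suffix": f".jpg[Q={quality}]"}


def convert_slide(src, dst, **kwargs):
    """Convert a slide to a DeepZoom pyramid without memory=True."""
    import pyvips  # imported lazily so dzsave_options works without libvips

    # Without memory=True, libvips decodes pixels on demand
    # instead of holding the whole slide in RAM.
    image = pyvips.Image.new_from_file(src, rgb=True, autocrop=True)
    image.dzsave(dst, **dzsave_options(**kwargs))
```

With the placeholder paths from the original post, this could be called as `convert_slide("path/to/mirax.mrxs", "path/for/deepzoom")`, or with `tile_size=256` to try smaller tiles.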

I tried:

$ vipsheader D23001205_26-10-2023_09-53-58.mrxs
D23001205_26-10-2023_09-53-58.mrxs: 271950x294038 uchar, 4 bands, srgb, openslideload
$ vipsheader D23001205_26-10-2023_09-53-58.mrxs[autocrop]
D23001205_26-10-2023_09-53-58.mrxs: 86102x147458 uchar, 4 bands, srgb, openslideload

Then:

$ /usr/bin/time -f %M:%e vips dzsave D23001205_26-10-2023_09-53-58.mrxs[rgb,autocrop] x
840124:206.61
$ ls -R x_files/ | wc
 263152  263133 3234651

So it converts a roughly 90,000 x 150,000 pixel slide in about 3m30s and needs about 850 MB of memory.
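
To see why decoding a whole slide into memory is so expensive, it helps to estimate the uncompressed size from the `vipsheader` dimensions above. This is a back-of-the-envelope sketch, assuming 1 byte per sample (uchar). `uncompressed_bytes` is a hypothetical helper, and the figures describe the test slide used here, not necessarily the slide in the original report.

```python
def uncompressed_bytes(width, height, bands, bytes_per_sample=1):
    """Size of a fully decoded image held in memory, in bytes."""
    return width * height * bands * bytes_per_sample


# Dimensions taken from the vipsheader output above.
full_rgba = uncompressed_bytes(271950, 294038, 4)    # whole slide, with alpha
cropped_rgb = uncompressed_bytes(86102, 147458, 3)   # autocrop, alpha dropped

print(f"full slide, RGBA: {full_rgba / 1e9:.1f} GB")
print(f"autocrop, RGB:    {cropped_rgb / 1e9:.1f} GB")
```

Even the cropped RGB image works out to roughly 38 GB uncompressed, far more than the peak memory measured above, which is consistent with the pixels being processed in pieces rather than held in memory all at once.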

MasiCal354 commented 3 weeks ago

Awesome, I'll try that out. As for the tile size, I'm still experimenting with different sizes, as we also use the tiles to parallelize ML inference (unfortunately without a GPU). A smaller tile size produces more tiles, which overloads the ML inference. As for the file format, I had never thought about it, but now that you mention it, I think it's worth a shot to change to JPEG. Thanks a lot @jcupitt, I'll try them out and close this ticket if it works well.
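
For the inference use case, one way to collect the full-resolution tiles might look like the sketch below. It assumes the usual DeepZoom layout written by dzsave, `<name>_files/<level>/<column>_<row>.<ext>`, where the highest-numbered level is full resolution. `full_resolution_tiles` is a hypothetical helper, not part of pyvips.

```python
from pathlib import Path


def full_resolution_tiles(files_dir):
    """Return tile paths from the deepest pyramid level, in row-major order."""
    root = Path(files_dir)
    # Level directories are named with integers; ignore anything else.
    levels = [p for p in root.iterdir() if p.is_dir() and p.name.isdigit()]
    if not levels:
        return []
    deepest = max(levels, key=lambda p: int(p.name))

    def row_col(path):
        # Tile file names are "<column>_<row>".
        col, row = path.stem.split("_")
        return int(row), int(col)

    return sorted((p for p in deepest.iterdir() if p.is_file()), key=row_col)
```

The returned list could then be split across worker processes, for example with `multiprocessing.Pool.map`, so that tile size and worker count can be tuned independently.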

MasiCal354 commented 3 weeks ago

@jcupitt, this is great: it takes less memory and it's significantly faster. Now I need to reduce the resource allocation, as I'm definitely overprovisioned. I'll close this issue. Thanks.