Opening and saving the same WSI with the same size

libvips / pyvips

Opening and saving the same WSI with the same size #167

Open EKami opened 4 years ago

EKami commented 4 years ago

Hello, I'm trying to figure out how to open an SVS file, downsize it, and save it as .tif without the resulting file being larger than the original.

I don't want any degradation in the quality of the WSI during the operation, and ideally I would like to carry the metadata from the SVS file over into the new .tif file. For example, if I do:

import pyvips

# file is 187.5 MB on disk
img = pyvips.Image.openslideload(str(file), level=0)
img = img.resize(0.92)  # downsize to 92% of the original width and height
img.tiffsave(str(file_output), compression='lzw')
# file_output is 828.2 MB on disk

Why does the size jump like this? I'd like to use a lossless compression algorithm when I save file_output. Also, is .tif the best format that OpenSlide/libvips can read in terms of reading speed and compression ratio?

Thanks a lot!

jcupitt commented 4 years ago

Hello @EKami,

SVS files are usually compressed with jpeg2000, so you'll need to use a lossy compressor.
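To see why a lossless save grows so much, it can help to compare each file against the size of the raw, uncompressed pixels. The sketch below is only an illustration: the slide dimensions are hypothetical (the thread does not give them), the helper names are made up for this example, and the file sizes are read as decimal megabytes.

```python
def raw_bytes(width: int, height: int, bands: int = 3) -> int:
    """Size of the uncompressed 8-bit pixel data in bytes."""
    return width * height * bands


def compression_ratio(width: int, height: int, file_bytes: float, bands: int = 3) -> float:
    """How many times smaller the file is than the raw pixels."""
    return raw_bytes(width, height, bands) / file_bytes


# Hypothetical level-0 dimensions; substitute the real ones from your slide.
width, height = 80_000, 60_000

# Original SVS: 187.5 MB on disk (lossy compression).
svs_ratio = compression_ratio(width, height, 187.5e6)

# After resize(0.92) and an LZW save: 828.2 MB on disk (lossless compression).
lzw_ratio = compression_ratio(round(width * 0.92), round(height * 0.92), 828.2e6)

print(f"SVS ratio: {svs_ratio:.1f}:1, LZW TIFF ratio: {lzw_ratio:.1f}:1")
```

Under these assumed dimensions, the lossy original packs the pixels far more tightly than the lossless LZW output does, even though the LZW file holds fewer pixels.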

I would try:

import pyvips

img = pyvips.Image.openslideload(str(file))
img = img.resize(0.92)
# tile=True writes a tiled TIFF; properties=True stores the metadata as XML
img.tiffsave(str(file_output), compression='jpeg', Q=85, tile=True, properties=True)

The properties argument makes tiffsave write all the metadata to the IMAGEDESCRIPTION tag as XML. It'll need to be a tiled tiff or you'll hit the 64k pixel JPEG limit.
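As a rough illustration of the limit mentioned here: JPEG stores width and height as 16-bit values, so a single JPEG stream cannot exceed 65,535 pixels in either dimension. The helper and the options dictionary below are hypothetical names for this sketch, not part of pyvips; the keyword arguments are the ones used in the tiffsave call in this comment.

```python
JPEG_MAX_DIMENSION = 65_535  # JPEG stores width and height as 16-bit values


def exceeds_jpeg_limit(width: int, height: int) -> bool:
    """True if either dimension is too large for a single JPEG stream."""
    return width > JPEG_MAX_DIMENSION or height > JPEG_MAX_DIMENSION


# Keyword arguments matching the suggested tiffsave call. With tile=True each
# tile is compressed as its own small JPEG, so the full slide size no longer
# matters for the limit.
TIFFSAVE_OPTIONS = {
    "compression": "jpeg",
    "Q": 85,
    "tile": True,
    "properties": True,
}

# Example with made-up slide dimensions:
print(exceeds_jpeg_limit(100_000, 50_000))
print(exceeds_jpeg_limit(40_000, 30_000))
```

With a pyvips image in hand, the options could be passed as `img.tiffsave(str(file_output), **TIFFSAVE_OPTIONS)`.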

I see:

$ vips copy CMU-1.svs x.tif[compression=jpeg,Q=85,tile,properties]
$ ls -l
total 294072
-rw-r--r-- 1 john john 177552579 Feb 10 20:30 CMU-1.svs
-rw-r--r-- 1 john john 116692123 Mar 31 14:06 x.tif

So reasonably close.

EKami commented 4 years ago

Thank you so much @jcupitt!! I have another question: if I use compression='jpeg', Q=85, wouldn't I lose image quality on top of the jpeg2000 compression already applied to the SVS files during scanning?

The reason I really want to go with lossless compression is that my ultimate goal is to convert both Mirax and SVS files to the same .tif format while:

Keeping the metadata
Downsizing the WSIs
Not losing image quality (since those WSIs have to run through a deep learning algorithm, and I noticed that it's very sensitive to changes in image quality, even with compression='jpeg', Q=100).

Thanks a lot!

jcupitt commented 4 years ago

Yes, you'll get extra artefacts from the jpg compression.

I do deep learning directly on the WSI image, would that be an option? You can pull rects from SVS files and pass them to pytorch etc. You don't need to go via a tiff intermediate.
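A minimal sketch of what pulling rects from a slide could look like, not the code from the linked benchmark. It assumes a pyvips build with OpenSlide support and a version that provides Region.fetch, plus numpy; `patch_rects`, `read_patches`, and the default patch size are hypothetical names and values chosen for this example.

```python
from typing import Iterator, Tuple


def patch_rects(width: int, height: int, size: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (left, top, w, h) rectangles covering the image, clipped at the edges."""
    for top in range(0, height, size):
        for left in range(0, width, size):
            yield left, top, min(size, width - left), min(size, height - top)


def read_patches(slide_path: str, size: int = 512, level: int = 0):
    """Yield each patch of the slide as a (h, w, bands) uint8 numpy array."""
    # Imported here so that patch_rects stays usable without pyvips installed.
    import numpy as np
    import pyvips

    image = pyvips.Image.openslideload(slide_path, level=level)
    if image.hasalpha():
        # OpenSlide images come back as RGBA; flatten drops the alpha band.
        image = image.flatten()

    region = pyvips.Region.new(image)
    for left, top, w, h in patch_rects(image.width, image.height, size):
        data = region.fetch(left, top, w, h)
        yield np.ndarray(buffer=data, dtype=np.uint8, shape=(h, w, image.bands))
```

Each array could then be handed to a framework of your choice, for example with `torch.from_numpy(patch)`, without writing an intermediate TIFF.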

jcupitt commented 4 years ago

Sample code and benchmark: https://github.com/libvips/pyvips/issues/100#issuecomment-493960943

EKami commented 4 years ago

I think that'll probably be the only option for SVS files at this point, since they seem to be heavily compressed with jpeg2000. As for Mirax, I found that I have room to shrink them, since I only need them at downsample 2.0 (level 1), which has a quarter of the pixels of the original.
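A hedged sketch of that Mirax idea: read level 1 directly and save it with a lossless compressor, so the stored pixels are not resampled or recompressed lossily. The function names are hypothetical, the choice of deflate is an assumption rather than something recommended in this thread, and the output size will depend on the slide.

```python
def pixel_fraction(downsample: float) -> float:
    """Fraction of the level-0 pixel count left at a given downsample factor."""
    return 1.0 / (downsample * downsample)


def convert_level(slide_path: str, output_path: str, level: int = 1) -> None:
    """Save one pyramid level of a slide as a losslessly compressed tiled TIFF."""
    # Imported here so that pixel_fraction stays usable without pyvips installed.
    import pyvips

    image = pyvips.Image.openslideload(slide_path, level=level)
    # deflate is lossless; properties=True keeps the metadata as XML in the
    # IMAGEDESCRIPTION tag, as in the earlier tiffsave example.
    image.tiffsave(output_path, compression="deflate", tile=True, properties=True)


# Downsample 2.0 halves both width and height, leaving a quarter of the pixels.
print(pixel_fraction(2.0))
```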

Thanks a lot for your help @jcupitt, much appreciated :)