libvips / pyvips

python binding for libvips using cffi
MIT License

from openslide to numpy #409

Open raphaelbourgade opened 1 year ago

raphaelbourgade commented 1 year ago

Hi @jcupitt, thank you for pyvips!

What is the fastest way to convert a WSI (ndpi, svs...) into a numpy array?

When I try:

image = pyvips.Image.openslideload('slide.ndpi')
native_image.numpy()

I get an error: "Error: no such operation numpy VipsOperation: class "numpy" not found"

I have pyvips version 2.2.1.

What do you think the problem is?

Thank you !

jcupitt commented 1 year ago

Hi @raphaelbourgade,

What's native_image? Try just:

$ python
Python 3.11.2 (main, May 30 2023, 17:45:26) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyvips
>>> x = pyvips.Image.new_from_file("CMU-1.ndpi")
>>> x.width
51200
>>> x.height
38144
>>> x.numpy()
array([[[202, 192, 180, 255],
        [203, 193, 181, 255],
        [204, 194, 182, 255],
        ...,
        [204, 195, 180, 255],
        [204, 195, 180, 255],
        [204, 195, 180, 255]],

       ...,

       [[205, 196, 181, 255],
        [205, 196, 181, 255],
        [205, 196, 181, 255],
        ...,
        [203, 194, 179, 255],
        [203, 194, 179, 255],
        [203, 194, 179, 255]]], dtype=uint8)
>>> 

But this is probably not a great way to work -- it will use a HUGE amount of memory (about 8GB of RAM for this small slide, and 80GB or more for a more typical one), and the conversion will be slow.
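The memory figure follows directly from the image size: a uint8 array needs width × height × bands bytes. A small sketch of that arithmetic (the helper name numpy_bytes is just for illustration), using the CMU-1 dimensions and 4 bands (RGBA) printed above, and a 512x512 RGB tile for comparison:

```python
def numpy_bytes(width, height, bands, bytes_per_sample=1):
    """Bytes needed to hold an image as a single NumPy array (uint8 by default)."""
    return width * height * bands * bytes_per_sample

# CMU-1.ndpi as loaded above: 51200 x 38144 pixels, RGBA, uint8
whole_slide = numpy_bytes(51200, 38144, 4)
print(f"{whole_slide / 1e9:.1f} GB")  # 7.8 GB

# one 512 x 512 RGB tile
tile = numpy_bytes(512, 512, 3)
print(f"{tile / 1e6:.1f} MB")  # 0.8 MB
```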

Is this for ML training? I would crop out small pieces and feed them to your ML system in chunks. You'll get much better performance.

Perhaps:

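import pyvips

tile_size = 512

def process_tile(array):
    # placeholder: replace with your own ML code
    print(array.shape)

# only fetch RGB (not RGBA) from the slide
image = pyvips.Image.new_from_file("test-slide.ndpi", rgb=True)

# cut into tiles, clipping the last row and column to the image edge
tiles = [image.crop(x, y,
                    min(tile_size, image.width - x),
                    min(tile_size, image.height - y))
         for y in range(0, image.height, tile_size)
         for x in range(0, image.width, tile_size)]

for tile in tiles:
    # crops are lazy: only this tile's pixels are read from the slide here
    numpy_array = tile.numpy()
    process_tile(numpy_array)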
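The crop loop walks a grid of tile origins. Here is a pure-Python sketch of that grid (tile_grid is a hypothetical helper, not part of pyvips) that also clips the last row and column, since a full-size crop that runs past the image edge makes libvips raise an error. For the 51200x38144 CMU-1 slide and 512-pixel tiles it gives 7500 tiles, and the bottom row is only 256 pixels high:

```python
def tile_grid(width, height, tile_size):
    """Yield (x, y, w, h) for each tile, clipped at the right and bottom edges."""
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            yield x, y, min(tile_size, width - x), min(tile_size, height - y)

tiles = list(tile_grid(51200, 38144, 512))
print(len(tiles))   # 7500
print(tiles[-1])    # (50688, 37888, 512, 256)
```

Each tuple can be passed straight to crop, e.g. image.crop(*t) for t in tile_grid(image.width, image.height, 512).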

If your tiles are small (e.g. 32x32 pixels), you might be better off with fetch.

raphaelbourgade commented 1 year ago

Thank you very much for your quick and detailed reply @jcupitt.

"native_image" is a copy-paste error and corresponds to "image" in my script, but even with your script, I still get the same error:

"Error: no such operation numpy VipsOperation: class "numpy" not found"

Could it be a problem with my pyvips installation?

jcupitt commented 1 year ago

Then I guess you are somehow picking up an old pyvips version at runtime.

I see:

>>> import pyvips
>>> pyvips.__version__
'2.2.1'
>>> 
raphaelbourgade commented 1 year ago

Yes, that is the version I have.

jcupitt commented 1 year ago

Yes, I understand, but could you be picking up the wrong version at runtime?

The numpy() method was added in version 2.2, so perhaps due to some path mixup you are somehow loading an old version? Try printing the value of pyvips.__version__ just before you call image.numpy().
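To spot that kind of path mixup, it can help to print not just the version but also which file Python actually imported, and compare it with the version pip or conda recorded for the installed package. A small stdlib-only sketch (describe_module is a hypothetical helper):

```python
import importlib
import importlib.metadata

def describe_module(name):
    """Report which copy of a module Python actually imports."""
    module = importlib.import_module(name)
    try:
        # version recorded by the package manager for the installed distribution
        installed = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        installed = None
    return {
        "runtime_version": getattr(module, "__version__", None),
        "installed_version": installed,
        "file": getattr(module, "__file__", None),
    }
```

Calling print(describe_module("pyvips")) just before image.numpy() shows the path of the pyvips that is really loaded; if runtime_version and installed_version disagree, or the path points somewhere unexpected, an older copy is shadowing 2.2.1.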

jcupitt commented 1 year ago

I see:

$ python
Python 3.11.2 (main, May 30 2023, 17:45:26) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyvips
>>> pyvips.__version__
'2.2.1'
>>> x = pyvips.Image.black(1,1)
>>> 'numpy' in dir(x)
True
>>> 

It's a simple method on the image class, so I think you must be running the wrong version of pyvips.