Open amtsak opened 1 year ago
Hi @amtsak,
That's interesting, I've never looked at pickling (obviously). I'll see if I can fix the error.
I fixed the obvious issue, but it's still not working. With git master pyvips I now see:
$ python3
Python 3.10.7 (main, Nov 24 2022, 19:45:47) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyvips
>>> import pickle
>>> x = pyvips.Image.new_from_file("CMU-1.svs")
>>> pickle.dump(x, open("x.pickle", "wb"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: cannot pickle '_cffi_backend._CDataBase' object
>>>
I guess ffi does not support pickle in a straightforward way.
I had a quick read and I don't think pyvips could ever support pickle. An image in pyvips is (behind the scenes) a pointer owned by a huge C library, and represents, not data, but a tree of pointers to functions. There's not really anything that can be portably serialised.
If the images are small, you could maybe convert them to numpy arrays. They are simple data objects and pickle easily.
Thanks for looking into it @jcupitt! Makes sense that this can't be pickled. Thanks for the quick response either way.
I had a thought: perhaps you could just pass filenames into the multiprocessing function and do new_from_file
there? It's very fast on slide images, so there's no real benefit to passing the image.
Thanks @jcupitt, that's one of the first things I tried, but even though it's fast, it does add some 0.03-0.06 secs of overhead (on my setup at least for a 3GB pyramid slide image) which can add up if the volume of region reads is high enough, so I'd rather avoid it. But it could be an option.
Yes, you'd need to generate quite a few tiles in the multiprocessing method to overcome the cost of the new_from_file call.
Hi,
I am also trying to use multiprocessing to extract and save patches more quickly. I am posting the issue here since it looks relevant to this conversation.
Note: I am already using the fetch method, which has improved speed somewhat, but it still takes a lot of time for some bigger files. I have read some other open/closed issues but could not find a solution.
Here is the code:
from multiprocessing import Pool
import os
import pyvips as vips
from PIL import Image
import time
import matplotlib.pyplot as plt
import numpy as np

location = "/home/...."
file = "ABC.ndpi"
path = "/home/....."
patch_folder = "/home/......../patches/"
num_workers = 4

def fetch(region, patch_size, x, y):
    return region.fetch(patch_size * x, patch_size * y, patch_size, patch_size)

def create_patches(location, file, path,
                   patch_folder, workers=1, patch_size=224, mask_overlap=70.0):
    file_pth = os.path.join(location, file)
    img_400x = vips.Image.new_from_file(file_pth, level=0, autocrop=True).flatten()
    w, h = img_400x.width, img_400x.height
    n_across = int(w / patch_size)
    n_down = int(h / patch_size)
    mask_path = os.path.join(path, "#binary#mask.png")

    # function supposed to be parallelised: saves a patch based on the mask
    def extract_and_save_patch(x_cord, y_cord, file_path=file_pth, file_name=file,
                               mask_path=mask_path, patch_folder=patch_folder,
                               patch_size=patch_size, mask_overlap=mask_overlap):
        img_400x = vips.Image.new_from_file(file_path, page=0,
                                            autocrop=True).flatten()
        w, h = img_400x.width, img_400x.height
        mask_w, mask_h = Image.open(mask_path).size
        resized_mask = vips.Image.new_from_file(mask_path).flatten() \
            .colourspace("b-w").resize(w / mask_w, vscale=h / mask_h)
        img_400x = vips.Region.new(img_400x)
        resized_mask = vips.Region.new(resized_mask)
        patch_mask = fetch(resized_mask, patch_size, x_cord, y_cord)
        patch_mask = np.ndarray(buffer=patch_mask, dtype=np.uint8,
                                shape=[patch_size, patch_size])
        if np.mean(patch_mask / 255) * 100 > mask_overlap:
            patch = fetch(img_400x, patch_size, x_cord, y_cord)
            patch = np.ndarray(buffer=patch, dtype=np.uint8,
                               shape=[patch_size, patch_size, 3])
            fname = file_name.split(".")[0]
            x_start, y_start = x_cord * patch_size, y_cord * patch_size
            base_name = f"{fname}_{x_start}_{y_start}.png"
            patch_pil = Image.fromarray(patch)
            patch_pil.save(os.path.join(patch_folder, base_name))

    list_cord = [(x, y) for x in range(0, n_across) for y in range(0, n_down)]
    with Pool(processes=workers) as p:
        p.starmap(extract_and_save_patch, list_cord)

create_patches(location, file, path, patch_folder, workers=num_workers)
I am passing paths instead of vips objects but still getting the same error. Here is the trace.
Traceback (most recent call last):
  File "multiprocess.py", line 135, in <module>
    create_patches(location, file, path, patch_folder, workers=num_workers)
  File "multiprocess.py", line 127, in create_patches
    p.starmap(extract_and_save_patch, list_cord)
  File "/home/neel/miniconda3/envs/process/lib/python3.7/multiprocessing/pool.py", line 276, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/home/neel/miniconda3/envs/process/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/home/neel/miniconda3/envs/process/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
    put(task)
  File "/home/neel/miniconda3/envs/process/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/neel/miniconda3/envs/process/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'create_patches.<locals>.extract_and_save_patch'
Do you have any suggestions on this, or on how I can parallelize it through pyvips?
Neel
There's an example using multiprocessing here:
https://github.com/libvips/pyvips/blob/master/examples/thumb-dir.py
Does that help?
I'll have a look at your code.
Ah, your code fails because you have a local function. You can only use global functions with pickle.
Try making extract_and_save_patch into a top-level definition.
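The difference is easy to demonstrate with plain pickle: a module-level function is pickled by reference to its qualified name, while a function defined inside another function has no importable name, so pickling it fails exactly like the traceback above:

```python
import pickle

def top_level(x):
    return x + 1

def make_local():
    # a function defined inside another function: not importable by name
    def local(x):
        return x + 1
    return local

# a module-level function pickles fine: only its qualified name is stored
assert pickle.loads(pickle.dumps(top_level)) is top_level

# a local function cannot be pickled
try:
    pickle.dumps(make_local())
except AttributeError as e:
    print(e)  # Can't pickle local object 'make_local.<locals>.local'
```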
Thanks @jcupitt for the quick reply. :)
I tried to define the function at the top level, but passing many arguments was creating some serialization error.
I solved the problem by writing "global extract_and_save_patch" before the function definition inside create_patches.
It reduced the patching time significantly using 16 CPUs, but I was wondering if I can reduce the overhead of loading the file and mask every time (and resizing it every time). I wish there were a way to pass a pyvips object directly to the function.
I'd swap this code:
    mask_w, mask_h = Image.open(mask_path).size
    resized_mask = vips.Image.new_from_file(mask_path).flatten() \
        .colourspace("b-w").resize(w / mask_w, vscale=h / mask_h)
for vips.thumbnail(); it should be much faster. What format are your mask images in?
What are you writing? Just a simple RGB array? You can use rawsave to write these from pyvips.
Crop would probably be faster than fetch if your patches are 250x250.
Masks are grayscale images saved in PNG format. The masks are smaller than the original image to make them easy to view. They mark the tissue regions that will be used for creating patches.
I am extracting RGB patches of 224x224 to run an ML model over them. I will switch to crop and test it to compare.
I tried to test the program on a selected WSI and mask with 16 cpu workers.
Using fetch, it takes 26.4 seconds for extracting and saving 1830 tiles of 224x224. Using crop, it takes 21.6 seconds for extracting and saving 1830 tiles of 224x224.
It looks like fetch works better for ndpi pyramid images in my case.
Thanks a lot.
You could probably make that a lot faster. Post a sample mask image and the size of the ndpi you are benchmarking with and I'll have a go at tuning it.
Thanks John. :)
The ndpi WSI is around 90 MB and I want to use 40x here. Here is an example png mask.
I made a version of your original code for testing:
#!/usr/bin/python3

from multiprocessing import Pool
import os
import pyvips as vips
from PIL import Image
import time
import matplotlib.pyplot as plt
import numpy as np
import sys

def fetch(region, patch_size, x, y):
    return region.fetch(patch_size * x, patch_size * y, patch_size, patch_size)

# function to be parallelised: saves a patch based on the mask
def extract_and_save_patch(x_cord, y_cord,
                           slide_filename, mask_filename,
                           patch_size, mask_overlap):
    slide = vips.Image.new_from_file(slide_filename, autocrop=True).flatten()
    w, h = slide.width, slide.height
    mask_w, mask_h = Image.open(mask_filename).size
    resized_mask = vips.Image.new_from_file(mask_filename).flatten() \
        .colourspace("b-w").resize(w / mask_w, vscale=h / mask_h)
    img_400x = vips.Region.new(slide)
    resized_mask = vips.Region.new(resized_mask)
    patch_mask = fetch(resized_mask, patch_size, x_cord, y_cord)
    patch_mask = np.ndarray(buffer=patch_mask, dtype=np.uint8,
                            shape=[patch_size, patch_size])
    if np.mean(patch_mask / 255) * 100 > mask_overlap:
        patch = fetch(img_400x, patch_size, x_cord, y_cord)
        patch = np.ndarray(buffer=patch, dtype=np.uint8,
                           shape=[patch_size, patch_size, 3])
        x_start, y_start = x_cord * patch_size, y_cord * patch_size
        patch_pil = Image.fromarray(patch)
        patch_pil.save(f"patches/{x_start}_{y_start}.png")

slide_filename = sys.argv[1]
patch_size = 224

img_400x = vips.Image.new_from_file(slide_filename, autocrop=True)
w, h = img_400x.width, img_400x.height
n_across = int(w / patch_size)
n_down = int(h / patch_size)

params = [(x, y, slide_filename, "binary-mask.png", patch_size, 70)
          for x in range(0, n_across) for y in range(0, n_down)]
with Pool(processes=4) as p:
    p.starmap(extract_and_save_patch, params)
I changed it slightly, but the runtime should be the same. On this PC I see:
$ time ./mask.py ~/pics/openslide/CMU-1.ndpi
real 0m43.588s
user 2m35.253s
sys 0m14.278s
$ ls patches/ | wc
6754 6754 73361
So about 157 patches per second. I tried tweaking it a bit:
#!/usr/bin/python3

from multiprocessing import Pool
import pyvips as vips
import sys

def extract_and_save_patch(x, y,
                           slide_filename, mask_filename,
                           patch_size, mask_overlap):
    slide = vips.Image.new_from_file(slide_filename, autocrop=True, rgb=True)
    mask = vips.Image.new_from_file(mask_filename)
    mask = mask.resize(slide.width / mask.width,
                       vscale=slide.height / mask.height, kernel="nearest")
    patch_mask = mask.crop(x * patch_size, y * patch_size,
                           patch_size, patch_size)
    if patch_mask.avg() / 2.55 > mask_overlap:
        patch = slide.crop(x * patch_size, y * patch_size,
                           patch_size, patch_size)
        patch.write_to_file(f"patches/{x}_{y}.png")

slide_filename = sys.argv[1]
patch_size = 224

slide = vips.Image.new_from_file(slide_filename, autocrop=True)
n_across = int(slide.width / patch_size)
n_down = int(slide.height / patch_size)

params = [(x, y, slide_filename, "binary-mask-mono.png", patch_size, 70)
          for x in range(0, n_across) for y in range(0, n_down)]
with Pool(processes=4) as p:
    p.starmap(extract_and_save_patch, params)
That runs in 34s (199 patches/second). If I set processes=32 (this PC has 16 cores) I get 10s (675 patches/second).
Oh, if I put the x loop inside the thread, like this:
#!/usr/bin/python3

from multiprocessing import Pool
import pyvips as vips
import sys

def extract_and_save_patch(y,
                           slide_filename, mask_filename,
                           patch_size, mask_overlap):
    slide = vips.Image.new_from_file(slide_filename, autocrop=True, rgb=True)
    mask = vips.Image.new_from_file(mask_filename)
    mask = mask.resize(slide.width / mask.width,
                       vscale=slide.height / mask.height, kernel="nearest")
    n_across = int(slide.width / patch_size)
    for x in range(n_across):
        patch_mask = mask.crop(x * patch_size, y * patch_size,
                               patch_size, patch_size)
        if patch_mask.avg() / 2.55 > mask_overlap:
            patch = slide.crop(x * patch_size, y * patch_size,
                               patch_size, patch_size)
            patch.write_to_file(f"patches/{x}_{y}.png")

slide_filename = sys.argv[1]
patch_size = 224

slide = vips.Image.new_from_file(slide_filename, autocrop=True)
n_down = int(slide.height / patch_size)

params = [(y, slide_filename, "binary-mask-mono.png", patch_size, 70)
          for y in range(0, n_down)]
with Pool(processes=4) as p:
    p.starmap(extract_and_save_patch, params)
I get 18s (375 patches/second). If I also set processes=32 I get 4.5s (1500 patches/second).
Thanks a lot, it's much faster now (with the last version, loop inside the thread).
Although it isn't as fast on my Linux server as on yours, the total time is much reduced:
with 32 CPUs it takes 15.6s, giving 118 patches/second; with 16 CPUs it takes 19.2s, giving 96 patches/second.
The only change in my version is that rgb=True in vips.Image.new_from_file gives vips.Error, maybe due to a different version of vips, so I am using flatten() instead. I hope that won't be a big source of overhead. I am using pyvips 2.2.0.
You need 8.14 for the rgb mode.
Is there scope to support pickle dumping for pyvips Images in the future? Currently, it fails - see minimal example to reproduce below:
Have tried this with .svs, .ndpi and .tif slides.
The error with python 3.10 & pyvips 2.2.1:
The error with python 3.7 & pyvips 2.2.0:
Allowing pickle dumping would make pyvips Images compatible with the latest PyTorch DataLoaders (>= 1.13.1), which unfortunately pickle all their attributes when doing any type of multiprocessing. So it would make life easier for several ML applications.
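One workaround in the meantime: keep only the filename in the object's state and reopen the image lazily in each process, so what gets pickled is a plain string rather than the CFFI handle. A sketch (the Slide wrapper and its attributes are hypothetical, not part of pyvips or pytorch):

```python
import pickle

class Slide:
    """Wraps a slide by filename; the pyvips handle is never pickled."""

    def __init__(self, filename):
        self.filename = filename
        self._image = None  # opened lazily, once per process

    @property
    def image(self):
        if self._image is None:
            import pyvips  # imported on first use, not at pickle time
            self._image = pyvips.Image.new_from_file(self.filename)
        return self._image

    def __getstate__(self):
        # drop the unpicklable CFFI-backed handle before pickling
        state = self.__dict__.copy()
        state["_image"] = None
        return state

# the wrapper survives a pickle round trip even after the image is opened
slide = Slide("CMU-1.svs")
copy = pickle.loads(pickle.dumps(slide))
print(copy.filename)  # CMU-1.svs
```

Each DataLoader worker then pays one new_from_file when it first touches .image, which, as discussed above, is cheap for pyramid slide images.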