libvips / pyvips

python binding for libvips using cffi
MIT License

How to take advantage of pyvips speed in conjunction with a deep model #427

Open Rasaa84 opened 1 year ago

Rasaa84 commented 1 year ago

Hi,

I am currently working on whole slide histology images, and my workflow involves several steps. First, I divide each image into tiles on a regular grid. Next, I determine whether each tile (patch) contains tissue. If a tile contains tissue, I pass it through a deep classifier. Finally, I generate a mask that shows the probability of each tile belonging to a specific class. This is part of my code:

import numpy as np
import pyvips

# img_path, patch_size, probability_mask and predict_propability are defined elsewhere

image = pyvips.Image.new_from_file(img_path)
if image.hasalpha():
    image = image[0:3]  # drop the alpha band, keep RGB

n_across = image.width // patch_size
n_down = image.height // patch_size

# Crop the input image so that its width and height are multiples of the tile size,
# splitting the leftover pixels evenly between opposite edges
margin_w = (image.width - n_across * patch_size) // 2
margin_h = (image.height - n_down * patch_size) // 2
image_w = n_across * patch_size
image_h = n_down * patch_size

for y in range(n_down):
    for x in range(n_across):
        patch = image.crop(margin_w + x * patch_size, margin_h + y * patch_size, patch_size, patch_size)
        patch = patch.numpy()

        # Only classify tiles that contain tissue
        if np.mean(patch) < 227 and np.std(patch) > 24:
            probability_mask[y, x] = predict_propability(patch)

This takes a long time, even when I process the tiles in batches. Are there more efficient methods I can use to benefit from the speed of pyvips?

jcupitt commented 1 year ago

Hi @Rasaa84, sure, there are lots of speedup tricks.

  1. For tissue / background classification, I would test a low-res pyramid level, perhaps using a simple threshold, then use that mask to select tiles from the full resolution image.
  2. I would generate derivative images in pyvips as well (flip, rotate, etc).
  3. It depends on your tile size, but for small tiles (32 x 32 pixels? it depends on your PC) you'll find fetch is probably faster than crop.
  4. Use the rgb flag to new_from_file.

But you'd need to make a complete, runnable, standalone benchmark before any tuning could be done.

Did I point you to this sample code? I can't remember: https://github.com/libvips/pyvips/issues/100#issuecomment-493960943
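Tip 3 above (fetch rather than crop) could look roughly like the sketch below. It assumes an 8-bit slide and a pyvips version that provides Region.fetch. The helper names tile_origins and iter_patches are made up for illustration and are not part of pyvips. Whether this actually beats crop depends on the tile size and the machine, so it would need benchmarking.

```python
def tile_origins(width, height, patch_size):
    """Yield (col, row, left, top) for every whole tile in a centred grid."""
    n_across = width // patch_size
    n_down = height // patch_size
    # Split the leftover pixels evenly between opposite edges.
    margin_w = (width - n_across * patch_size) // 2
    margin_h = (height - n_down * patch_size) // 2
    for row in range(n_down):
        for col in range(n_across):
            yield col, row, margin_w + col * patch_size, margin_h + row * patch_size


def iter_patches(img_path, patch_size):
    """Yield (col, row, patch), with patch an (H, W, bands) uint8 array."""
    # Imported here so tile_origins() stays usable without libvips installed.
    import numpy as np
    import pyvips

    image = pyvips.Image.new_from_file(img_path)
    if image.hasalpha():
        image = image[0:3]  # drop the alpha band, keep RGB

    # Assumption: the slide is 8-bit, so every band of every pixel is one byte.
    region = pyvips.Region.new(image)
    for col, row, left, top in tile_origins(image.width, image.height, patch_size):
        buf = region.fetch(left, top, patch_size, patch_size)
        patch = np.ndarray(buffer=buf, dtype=np.uint8,
                           shape=(patch_size, patch_size, image.bands))
        yield col, row, patch
```

Because each patch comes with its grid position (col, row), the result of the classifier can be written straight into probability_mask[row, col].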

Rasaa84 commented 1 year ago

Thanks for your response.

After classifying tissue and background with a simple threshold, how can I create high-resolution patches from the tissue regions only, while keeping track of each tile's location (for example, the x, y coordinates of its center)? In the end I need to build a probability mask for the whole image.

Rasaa84 commented 9 months ago

Hi @jcupitt ,

I am working on a whole slide image analysis pipeline where I have generated a mask highlighting tissue regions:

slide = pyvips.Image.new_from_file(img_path)
slide_gray = slide.colourspace('b-w')
# 0 where the blurred image is brighter than 200 (background), 1 elsewhere (tissue)
mask = (slide_gray.gaussblur(5) > 200).ifthenelse(0, 1)

My next task involves extracting only the tissue tiles from these highlighted regions. How should I proceed with this?

Thanks in advance for your help.

jcupitt commented 9 months ago

It depends, I'd experiment. You could try generating all mask tiles, testing for != 0, and only generating those image tiles? That'd be simple.

If you want offline tile generation you could try using the skip-blanks feature of dzsave.
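The first suggestion (test the mask tiles, then generate only the matching image tiles) could be sketched as follows. It assumes the mask is single-band, holds 0/1 values, and has the same dimensions as the slide, as in the snippet earlier in the thread. The names select_tiles and classify_tissue, and the predict callable, are hypothetical. The mask is shrunk so that one pixel corresponds to one tile, which also gives the tile coordinates needed to build the overall probability mask.

```python
def select_tiles(tile_mask, patch_size, margin_w=0, margin_h=0, min_fraction=0.0):
    """Return full-resolution boxes for tiles whose mask value exceeds min_fraction.

    tile_mask is any 2D array-like indexed [row][col] with one value per tile.
    """
    tiles = []
    for row, values in enumerate(tile_mask):
        for col, value in enumerate(values):
            if value > min_fraction:
                left = margin_w + col * patch_size
                top = margin_h + row * patch_size
                tiles.append({
                    "col": col, "row": row,
                    "left": left, "top": top,
                    "centre_x": left + patch_size // 2,
                    "centre_y": top + patch_size // 2,
                })
    return tiles


def classify_tissue(slide, mask, patch_size, predict):
    """Run predict() on tissue tiles only; return a per-tile probability map."""
    import numpy as np  # imported lazily so select_tiles() has no dependencies

    n_across = slide.width // patch_size
    n_down = slide.height // patch_size
    margin_w = (slide.width - n_across * patch_size) // 2
    margin_h = (slide.height - n_down * patch_size) // 2

    # One pixel per tile: the mean of a 0/1 mask is the tile's tissue fraction.
    tile_mask = (mask
                 .crop(margin_w, margin_h, n_across * patch_size, n_down * patch_size)
                 .cast("float")
                 .shrink(patch_size, patch_size)
                 .numpy())

    probability_mask = np.zeros((n_down, n_across), dtype=np.float32)
    for tile in select_tiles(tile_mask, patch_size, margin_w, margin_h):
        patch = slide.crop(tile["left"], tile["top"], patch_size, patch_size).numpy()
        probability_mask[tile["row"], tile["col"]] = predict(patch)
    return probability_mask
```

Raising min_fraction (for example to 0.1) would skip tiles that contain only a sliver of tissue. The right value is something to experiment with.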

Rasaa84 commented 9 months ago

I used image.dzsave(os.path.join(out_dir, 'slide'), tile_size=1024, skip_blanks=30) with several different values of skip_blanks, but it still creates every tile, including background ones. Am I missing something?
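One possible explanation, offered as a guess rather than a diagnosis: the libvips documentation describes skip_blanks as skipping tiles whose pixels are all within that many values of the background, which is white by default. Under that reading, a single darker pixel in a 1024 x 1024 tile is enough to keep it. The function below is a plain-Python illustration of that rule, not the libvips implementation, and is_blank is a made-up name.

```python
def is_blank(tile_pixels, background=255, threshold=30):
    """True when every pixel value is within `threshold` of `background`."""
    return all(abs(value - background) <= threshold for value in tile_pixels)


# A clean, bright background tile would be skipped ...
clean_tile = [250, 240, 231]
# ... but one speck of dust or scanner noise keeps the whole tile.
dusty_tile = [250, 240, 180]

print(is_blank(clean_tile))   # True
print(is_blank(dusty_tile))   # False
```

If this is the cause, things worth trying are a smaller tile_size, so fewer tiles contain a stray dark pixel, or selecting tiles from the tissue mask instead of relying on skip_blanks.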