vqdang / hover_net

Simultaneous Nuclear Instance Segmentation and Classification in H&E Histology Images.
MIT License

Create binary image from output json #152

Closed: snigdhaAgarwal closed this issue 3 years ago

snigdhaAgarwal commented 3 years ago

I need the binary version of a large svs-format cell image. I am using run_infer.py because it generates a mask; however, the mask output is a binary image at the maximum zoomed-out level, whereas I need a binary at the maximum "zoomed in" level. The other option I'm exploring is to use the output json and its contour information to build a binary image. Is there some other way to directly get a binary? If not, could this be added as a feature in the future?

I also tried modifying the infer/wsi.py code to produce a mask for each magnification level as seen below:

    for mag in self.wsi_handler.metadata["available_mag"]:
        print("Processing for mag ", str(mag))
        # if msk_path is not None and os.path.isfile(msk_path):
        #     self.wsi_mask = cv2.imread(msk_path)
        #     self.wsi_mask = cv2.cvtColor(self.wsi_mask, cv2.COLOR_BGR2GRAY)
        #     self.wsi_mask[self.wsi_mask > 0] = 1

        log_info(
            "WARNING: No mask found, generating mask via thresholding at " + str(mag) + "!"
        )

        from skimage import morphology

        # simple method to extract tissue regions using intensity thresholding and morphological operations
        def simple_get_mask(mag):
            wsi_thumb_rgb = self.wsi_handler.get_full_img(read_mag=mag)
            gray = cv2.cvtColor(wsi_thumb_rgb, cv2.COLOR_RGB2GRAY)
            _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU)
            # mask = morphology.remove_small_objects(
            #     mask == 0, min_size=16 * 16, connectivity=2
            # )
            # mask = morphology.remove_small_holes(mask, area_threshold=128 * 128)
            mask = morphology.binary_dilation(mask, morphology.disk(16))
            return mask

        self.wsi_mask = np.array(simple_get_mask(mag), dtype=np.uint8)
        # self.wsi_mask = np.array(simple_get_mask(mag) > 0, dtype=np.uint8)
        print(self.wsi_mask.size, np.sum(self.wsi_mask))
        print(self.wsi_mask.size == np.sum(self.wsi_mask))
        if np.sum(self.wsi_mask) == 0:
            log_info("Skip due to empty mask!")
            return
        if self.save_mask:
            rand_multi = self.wsi_mask * 255
            print(255 * rand_multi.size == np.sum(rand_multi))
            print(rand_multi.size, np.sum(rand_multi))
            cv2.imwrite("%s/mask/%s_mag%d.png" % (output_dir, wsi_name, mag), rand_multi)
            print("Saved mask for mag ", str(mag))

Though the binaries for the 3 lower magnification levels look good, the one for magnification 40 is either completely black (after removing the remove_small_holes and remove_small_objects calls) or completely white; basically, cells are not being identified. Can you point out what I'm doing wrong here, or, as the comment says, is this piece of code meant for generating tissue-level binaries only?

simongraham commented 3 years ago

Hi,

So the mask that is generated during inference is the tissue mask; you shouldn't try to modify that to get the binary cell output. As you already suggested, you can use the json file output to binarise the image if you wish, and everything you need will be there. However, note that saving the image at 40x will take up a lot of memory and may cause issues, which is why we save the output as json. Alternatively, you can process image tiles, in which case binarisation may be a bit more straightforward. Let me know if you need more assistance.
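
For reference, a rough sketch of that json route, assuming the WSI-mode output where `data['nuc']` maps each instance id to a dict containing a `contour` in full-resolution coordinates; the json path, slide shape, and output file name below are placeholders, not something produced by the repo:

    import json

    import cv2
    import numpy as np

    # Placeholders: point these at your own json output and slide dimensions.
    json_path = "wsi_output.json"
    height, width = 60000, 140000  # slide shape at the magnification the json refers to

    with open(json_path) as f:
        data = json.load(f)

    # A full-resolution uint8 canvas can run to several GB at 40x,
    # so rasterising per tile is safer if memory is limited.
    binary = np.zeros((height, width), dtype=np.uint8)

    for inst in data["nuc"].values():
        contour = np.asarray(inst["contour"], dtype=np.int32)  # (N, 2) array of x, y points
        cv2.fillPoly(binary, [contour], 255)

    cv2.imwrite("binary_mask.png", binary)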

snigdhaAgarwal commented 3 years ago

Thanks for the prompt reply! I used the tutorial here: https://github.com/vqdang/hover_net/blob/master/examples/usage.ipynb to get the binary file. The code I used is below for anyone who has a similar use case.

    from misc.viz_utils import visualize_instances_dict
    import numpy as np
    import json
    import cv2
    import zarr

    with open('/mnt/ibm_sm/home/snigdha/TSP14 UB B2.json') as f:
        data = json.load(f)
    type_info = {"0": ["nolabe", [255, 255, 255]]}  # all white

    tile_info_dict = {}
    for i in data['nuc']:
        contour = data['nuc'][i]['contour']
        x_s = [a_tuple[0] for a_tuple in contour]
        y_s = [a_tuple[1] for a_tuple in contour]
        coords = np.empty((len(x_s), 2))
        coords[:, 0] = x_s
        coords[:, 1] = y_s
        tile_info_dict[i] = {'contour': np.int0(coords), 'type': '0'}

    # shape of the max mag level in the svs
    r_mask = np.zeros((67343, 143424), np.uint8)
    overlaid_output = visualize_instances_dict(
        r_mask, tile_info_dict, type_colour=type_info, line_thickness=cv2.FILLED
    )
    zarr.save('example.zarr', overlaid_output)

Saving as a PNG was taking a lot of time and the file was also very large, so I saved it as a zarr instead, which really helped me save space and time!
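
In case it is useful to anyone, a quick sketch of reading a region back out of that zarr; the path and crop window below are placeholders:

    import zarr

    # Open lazily; only the requested window is read from disk.
    arr = zarr.open('example.zarr', mode='r')

    # Placeholder crop coordinates: a 1000 x 1000 window of the full-resolution mask.
    crop = arr[20000:21000, 50000:51000]
    print(crop.shape, crop.dtype)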