Extracting particular label patches

Suvi-dha commented 6 years ago

Hi, I want to extract only the 'normal' label patches and build an LMDB database from just those. I want to extract and save the highest-resolution (level 16) patches, but my system runs out of memory at level 16. Since I only want to save the normal label, memory should not be a problem if I could extract just those patches. Please help: where should I make the changes in the code?

Also, I am confused about the set_id parameter. What does it signify? Is it the number of the image from which the patches are extracted, or something else? Suppose I want to save patches image-wise in separate folders: if set_id is 0, are all the patches in that set from image 1?

ysbecca commented 6 years ago

If you want to extract only normal patches, the fastest way is to add an if/else clause in the patch extraction function, which is in patch_reader.py around line 115. For example:


if np.shape(new_tile) == (patch_size, patch_size, 3):
    label = generate_label(regions, region_labels, converted_coords, label_map)
    # x is the label you want to keep
    if label == x:
        patches.append(new_tile)
        # continue to add metadata

ysbecca commented 6 years ago

Suppose you have n images. If you set the total number of sets to 10, then set_id=0 takes every 10th image starting from image 0: images 0, 10, 20, 30, and so on.

set_id=1 takes images 1, 11, 21, 31, etc. Note that you can also specify a custom set by passing in a Boolean selection array. See the Jupyter notebook for how to do that.
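The split described above can be sketched in a few lines. This is only an illustration of the every-kth-image rule as stated in this comment, not py-wsi's actual implementation; the function name images_in_set is hypothetical.

```python
def images_in_set(n_images, set_id, total_sets):
    """Return the image indices that fall into set_id.

    Assumes the rule described above: set_id takes every
    total_sets-th image, starting from image number set_id.
    """
    return list(range(set_id, n_images, total_sets))


# With 35 images split into 10 sets:
print(images_in_set(35, 0, 10))  # [0, 10, 20, 30]
print(images_in_set(35, 1, 10))  # [1, 11, 21, 31]
```

Under this rule, each set holds a single image only when total_sets equals the number of images.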

Suvi-dha commented 6 years ago

I have only 10 WSI images, and I want to save the patches of a specific label from all of them. How should I read my dataset so that each WSI gets its own folder containing the patches with that label? I am trying to save them like this, but I am still confused about the set_id and valid_id parameters.

import os
from PIL import Image

data = dataset.valid.images
labels = dataset.valid.image_cls
base_dir = 'D:/SUVIDHA/ICIAR-BACH-2018/WSI/ImageofEachRegion/scale0/normal/'
new_dir = os.path.join(base_dir, 'A03')
os.makedirs(new_dir, exist_ok=True)  # no error if the folder already exists
for i, image in enumerate(data):
    print(image.shape, labels[i])
    # keep only patches whose label is 0
    if labels[i] == 0:
        im = Image.fromarray(image)
        im.save(os.path.join(new_dir, str(i) + ".jpg"))

I am reading the database as:


dataset = ds.read_datasets(turtle,
                set_id=0,
                valid_id=0,
                total_sets=10,
                shuffle_all=True,
                augment=False)

So, by n images, do you mean n WSI images or n patches in one WSI image?

What I understood earlier was that a set is the set of image patches from one WSI, so for me the total number of sets would be 10, as I have 10 WSIs.

Please correct me if I am wrong.

Suvi-dha commented 6 years ago

I am getting this error when I try to read the dataset:

dataset = ds.read_datasets(turtle,
                set_id=1,
                valid_id=1,
                total_sets=5,
                shuffle_all=True,
                augment=False)

AttributeError                            Traceback (most recent call last)
<ipython-input-9-a91649cd5e79> in <module>()
      4                 total_sets=5,
      5                 shuffle_all=True,
----> 6                 augment=False)

D:\SUVIDHA\ICIAR-BACH-2018\py-wsi-master\py_wsi\dataset.py in read_datasets(turtle, set_id, valid_id, total_sets, shuffle_all, augment, is_test)
    151                 dataset.test = fetch_dataset(turtle, -1, -1, False)
    152         else:
--> 153                 dataset.train = fetch_dataset(turtle, set_id, total_sets, augment)
    154                 dataset.valid = fetch_dataset(turtle, valid_id, total_sets, augment)
    155                 if shuffle_all:

D:\SUVIDHA\ICIAR-BACH-2018\py-wsi-master\py_wsi\dataset.py in fetch_dataset(turtle, set_id, total_sets, augment)
    116   if set_id > -1:
    117     items = turtle.get_set_patches(set_id, total_sets)
--> 118     patches = [i.get_patch() for i in items]
    119     coords = [i.coords for i in items]
    120     classes = [i.label for i in items]

D:\SUVIDHA\ICIAR-BACH-2018\py-wsi-master\py_wsi\dataset.py in <listcomp>(.0)
    116   if set_id > -1:
    117     items = turtle.get_set_patches(set_id, total_sets)
--> 118     patches = [i.get_patch() for i in items]
    119     coords = [i.coords for i in items]
    120     classes = [i.label for i in items]

AttributeError: 'NoneType' object has no attribute 'get_patch'

ysbecca commented 6 years ago

"What I understood earlier was that a set is the set of image patches from one WSI, so for me the total number of sets would be 10, as I have 10 WSIs."

No. With total_sets sets, set i contains images i, i + total_sets, i + 2 × total_sets, and so on. If you have 10 WSIs and 10 sets, then each set has 1 WSI. But if you have 20 WSIs and 10 sets, then each set has 2 WSIs. py-wsi was built assuming very large datasets, which are usually necessary for research using WSIs.
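Since 10 WSIs split into 10 sets gives one WSI per set, looping over the set ids is one way to get a folder per WSI. The following is a rough sketch under that assumption. The names export_by_set, load_set, and save_patch are hypothetical and not part of py-wsi: load_set(set_id) stands for whatever code returns the patches and labels of one set, and save_patch(patch, path) for whatever writes one patch to disk.

```python
import os


def export_by_set(load_set, total_sets, out_root, wanted_label, save_patch):
    """Save patches with wanted_label into one folder per set.

    load_set(set_id) -> (patches, labels) and save_patch(patch, path)
    are hypothetical callables supplied by the caller.
    Returns a dict mapping set_id to the number of patches saved.
    """
    counts = {}
    for set_id in range(total_sets):
        patches, labels = load_set(set_id)
        set_dir = os.path.join(out_root, "set_{:02d}".format(set_id))
        os.makedirs(set_dir, exist_ok=True)
        kept = 0
        for i, (patch, label) in enumerate(zip(patches, labels)):
            # Skip every patch that does not carry the wanted label.
            if label == wanted_label:
                save_patch(patch, os.path.join(set_dir, "{}.jpg".format(i)))
                kept += 1
        counts[set_id] = kept
    return counts
```

Here load_set could, for instance, wrap a read_datasets call like the one shown earlier in this thread, with total_sets equal to the number of WSIs. With more WSIs than sets, each folder would mix patches from several images.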

Your error is because you don't have any patches saved in your database.

Suvi-dha commented 6 years ago

But I am saving patches with the normal label, as you suggested in your answer above. And when I print the number of items using len(items), I get a significant value (~20000). So items is not empty.

ysbecca commented 6 years ago

Hmm. It's hard for me to tell without seeing all the code but I would suggest debugging from inside the get_set_patches function. Include lots of print statements so you know exactly what's happening. Also, what is np.shape(items)? What datatype is items?
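As a starting point for that debugging, here is a small sketch of a check. The traceback fails on i.get_patch() with i being None, which means at least one element of items is None even though len(items) is large. The helper name summarize_items is hypothetical, and items stands for whatever get_set_patches returned.

```python
def summarize_items(items):
    """Report the container type, its length, and where None entries sit."""
    none_positions = [idx for idx, item in enumerate(items) if item is None]
    return {
        "type": type(items).__name__,
        "length": len(items),
        "none_count": len(none_positions),
        "first_none": none_positions[:5],  # first few offending positions
    }


# Example with a stand-in list:
print(summarize_items(["patch", None, "patch", None]))
```

A non-zero none_count would explain the AttributeError without the list being empty.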

Suvi-dha commented 6 years ago

I had to add checks in several places to eliminate the NoneType error, and finally got it working. Thank you for your input.

ysbecca / py-wsi

Extracting particular label patches #8