ysbecca / py-wsi

Python package for dealing with whole slide images (.svs) for machine learning, particularly for fast prototyping. Includes patch sampling and storing using OpenSlide. Patches may be stored in LMDB, HDF5 files, or to disk. It is highly recommended to fork and download this repository so that personal customisations can be made for your work.
https://ysbecca.github.io/programming/2018/05/22/py-wsi.html
GNU General Public License v3.0

Extracting particular label patches #8

Closed Suvi-dha closed 6 years ago

Suvi-dha commented 6 years ago

Hi, I want to extract only the 'normal' label patches and build an LMDB database containing just those patches. I want to extract and save the highest-resolution (level 16) patches, but my system runs out of memory at level 16. Since I only want to save the normal label, memory should not be a problem if I could do just that. Could you please tell me where I should make the changes in the code?

Also, I am confused about the set_id parameter. What does it signify? Is it the number of the image from which the patches are extracted, or something else? Suppose I want to save patches image-wise in separate folders: if set_id is 0, are all the patches in that set from image 1?

ysbecca commented 6 years ago

If you want to extract only normal patches, the fastest way is to add an if/else clause in the patch extraction function, which is in patch_reader.py around line 115. For example:


if np.shape(new_tile) == (patch_size, patch_size, 3):
    label = generate_label(regions, region_labels, converted_coords, label_map)
    # x would be the label you want
    if label == x: 
        patches.append(new_tile)
        # continue to add meta data
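The same filtering idea can be shown as a small, self-contained sketch. The function name keep_only_label and the toy tiles are hypothetical and not part of py-wsi; in the real code the tile and its label come from the extraction loop in patch_reader.py. Because a tile is only appended when its label matches, unwanted tiles are never accumulated in the list, which should also help with memory.

```python
# Hypothetical stand-ins: plain Python values replace the real tiles and
# the labels that generate_label() would produce inside py-wsi.
def keep_only_label(tiles_with_labels, wanted_label):
    """Return the tiles whose label equals wanted_label, plus their labels."""
    patches, labels = [], []
    for tile, label in tiles_with_labels:
        if label == wanted_label:
            patches.append(tile)
            labels.append(label)
    return patches, labels


candidates = [("tile_a", 0), ("tile_b", 2), ("tile_c", 0)]
patches, labels = keep_only_label(candidates, wanted_label=0)
print(patches)  # ['tile_a', 'tile_c']
```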
ysbecca commented 6 years ago

Suppose you have n images and want the total number of sets to be 10. Then set_id=0 will take every 10th image starting from the 0th image: images 0, 10, 20, 30, and so on.

set_id=1 will take images 1, 11, 21, 31, etc. Note that you can also specify a custom set by passing in a boolean selection array. See the Jupyter notebook for how to do that.
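To make the rule concrete, here is a minimal sketch of the assignment described above. The helper images_in_set is hypothetical and only illustrates the "every total_sets-th image" rule; it is not py-wsi's own code.

```python
def images_in_set(n_images, set_id, total_sets):
    """Indices of the images that fall into set `set_id`.

    Assumes the rule described above: start at image `set_id` and take
    every `total_sets`-th image after it.
    """
    return list(range(set_id, n_images, total_sets))


print(images_in_set(35, 0, 10))  # [0, 10, 20, 30]
print(images_in_set(35, 1, 10))  # [1, 11, 21, 31]
print(images_in_set(10, 3, 10))  # [3]
```

With 10 images and 10 sets, each set therefore holds exactly one image.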

Suvi-dha commented 6 years ago

I have only 10 WSI images, and I want to save the patches (of a specific label) from all of them. How should I read my dataset so that each WSI image has its own folder containing its patches of that label? I am trying to save them like this, but I am still confused about the set_id and valid_id parameters.

import os
from PIL import Image

data = dataset.valid.images
labels = dataset.valid.image_cls
# Output folder for the patches of slide A03
out_dir = 'D:/SUVIDHA/ICIAR-BACH-2018/WSI/ImageofEachRegion/scale0/normal/A03'
os.makedirs(out_dir, exist_ok=True)  # does not fail if the folder already exists
for i, image in enumerate(data):
    print(image.shape, labels[i])
    if labels[i] == 0:  # keep only patches with label 0
        im = Image.fromarray(image)
        im.save(os.path.join(out_dir, str(i) + ".jpg"))

I am reading the database as:


dataset = ds.read_datasets(turtle,
                set_id=0,
                valid_id=0,
                total_sets=10,
                shuffle_all=True,
                augment=False)

So, by n images do you mean n WSI images, or n patches in one WSI image?

What I understood earlier was that a set is the set of image patches from one WSI, so for me the total number of sets would be 10, as I have 10 WSIs.

Please correct me if I am wrong.

Suvi-dha commented 6 years ago

I am getting this error when I try to read the dataset:

dataset = ds.read_datasets(turtle,
                set_id=1,
                valid_id=1,
                total_sets=5,
                shuffle_all=True,
                augment=False)
AttributeError                            Traceback (most recent call last)
<ipython-input-9-a91649cd5e79> in <module>()
      4                 total_sets=5,
      5                 shuffle_all=True,
----> 6                 augment=False)

D:\SUVIDHA\ICIAR-BACH-2018\py-wsi-master\py_wsi\dataset.py in read_datasets(turtle, set_id, valid_id, total_sets, shuffle_all, augment, is_test)
    151                 dataset.test = fetch_dataset(turtle, -1, -1, False)
    152         else:
--> 153                 dataset.train = fetch_dataset(turtle, set_id, total_sets, augment)
    154                 dataset.valid = fetch_dataset(turtle, valid_id, total_sets, augment)
    155                 if shuffle_all:

D:\SUVIDHA\ICIAR-BACH-2018\py-wsi-master\py_wsi\dataset.py in fetch_dataset(turtle, set_id, total_sets, augment)
    116   if set_id > -1:
    117     items = turtle.get_set_patches(set_id, total_sets)
--> 118     patches = [i.get_patch() for i in items]
    119     coords = [i.coords for i in items]
    120     classes = [i.label for i in items]

D:\SUVIDHA\ICIAR-BACH-2018\py-wsi-master\py_wsi\dataset.py in <listcomp>(.0)
    116   if set_id > -1:
    117     items = turtle.get_set_patches(set_id, total_sets)
--> 118     patches = [i.get_patch() for i in items]
    119     coords = [i.coords for i in items]
    120     classes = [i.label for i in items]

AttributeError: 'NoneType' object has no attribute 'get_patch'

ysbecca commented 6 years ago

> What I understood earlier was that a set is the set of image patches from one WSI , so for me total sets would be 10 as I have 10 WSIs.

No, a set is a group of whole WSIs, not the patches of a single WSI: set i takes every image whose index is i modulo the total number of sets. If you have 10 WSIs and 10 sets, then each set has 1 WSI. But if you have 20 WSIs and 10 sets, then each set has 2 WSIs. py-wsi was built assuming very large datasets, which are usually necessary for research using WSIs.

Your error is because you don't have any patches saved in your database.

Suvi-dha commented 6 years ago

But I am saving patches of the normal label as you suggested in your answer above. And when I print the number of items using len(items), I get a significant value, around 20000. So items is not empty.

ysbecca commented 6 years ago

Hmm. It's hard for me to tell without seeing all the code but I would suggest debugging from inside the get_set_patches function. Include lots of print statements so you know exactly what's happening. Also, what is np.shape(items)? What datatype is items?
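As a starting point for that kind of inspection, here is a minimal sketch. The helper summarize_items is hypothetical and not part of py-wsi; it assumes only that items is a list-like object, as in the traceback. It reports the container type, the length, the element types, and where any None entries sit.

```python
from collections import Counter


def summarize_items(items):
    """Report container type, length, element types, and None positions."""
    none_positions = [i for i, item in enumerate(items) if item is None]
    return {
        "container_type": type(items).__name__,
        "length": len(items),
        "element_types": dict(Counter(type(item).__name__ for item in items)),
        "none_count": len(none_positions),
        "first_none_positions": none_positions[:5],
    }


# Toy data standing in for the result of turtle.get_set_patches(...)
report = summarize_items(["patch_0", None, "patch_2", None])
print(report["none_count"])  # 2
```

A non-zero none_count would explain the AttributeError even when len(items) is large, since a single None entry is enough to make i.get_patch() fail.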

Suvi-dha commented 6 years ago

I had to add checks in several places to eliminate the NoneType error, and I finally got it working. Thank you for your input.