seba-1511 / cervix.kaggle

Intel Cervix Kaggle Competition
Apache License 2.0

Explore downsampling strategies #4

Open seba-1511 opened 7 years ago

seba-1511 commented 7 years ago

One thing we can try is to downsample the images from the start and see how well humans perform on the smaller images (e.g., if 1024x1024px images are enough, we save loading time compared with 4096x4096px).

So the task is to try different image sizes and tell us which one you think gives the best ratio of size reduction to blurriness.
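To put the loading-time argument in numbers, here is a rough sketch of the uncompressed in-memory size of a square RGB image at a few candidate sizes. It assumes 8 bits per channel and three channels; `rgb_bytes` is a made-up helper name, and real loading time also depends on JPEG decoding and disk speed.

```python
def rgb_bytes(side, channels=3):
    """Uncompressed size in bytes of a square image with 8-bit channels."""
    return side * side * channels


for side in (4096, 1024, 512):
    mib = rgb_bytes(side) / 2**20
    ratio = rgb_bytes(4096) // rgb_bytes(side)
    print(f"{side}x{side}: {mib:.2f} MiB ({ratio}x smaller than 4096x4096)")
```

Under these assumptions a 1024x1024 image holds 16 times fewer pixels than a 4096x4096 one, and a 512x512 image 64 times fewer.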

While finding the optimal image size, it would also be nifty to have a script that takes a folder of images and writes a resized copy of each image to a different location. See this: https://stackoverflow.com/questions/273946/how-do-i-resize-an-image-using-pil-and-maintain-its-aspect-ratio
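A minimal sketch of such a script, assuming Pillow is installed. The function names, the `max_side` parameter and the extension filter are all assumptions made for illustration, and the longer side is scaled to `max_side` (which also upscales images that are smaller than that).

```python
import os


def scaled_size(width, height, max_side):
    """Return (w, h) scaled so the longer side equals max_side, keeping the aspect ratio."""
    scale = max_side / float(max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_folder(src_dir, dst_dir, max_side=1024):
    """Write a resized copy of every image in src_dir into dst_dir."""
    from PIL import Image  # Pillow; imported here so scaled_size works without it

    os.makedirs(dst_dir, exist_ok=True)
    for name in sorted(os.listdir(src_dir)):
        if not name.lower().endswith(('.jpg', '.jpeg', '.png')):
            continue  # skip non-image entries
        with Image.open(os.path.join(src_dir, name)) as img:
            new_size = scaled_size(img.size[0], img.size[1], max_side)
            # The originals in src_dir are left untouched.
            img.resize(new_size, Image.BICUBIC).save(os.path.join(dst_dir, name))
```

A call such as `resize_folder('folder2', 'folder2_1024', max_side=1024)` (hypothetical folder names) would then produce the 1024px copies; running it again with 512 gives a second set for a side-by-side comparison.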

seba-1511 commented 7 years ago

PS: The same should be done for transforming images to black and white. If we do this, can we still properly see the cervix? And what kind of black and white transform should we use?

See this: https://stackoverflow.com/questions/9506841/using-python-pil-to-turn-a-rgb-image-into-a-pure-black-and-white-image
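For reference, the two usual options are a grayscale (monochrome) image and a pure black and white (binarized) one. Below is a per-pixel sketch of both. The weights are the ITU-R 601-2 luma weights that Pillow documents for `convert('L')`; the integer rounding here is a simplification and may differ from Pillow by one level, and the sample colours are made up rather than taken from the dataset.

```python
def luma(r, g, b):
    """Grayscale value from the ITU-R 601-2 weights (299, 587, 114 per 1000)."""
    return (r * 299 + g * 587 + b * 114) // 1000


def binarize(value, threshold=128):
    """Map a grayscale value to pure black (0) or white (255)."""
    return 0 if value < threshold else 255


pink = (230, 150, 170)    # made-up sample colours
dark_red = (120, 30, 40)
for rgb in (pink, dark_red):
    gray = luma(*rgb)
    print(rgb, "gray:", gray, "binary:", binarize(gray))
```

The grayscale value keeps 256 levels of detail, while binarizing collapses everything to two, so fine structure is much more likely to be lost in the second form.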

yiqiushen commented 7 years ago

I am just thinking... since centers from (not green) images are mostly purple->red->pink, could we create a formula for monochrome images such that the contrast would be more obvious (for the model)?
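One way such a formula could look, purely as an illustration: weight the red channel against the average of green and blue, so that red, pink and purple regions come out brighter than the rest. The function name and the formula itself are assumptions, not something that has been tried on these images.

```python
def red_contrast(r, g, b):
    """Hypothetical monochrome value: brighter where red dominates green and blue."""
    value = r - (g + b) // 2
    return max(0, min(255, value))  # clamp to the 8-bit range


# Made-up sample colours, not taken from the dataset.
samples = {
    "red": (200, 40, 60),
    "pink": (230, 150, 170),
    "purple": (150, 50, 160),
    "green": (60, 180, 70),
}
for label, rgb in samples.items():
    print(label, red_contrast(*rgb))
```

Whether this helps the model more than a plain grayscale conversion would have to be checked on real images.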

seba-1511 commented 7 years ago

Yes, that might be a good idea. Do you want to take care of it?

One thing that might help generalize to green/other images is to convert to black and white.

seba-1511 commented 7 years ago

Update from @rachaelcardoso :

For the project, I tried two black and white transforms - one which makes the image binarized and the other one monochrome. I've attached the code I wrote but we should definitely go for monochrome, as the clarity is compromised in the binarized form.

Also, I resized the images to a width of 1024 px and found no significant difference in clarity. When I discussed this with Jim on Monday, it seemed that he had tried 512, which was equally easy to identify, so maybe we can resize all the images to 512.

files:

resize.py

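import os
from PIL import Image

# Index of the first directory entry to process; entries before it are skipped.
START_POINT = 1

NUM_FOLDER = 2
BASEWIDTH = 1024  # target width in pixels
curdir = os.path.abspath(os.path.curdir)
FOLDER = os.path.join(curdir, 'folder' + str(NUM_FOLDER))

# Sorted so that START_POINT refers to the same file on every run.
im_names = sorted(os.listdir(FOLDER))

for name in im_names[START_POINT:]:
    im_path = os.path.join(FOLDER, name)
    img = Image.open(im_path)
    # Scale the height by the same factor as the width to keep the aspect ratio.
    wpercent = BASEWIDTH / float(img.size[0])
    hsize = int(float(img.size[1]) * wpercent)
    img = img.resize((BASEWIDTH, hsize), Image.BICUBIC)
    # Saved under the same file name in the current working directory, not in FOLDER.
    img.save(name)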

bw.py

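import os
from PIL import Image

# Index of the first directory entry to process; entries before it are skipped.
START_POINT = 1

NUM_FOLDER = 2
curdir = os.path.abspath(os.path.curdir)
FOLDER = os.path.join(curdir, 'folder' + str(NUM_FOLDER))

# Sorted so that START_POINT refers to the same file on every run.
im_names = sorted(os.listdir(FOLDER))

for name in im_names[START_POINT:]:
    im_path = os.path.join(FOLDER, name)
    img = Image.open(im_path)
    # Monochrome (8-bit grayscale) version.
    gray = img.convert('L')
    # Binarized version: pixels below 128 become black, the rest white.
    bw = gray.point(lambda x: 0 if x < 128 else 255, '1')
    # Only the binarized image is saved (in the current working directory);
    # save `gray` instead to keep the monochrome version.
    bw.save(name)

yiqiushen commented 7 years ago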

I did the resize in MATLAB. 256x256 is not working so well because some of the images are too zoomed in, but 512x512 works fine.

% Columns of labels1.csv as used below: 1 = image number, 2 = x, 3 = y, 4 = flag.
names = csvread('labels1.csv');
numImages = size(names, 1);  % one row per image
currentFolder = strcat(pwd,'\folder1\');

for i = 1:numImages
    if names(i,4) ~= 1  % rows whose flag equals 1 are skipped
        x = names(i,2);
        y = names(i,3);
        rawImage = imread(strcat(currentFolder,num2str(names(i,1)),'.jpg'));
        % Crop a 512x512 window around (x, y), then halve it to 256x256.
        newImage = imcrop(rawImage,[x - 256, y - 256, 511, 511]);
        newImage = imresize(newImage, 0.5);
        imwrite(newImage,strcat(currentFolder,'croped2\',num2str(names(i,1)),'croped.jpg'));
    end
end
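
For anyone who wants to do the same crop from Python, here is a rough sketch of the box computation. It assumes Pillow's `Image.crop` convention of `(left, upper, right, lower)` with 0-based pixel coordinates (MATLAB is 1-based), and `crop_box` is a made-up helper name. One deliberate difference, which is an assumption and not what the MATLAB code does: when the window would cross an image border, the box is shifted back inside the image so the crop keeps its full size.

```python
def crop_box(cx, cy, side, img_w, img_h):
    """Return (left, upper, right, lower) for a side x side crop centred on (cx, cy).

    The box is shifted, not shrunk, when it would cross an image border.
    """
    side = min(side, img_w, img_h)  # the crop cannot be larger than the image
    left = min(max(cx - side // 2, 0), img_w - side)
    upper = min(max(cy - side // 2, 0), img_h - side)
    return left, upper, left + side, upper + side


print(crop_box(1000, 800, 512, 3000, 4000))
print(crop_box(100, 100, 512, 3000, 4000))
```

With Pillow, `img.crop(crop_box(x, y, 512, *img.size)).resize((256, 256))` would then mirror the crop-and-halve step above.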