nyukat / breast_cancer_classifier

Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening
https://ieeexplore.ieee.org/document/8861376
GNU Affero General Public License v3.0

Some images don't work in crop_single_mammogram.py #42

Closed · nightandweather closed this issue 3 years ago

nightandweather commented 3 years ago

Thank you for sharing your work. I'm trying to adapt your repository to our dataset.

```python
import os
import glob
import json
import pickle

import numpy as np
import cv2
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
from PIL import Image
from tqdm import tqdm
from skimage import draw

from src.heatmaps.run_producer_single import produce_heatmaps
# crop_single_mammogram, get_optimal_center_single, load_inputs, load_model,
# process_augment_inputs, and batch_to_tensor are the repository's helpers,
# imported elsewhere; PATH, png_list, and shared_parameters are also defined elsewhere.

score_heat_list = []


def make_dir(name):
    if not os.path.isdir(name):
        os.makedirs(name)
        print(name, "folder has been created.")
    else:
        print("The folder already exists.")


make_dir('save_imageheatmap_model_figure_folder')


def json_extract_feature(json_data):
    """Extract per-view image types and contours from an annotation JSON.

    components:
    'user id'      = no
    'case_id'      = split('_')[1] = patient number
    'contour_list' = dict('image_type', dict())
    """
    patient = json_data['case_id']

    temp_image_type = []   # lcc
    temp_image_type1 = []  # lmlo
    temp_image_type2 = []  # rcc
    temp_image_type3 = []  # rmlo

    temp_contour = []
    temp_contour1 = []
    temp_contour2 = []
    temp_contour3 = []

    for image_type in json_data['contour_list']['cancer']:
        if image_type == 'lcc':
            temp_image_type.append(image_type)
        if image_type == 'lmlo':
            temp_image_type1.append(image_type)
        if image_type == 'rcc':
            temp_image_type2.append(image_type)
        if image_type == 'rmlo':
            temp_image_type3.append(image_type)

        for key in json_data['contour_list']['cancer'][image_type]:
            for contour in json_data['contour_list']['cancer'][image_type][key]:
                bin_list = [contour.get('y'), contour.get('x')]
                if image_type == 'lcc':
                    temp_contour.append(bin_list)
                elif image_type == 'lmlo':
                    temp_contour1.append(bin_list)
                elif image_type == 'rcc':
                    temp_contour2.append(bin_list)
                elif image_type == 'rmlo':
                    temp_contour3.append(bin_list)

    return (temp_image_type, temp_image_type1, temp_image_type2, temp_image_type3,
            temp_contour, temp_contour1, temp_contour2, temp_contour3)


def polygon2mask(image_shape, polygon):
    """Compute a mask from polygon.

    Parameters
    ----------
    image_shape : tuple of size 2.
        The shape of the mask.
    polygon : array_like.
        The polygon coordinates of shape (N, 2) where N is
        the number of points.

    Returns
    -------
    mask : 2-D ndarray of type 'bool'.
        The mask that corresponds to the input polygon.

    Notes
    -----
    This function does not do any border checking, so all
    the vertices need to be within the given shape.

    Examples
    --------
    >>> image_shape = (128, 128)
    >>> polygon = np.array([[60, 100], [100, 40], [40, 40]])
    >>> mask = polygon2mask(image_shape, polygon)
    >>> mask.shape
    (128, 128)
    """
    polygon = np.asarray(polygon)
    vertex_row_coords, vertex_col_coords = polygon.T
    fill_row_coords, fill_col_coords = draw.polygon(
        vertex_row_coords, vertex_col_coords, image_shape)
    mask = np.zeros(image_shape, dtype=bool)  # np.bool is deprecated; use the builtin bool
    mask[fill_row_coords, fill_col_coords] = True
    return mask


##############################################################################
annotation_folder = r'/home/ncc/Desktop/2020_deep_learning_breastcancer/annotation_SN/'

for png in tqdm(png_list[0:8]):
    print(PATH + png)
    # note: the underscores in this call were likely eaten by markdown in the original paste
    crop_single_mammogram(
        PATH + png,
        horizontal_flip='NO',
        view=png.split('_')[1].split('.')[0],
        cropped_mammogram_path=PATH + 'cropped_image/' + png,
        metadata_path=PATH + png.split('.')[0] + '.pkl',
        num_iterations=100,
        buffer_size=50,
    )
    print(PATH + 'cropped_image/' + png)
    get_optimal_center_single(PATH + 'cropped_image/' + png,
                              PATH + png.split('.')[0] + '.pkl')
    model_input = load_inputs(
        image_path=PATH + 'cropped_image/' + png,
        metadata_path=PATH + png.split('.')[0] + '.pkl',
        use_heatmaps=False,
    )
    ##########################################################################
    parameters = dict(
        device_type='gpu',
        gpu_number='0',
        patch_size=256,
        stride_fixed=20,
        more_patches=5,
        minibatch_size=10,
        seed=np.random.RandomState(shared_parameters["seed"]),
        initial_parameters="/home/ncc/Desktop/breastcancer/nccpatient/breast_cancer_classifier/models/sample_patch_model.p",
        input_channels=3,
        number_of_classes=4,
        cropped_mammogram_path=PATH + 'cropped_image/' + png,
        metadata_path=PATH + png.split('.')[0] + '.pkl',
        heatmap_path_malignant=PATH + png.split('.')[0] + '_malignant_heatmap.hdf5',
        heatmap_path_benign=PATH + png.split('.')[0] + '_benign_heatmap.hdf5',
        heatmap_type=[0, 1],  # 0: malignant, 1: benign
        use_hdf5="store_true",
    )
    ##########################################################################

    # read annotation, e.g. SN00000016_L-CC.png
    # Reading the code, the same JSON file is loaded four times;
    # this needs fixing when the code is streamlined.
    # The annotations refer to the original image, not the cropped one,
    # yet the image being displayed is the cropped image.

    # print(png.split('_')[0])
    with open(PATH + png.split('.')[0] + '.pkl', 'rb') as f:
        location_data = pickle.load(f)
    print(location_data)
    start_point1 = list(location_data['window_location'])[0]
    endpoint1 = list(location_data['window_location'])[1]
    start_point2 = list(location_data['window_location'])[2]
    endpoint2 = list(location_data['window_location'])[3]
    print(start_point1, start_point2)
    with open(annotation_folder + 'Cancer_' + png.split('_')[0] + '.json') as json_file:
        json_data = json.load(json_file)

    (temp_image_type, temp_image_type1, temp_image_type2, temp_image_type3,
     temp_contour, temp_contour1, temp_contour2,
     temp_contour3) = json_extract_feature(json_data)

    import operator
    view = png.split('_')[1].split('.')[0]
    if view == 'L-CC':
        new_contour_list = temp_contour
    if view == 'L-MLO':
        new_contour_list = temp_contour1
    if view == 'R-CC':
        new_contour_list = temp_contour2
    if view == 'R-MLO':
        new_contour_list = temp_contour3

    im = Image.open(PATH + png)
    im_cropped = Image.open(PATH + 'cropped_image/' + png)
    print('original image:', im.size, 'cropped image:', im_cropped.size)
    new_contour = []
    for image_list in new_contour_list:
        # shift annotation coordinates by half the original image size ((y, x) order)
        new_temp_contour = map(operator.add, image_list,
                               reversed(list(np.array(im.size) / 2)))
        new_contour.append(list(new_temp_contour))
    try:
        # 'window_location': (103, 2294, 0, 1041)
        img = polygon2mask(im.size[::-1], np.array(list(new_contour)))
        img_cropped = img[start_point1:endpoint1, start_point2:endpoint2]
        im = cv2.imread(PATH + png)
        im_cropped = cv2.imread(PATH + 'cropped_image/' + png)
    except ValueError:
        img = np.zeros(im.size[::-1])  # numpy expects (rows, cols); im.size is (width, height)

    ##########################################################################
    random_number_generator = np.random.RandomState(shared_parameters["seed"])
    produce_heatmaps(parameters)
    image_heatmaps_parameters = shared_parameters.copy()
    image_heatmaps_parameters["view"] = png.split('_')[1].split('.')[0]
    image_heatmaps_parameters["use_heatmaps"] = True
    image_heatmaps_parameters["model_path"] = "/home/ncc/Desktop/breastcancer/nccpatient/breast_cancer_classifier/models/ImageHeatmaps__ModeImage_weights.p"

    model, device = load_model(image_heatmaps_parameters)

    model_input = load_inputs(
        image_path=PATH + 'cropped_image/' + png,
        metadata_path=PATH + png.split('.')[0] + '.pkl',
        use_heatmaps=True,
        # the benign/malignant paths were swapped in the original paste
        benign_heatmap_path=PATH + png.split('.')[0] + '_benign_heatmap.hdf5',
        malignant_heatmap_path=PATH + png.split('.')[0] + '_malignant_heatmap.hdf5',
    )

    batch = [
        process_augment_inputs(
            model_input=model_input,
            random_number_generator=random_number_generator,
            parameters=image_heatmaps_parameters,
        ),
    ]

    tensor_batch = batch_to_tensor(batch, device)
    y_hat = model(tensor_batch)
    ##########################################################################
    fig, axes = plt.subplots(1, 5, figsize=(16, 4))
    x = tensor_batch[0].cpu().numpy()
    axes[0].imshow(im, cmap="gray")
    axes[0].imshow(img, cmap='autumn', alpha=0.4)
    axes[0].set_title("Original Image")

    axes[1].imshow(im_cropped, cmap="gray")
    axes[1].imshow(img_cropped, cmap='autumn', alpha=0.4)
    axes[1].set_title("Cropped Image")

    axes[2].imshow(x[0], cmap="gray")
    axes[2].imshow(img_cropped, cmap='autumn', alpha=0.4)
    axes[2].set_title("Model Input")

    axes[3].imshow(x[1], cmap=LinearSegmentedColormap.from_list("benign", [(0, 0, 0), (0, 1, 0)]))
    axes[3].set_title("Benign Heatmap")

    axes[4].imshow(x[2], cmap=LinearSegmentedColormap.from_list("malignant", [(0, 0, 0), (1, 0, 0)]))
    axes[4].set_title("Malignant Heatmap")
    plt.savefig('save_imageheatmap_model_figure_folder' + '/' + png.split('.')[0] + '.png')
    ##########################################################################
    predictions = np.exp(y_hat.cpu().detach().numpy())[:, :2, 1]
    predictions_dict = {
        "image": png,
        "benign": float(predictions[0][0]),
        "malignant": float(predictions[0][1]),
    }

    print(predictions_dict)
    score_heat_list.append(predictions_dict)
```

Screenshot from 2020-11-25 11-38-47

The attached file is a cropped mammogram produced by this code. The issue is that some mammograms don't crop well. Am I doing something wrong?

jpatrickpark commented 3 years ago

Hi @nightandweather,

It seems that the issue you are experiencing is that the crop_single_mammogram function from src/cropping/crop_single.py does not remove the background for some of your images. Is this correct?

Our cropping algorithm makes some strict assumptions about the data. If your dataset violates any of these assumptions, the cropping algorithm will not work well:

For more information, please refer to Algorithm 1 in our data report.

I would recommend trying any one of the following to address the issue:

nightandweather commented 3 years ago

Thank you for your comment! Just as you said, the image threshold method works!
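A minimal sketch of what such a thresholding step might look like; the file name is a placeholder, and the cutoff value (taken from the conversion code later in this thread) is illustrative and dataset-dependent:

```python
# Sketch only: zero out faint background pixels so the cropping step's
# zero-background assumption holds. The cutoff of 150 is illustrative.
import imageio
import numpy as np

img = imageio.imread("SN00000016_L-CC.png")  # placeholder file name
img[img < 150] = 0                           # force near-background pixels to exact zero
imageio.imwrite("SN00000016_L-CC.png", img.astype(np.uint16))
```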

Actually, there are more issues in adapting this GitHub code to our dataset.

The attached ROC curve is the result of following your GitHub repository.

Screenshot from 2020-11-26 16-46-54

I used 10 patients with malignant annotations (a single breast, left or right, diagnosed malignant) × the 4 standard mammography views to get the ROC curve, and the annotation label list is

[1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0]

e.g., if the left breast is malignant, it is assumed to be malignant in both the L-MLO and L-CC views.

The model got a low score.

test_image_model.zip

Maybe I made a mistake in preprocessing. As stated in the paper, the pixel array of each DICOM was saved as a PNG with uint16 data using the standardized code provided on GitHub. This is the DICOM-to-16-bit-PNG code:

```python
# It seems the 16-bit conversion did not actually happen; let's redo it.
"""
cv2.imread reads images as 8-bit by default.
"""
##############################################################################
import os
import natsort
import imageio
import numpy as np
import pandas as pd


def standard_normalize_single_image(image):
    """Standardizes an image in-place."""
    image = image - np.mean(image)  # np.mean promotes the array to float64
    image /= np.maximum(np.std(image), 10 ** (-5))
    return image


def load_dcm_data(path):
    shpfiles = []
    labels = []

    annotation_files = pd.read_excel(csv)  # `csv` is defined elsewhere

    for dirpath, subdirs, files in os.walk(path):
        for x in files:
            if x.endswith(".dcm"):
                shpfiles.append(os.path.join(dirpath, x))

    return natsort.natsorted(shpfiles)


# Convert the DICOMs to PNG again.
# This is time-consuming, so pick out only a few.
##############################################################################
import pydicom
import matplotlib.pyplot as plt

base_malignant_folder = r'/home/ncc/Desktop/2020_deep_learning_breastcancer/submit_breast_SN/malignant'
test_folder_path = r'/home/ncc/Desktop/2020_deep_learning_breastcancer/test_malignant_folder/'
sample_malignant_list = load_dcm_data(base_malignant_folder)[0:40]

threshold = 100
for f in sample_malignant_list:
    file_name = os.path.basename(f)

    ds = pydicom.dcmread(f)  # read the DICOM file
    img = ds.pixel_array     # get the image array (0-4095)
    img[img < 150] = 0       # zero out near-background pixels

    plt.hist(img.ravel())
    plt.show()

    # What the paper expects is a 16-bit, standardized mammography PNG.
    img = standard_normalize_single_image(img)
    img2 = (65535 * (img - img.min()) / img.ptp()).astype(np.uint16)  # rescale to 0-65535

    print('after scaling:', img2.max(), img2.min())
    imageio.imwrite(test_folder_path + file_name.split('.')[0] + '.png',
                    img2.astype(np.uint16))  # write the PNG
```

And while I was looking at past issues, https://github.com/nyukat/breast_cancer_classifier/issues/9#issuecomment-489370370

Do I have to change the annotated labels according to the results of the model in order to raise the AUC score?

Also, I'd like to know the parameter values needed to get the 3-channel (gray, benign, malignant) image from the paper (the image-heatmaps model parameters). I tried various parameter values through dictionaries and carried out a grid search, but found no satisfactory figure.

Thank you for accepting my question!

jpatrickpark commented 3 years ago

It seems that you are trying to load the images and feed them to the model on your own, instead of using our pipeline. I noticed some misunderstandings and problematic lines of code in your custom pipeline.

As @kjgeras mentioned in https://github.com/nyukat/breast_cancer_classifier/issues/9#issuecomment-490127933, even the slightest differences in the loading pipeline can lead to random predictions.

Even if all the preprocessing is done correctly, if your dataset itself is drastically different from our own, it could also affect the performances as mentioned in #19.

I suggest you look at the debugging strategy written by @kjgeras which can be found in https://github.com/nyukat/breast_cancer_classifier/issues/9#issuecomment-490146418

At this stage, I recommend that you clone our repository again without any modification, fix the cropping algorithm to use a nonzero masking threshold, and use the provided pipeline as-is (run.sh or run_single.sh). For the images, you can just save the DICOM pixel_array as 16-bit PNG files without any standardization or normalization. This way, you can be more confident that you are preprocessing the images the way we expect. If you still do not get reasonable performance, you can try examining the DICOM metadata to see whether you applied the same filtering criteria as we did.
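To make that concrete, a minimal sketch of the suggested conversion, assuming single-frame DICOMs whose pixel values fit in uint16; paths are placeholders:

```python
# Save the raw DICOM pixel_array as a 16-bit PNG, with no standardization
# or normalization, as recommended above. Paths are placeholders.
import pydicom
import imageio
import numpy as np

ds = pydicom.dcmread("/path/to/image.dcm")
img = ds.pixel_array.astype(np.uint16)      # keep the raw pixel values unchanged
imageio.imwrite("/path/to/image.png", img)  # uint16 arrays are written as 16-bit PNG
```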

jpatrickpark commented 3 years ago

> And while I was looking at past issues, #9 (comment)
>
> Do I have to change the annotated labels according to the results of the model in order to raise the AUC score?

No, you must not change the labels according to the results of the model when calculating AUC. What that comment discusses is how to set the decision threshold, which has nothing to do with AUC calculation. AUC, on the other hand, is a metric that measures the model's ability to distinguish between malignant and non-malignant cases across all threshold values. To learn more about AUC, I recommend reading this article on the receiver operating characteristic curve.
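For illustration (a hedged sketch with made-up numbers, not the repository's evaluation code): AUC is computed directly from the model's continuous scores against the fixed ground-truth labels, with no decision threshold involved:

```python
# AUC illustration: the labels stay fixed; the scores are the model outputs.
# roc_auc_score summarizes performance over all possible thresholds at once.
from sklearn.metrics import roc_auc_score

labels = [1, 1, 0, 0, 1, 0, 1, 0]                   # ground truth, never edited
scores = [0.9, 0.4, 0.3, 0.2, 0.8, 0.1, 0.6, 0.35]  # model's malignancy scores
print(roc_auc_score(labels, scores))                # threshold-free metric
```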

jpatrickpark commented 3 years ago

> Also, I'd like to know the parameter values needed to get the 3-channel (gray, benign, malignant) image from the paper (the image-heatmaps model parameters). I tried various parameter values through dictionaries and carried out a grid search, but found no satisfactory figure.

I am not sure what you mean by this. Do you mean you changed heatmap parameters such as stride_fixed and patch_size in src/heatmaps/run_producer.py or src/heatmaps/run_producer_single.py? You should not change these parameters, as the classifier model expects heatmaps generated with the predefined parameters we provided.

The reason you are not seeing satisfactory heatmaps might also be related to the differences in datasets and the issues in your custom pipeline I explained in https://github.com/nyukat/breast_cancer_classifier/issues/42#issuecomment-734422874. If you are not feeding the images the way we do, you cannot expect reasonable heatmaps.