matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Input with gray scale image #609

Open chuntailin opened 6 years ago

chuntailin commented 6 years ago

Hi, I have some grayscale images whose shape is (512, 512), but the model needs input images loaded as RGB, so I use numpy.stack to change the image shape from (512, 512) to (512, 512, 3). The training result with this revision isn't good at all, so I'm wondering what the reason for the bad result is.

Is it inappropriate to make this revision, or is the model not suitable for training on grayscale images? Please give me some advice.

BTW, does anyone have a grayscale-input version of Mask R-CNN? Please share it or tell me how to revise the model.

Thanks.

zungam commented 6 years ago

How much data do you have? How long do you train?

Grayscale shouldn't really be a problem, I believe. However, check that your numpy.stack actually copies the gray values to all three channels (R, G, and B), and that it isn't just zeros in the green and blue channels.
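For reference, a minimal sketch of what a correct channel replication looks like (the random array here is just a stand-in for a real CT slice):

    import numpy as np

    # Hypothetical (512, 512) grayscale slice standing in for a real image
    gray = np.random.randint(0, 256, size=(512, 512), dtype=np.uint8)

    # Replicate the single channel into R, G and B
    rgb = np.stack([gray, gray, gray], axis=-1)

    assert rgb.shape == (512, 512, 3)
    # All three channels should be identical, not zero-filled
    assert (rgb[..., 1] == rgb[..., 0]).all() and (rgb[..., 2] == rgb[..., 0]).all()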

chuntailin commented 6 years ago

@zungam Actually, I'm doing medical image recognition. I want to detect whether there is a tumor in the liver from CT scans. I have 130 patients' CTs, so there are about 26,000 slice images. I'm sure the green and blue channels are not just zeros, because I copy the values of the first channel to the green and blue channels; all three channels have the same values.

Below are an example training image and an example labeled image.

[Training image]

[Labeled image]

Could you please give me some advice on how to make the training result better, such as image preprocessing or model revisions?

Thanks.

g2-bernotas commented 6 years ago

I don't have a solution to your problem, but I am using grayscale images where every channel has exactly the same values, and I am getting acceptable results. You can investigate whether changing the image mean values has an effect on the result (for me it didn't):

MEAN_PIXEL = np.array([some_value, some_value, some_value])

chuntailin commented 6 years ago

@g2-bernotas Thanks for your reply. Could you please show me the parameter values of your Config? I also want to ask how to determine what value of MEAN_PIXEL I should set.

For your first question: I have traced the code in model.py and found that there is already code for image augmentation, so I didn't do any extra augmentation.

For your second and third questions, I'm not sure what you actually want to ask. The following are the details of my Config; I hope the information helps you better understand what I'm doing and spot the mistakes I've made.

Thanks.

g2-bernotas commented 6 years ago

MEAN_PIXEL is the mean value of all pixels across your training (not sure?) images.
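For what it's worth, a rough sketch of how the per-channel mean could be computed (`train_image_paths` is a hypothetical list of paths to your 3-channel training images):

    import numpy as np
    import skimage.io

    channel_sum = np.zeros(3)
    pixel_count = 0
    for path in train_image_paths:
        img = skimage.io.imread(path).astype(np.float64)  # assumed (H, W, 3)
        channel_sum += img.reshape(-1, 3).sum(axis=0)
        pixel_count += img.shape[0] * img.shape[1]

    MEAN_PIXEL = channel_sum / pixel_count  # one mean per channel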

I haven't changed many of the config values, but I played around with different ones to learn, via trial and error, which work best.

Yes, augmentation is already there, but it is not going to be used unless you enable it. You can do that wherever you call model.train (it should be inside the train function). You will have to change it to something like this:

    print("Training network: heads")
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=10,
                layers='heads', augmentation = imgaug.augmenters.OneOf([
                    imgaug.augmenters.Fliplr(0.5),
                    imgaug.augmenters.Flipud(0.5),
                    imgaug.augmenters.Affine(rotate=(-90, 90))
                ]))

To find more info about the augmentation package, look here.

Finally, regarding the graphs: TensorBoard allows you to study the training process. [screenshot of TensorBoard loss curves]

To view it, open a terminal and run: tensorboard --logdir=path/to/log-directory. It should print a local address that you can paste into your browser. More info here.

chuntailin commented 6 years ago

@g2-bernotas Thanks for your reply. I will take your advice and revise the model parameter settings. Can I ask you one more question? Did you revise any of the model's code to fit your training target, or did you just change the dataset and start training?

zungam commented 6 years ago

@chuntailin, it could be that you have too few tumors in your dataset. How many tumors are there in total for each person, and how many images contain tumors? I would recommend increasing your dataset by using augmentation (see this link for how augmentations are made); they can easily be passed into your train() function in Mask R-CNN.

For cancer images, I think the relevant ones are:

- a small amount of PiecewiseAffine
- a small translation
- a small scale
- a small rotation
- a small perspective transform
- a small crop

These are only geometric transformations, because I think contrast, sharpness, and color are similar across all CT scans; a sketch follows below.
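A sketch of what that could look like with imgaug (the magnitudes are illustrative guesses, not tuned values):

    import imgaug.augmenters as iaa

    augmentation = iaa.Sequential([
        iaa.Sometimes(0.3, iaa.PiecewiseAffine(scale=(0.0, 0.02))),
        iaa.Affine(
            translate_percent={"x": (-0.05, 0.05), "y": (-0.05, 0.05)},
            scale=(0.95, 1.05),
            rotate=(-10, 10),
        ),
        iaa.Sometimes(0.3, iaa.PerspectiveTransform(scale=(0.0, 0.05))),
        iaa.Crop(percent=(0.0, 0.05)),
    ])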

Also, I have a theory that feeding the slice number to the final activation layers in maskrcnn_mask and maskrcnn_class may boost your performance. Why? Because cancer is more frequent in internal organs than in, say, your foot. This is useful information that might help the classifier make relevant guesses. But it's just a theory.

chuntailin commented 6 years ago

@zungam My training dataset contains 130 patients' CTs. There are about 20 images containing the liver and 6 images containing a tumor for each patient. I will try normalization and augmentation in my new training process.

I don't really understand the theory you described. Could you please give more detailed information or an example of how to do that?

Thanks!

g2-bernotas commented 6 years ago

26,000 slices / 130 individuals = 200 slices per person.

As far as I understand, these slices go from the top of a person (head) to the bottom (feet), divided into 200 slices. You are particularly interested in tumor detection, which may appear around the middle of the body; let's assume slice 80. What zungam was getting at is that this specific tumor doesn't really exist in the other slices, so to simplify the problem, either reduce the number of slices and keep the important ones, or somehow feed the slice number into the algorithm, since it is a very important indicator (e.g. the tumor appears in slices 60-100).

Am I right that you have scanned 130 people (with tumors), where each has 6 images of a tumor, so 130 × 6 = 780? Are you also interested in detecting the liver? Liver samples: 130 × 20 = 2600. In that case you have object detection with 3 classes: tumor, liver, and background.

Is your data labelled correctly, and are you reading it correctly? How do you split your data into train and validation? I would suggest putting complete scans of individual patients (26 patients is 20%) into the validation dataset, so you make sure the model has not seen them during training.
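A minimal sketch of such a patient-level split (`all_slices` and its `patient_id` attribute are hypothetical names for your own data structures):

    import random

    patient_ids = sorted({s.patient_id for s in all_slices})
    random.seed(42)
    random.shuffle(patient_ids)

    n_val = int(0.2 * len(patient_ids))  # e.g. 26 of 130 patients
    val_patients = set(patient_ids[:n_val])

    # Every slice of a held-out patient goes to validation, never to training
    train_slices = [s for s in all_slices if s.patient_id not in val_patients]
    val_slices = [s for s in all_slices if s.patient_id in val_patients]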

Let's see if the augmentations improve your work. Also, check the loss graphs as they are good indicators of how well the model is doing.

zungam commented 6 years ago

Okay, I see. With some augmentation it should be enough, I think.

It was just an idea. But the question is whether you are feeding images of the feet, head, etc. If so, to force Mask R-CNN to learn the idea of the liver, you could tell the machine which slice number you are on (1 being the top of the head, 1263 being the heart, 2961 being the feet, etc.) by feeding that number into an activation layer at the last layer in every part of the network. But if you only have images of the liver, then there's no point.
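For illustration only, the general shape of that idea in Keras might look like the toy sketch below; this is not how the Mask_RCNN heads are actually wired, just a hypothetical example of concatenating a normalized slice index onto a feature vector:

    from keras import layers, models

    features = layers.Input(shape=(1024,), name="head_features")
    slice_idx = layers.Input(shape=(1,), name="slice_index")  # slice number / total slices

    x = layers.Concatenate()([features, slice_idx])
    logits = layers.Dense(3, name="class_logits")(x)  # background, liver, tumor

    toy_model = models.Model([features, slice_idx], logits)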

chuntailin commented 6 years ago

@zungam @g2-bernotas Thanks for your help. I will take your advice and start the new training process.

But I have a question from when I modified the project. As I mentioned earlier, there are about 75 abdominal cavity CT slices for each patient, but only 20 images containing the liver and 5 containing a tumor. The training and labeled images are paired, meaning that "training_image_0" is paired with "liver_image_0" and "tumor_image_0". Here comes the question: for the other 15 images that don't contain a tumor, can I return a numpy array whose values are all False from the "load_mask" function?

Below is the code I originally programmed.

        # Read mask files from .png images
        mask = []

        # Boolean masks: True where the liver/tumor is labeled
        liver_img = skimage.io.imread(liver_path, as_grey=True).astype(np.bool)
        tumor_img = skimage.io.imread(tumor_path, as_grey=True).astype(np.bool)

        mask.append(liver_img)
        mask.append(tumor_img)

        # Stack into an (H, W, 2) boolean array, one channel per instance
        mask = np.stack(mask, axis=-1)

        # Class IDs: 1 = liver, 2 = tumor
        class_ids = np.array([1, 2], dtype=np.int32)

        return mask, class_ids

The method I programmed always returns two class_ids and two masks, no matter whether the image contains a tumor; if it doesn't, the values of tumor_img are all False when it is appended to the mask. I'm wondering whether this is a good approach, or whether it could be one of the reasons for the bad results. Is there a better way to solve this?
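For comparison, here is a hedged sketch of an alternative load_mask that simply skips all-False tumor masks, so images without a tumor return only the liver instance (an assumption on my part, not a confirmed fix):

    import numpy as np
    import skimage.io

    def load_mask_sketch(liver_path, tumor_path):
        masks, class_ids = [], []

        liver = skimage.io.imread(liver_path, as_grey=True).astype(bool)
        masks.append(liver)
        class_ids.append(1)  # 1 = liver

        tumor = skimage.io.imread(tumor_path, as_grey=True).astype(bool)
        if tumor.any():  # drop all-False tumor masks
            masks.append(tumor)
            class_ids.append(2)  # 2 = tumor

        return np.stack(masks, axis=-1), np.array(class_ids, dtype=np.int32)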

zungam commented 6 years ago

I don't know if this repo allows training on empty images, but if it does, it's not a bad thing; training on negative examples is also important. However, there should be a good ratio. It should not train 95% on negative examples. I think the config value:

    # Percent of positive ROIs used to train classifier/mask heads
    ROI_POSITIVE_RATIO = 0.33

decides this!

zungam commented 6 years ago

Btw, since you are working with black-and-white images, another idea could be feeding three images into the RGB channels: for example, slice 0 into R, slice 1 into G, and slice 2 into B. In this case, the network would also judge the output by the immediately adjacent slices, for more local information. However, here you must use only the masks from slice 1 (the middle) as ground truth when training. The surrounding images will just be "helping images".
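A minimal sketch of that idea (`slices` is a hypothetical list of (512, 512) arrays in scan order):

    import numpy as np

    def make_pseudo_rgb(slices, i):
        """Stack slices i-1, i, i+1 as R, G, B; only the mask of slice i
        (the middle one) is used as ground truth."""
        return np.stack([slices[i - 1], slices[i], slices[i + 1]], axis=-1)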

chuntailin commented 6 years ago

@zungam Thanks for offering a new idea. If I do what you mentioned, do you mean that all three channels of the mask need to be the middle slice?

BTW, what I said earlier about the CT slice counts is wrong: there are around 200 slice images containing the liver. Below is a screenshot of the labeled images for one of the patients. [screenshot of labeled images]

With hundreds of labeled images for each patient, do you think it's appropriate to also train on the images containing only a little of the liver (such as 307.png ~ 324.png)? Or is training only on the images containing more of the liver (such as 343.png ~ 468.png), with augmentation, enough?

zungam commented 6 years ago

I think it is best to train on all!

If I do the case you mentioned, you mean that the three channels of the mask all need to be the middle slice?

No, I mean that the 3 input channels can be 3 different slices, but all 3 slices must be really close to each other (so that the input becomes a thin 3D section). The output should be the mask of only the middle slice. Why? Because you can't have three outputs in Mask R-CNN.

chuntailin commented 6 years ago

@zungam Got it! I will try it next time. Thanks for your reply! Sorry, I forgot that a mask should be a single-channel boolean array, so there won't be a 3-channel problem.

nooriahmed commented 5 years ago

Greetings! I am seeking help. [prediction screenshot] My predicted masks are larger than the ground truth; they do not segment accurately with respect to the ground truths. I have also made some changes in config.py according to my dataset images, but it doesn't work. I would be very grateful for your kind consideration and help. What settings should I use?

prafulag commented 5 years ago

I am also trying to use this code on a medical dataset. I have set MEAN_PIXEL to [0, 0, 0] for now, since I do not want any change to my intensities to begin with. The main issue I am facing is that 'loss' and 'rpn_bbox_loss' go to NaN after a few steps in the first epoch; I have reduced the learning rate to 1e-5 and it's still the same. Moreover, 'mrcnn_bbox_loss' and 'mrcnn_mask_loss' always show as 0.0000e+00. I am trying to train the model from scratch and have set 'layers' to 'all' to enable training all layers. Any suggestions/thoughts?

HAMZARaouia commented 5 years ago

I am using grayscale images, and I think leaving the number-of-channels parameter at 3 doesn't cause a problem. As I understand it, it just repeats the data 3 times, which underutilizes your GPU memory, but in terms of detection it should do the job.

HAMZARaouia commented 5 years ago

Issue #140 may help you.

prafulag commented 5 years ago

Thanks @HAMZARaouia. I also found the required info on the wiki page: https://github.com/matterport/Mask_RCNN/wiki

ApoorvaSuresh commented 4 years ago

Hi, does anyone have an idea of how the pretrained weights are used for the 4th channel? Are they even used at all?

ApoorvaSuresh commented 4 years ago

@g2-bernotas did you use pre-trained weights when you trained on grayscale images? Can the pre-trained weights be used?

Amrimn commented 4 years ago

@chuntailin I'm also using grayscale liver images for training. Could you please share all the modifications you made? Thanks in advance.

sohinimallick commented 3 years ago

> Greetings! I am seeking help. My predicted masks are larger than the ground truth; they do not segment accurately with respect to the ground truths. [...] What settings should I use?

@nooriahmed did you find a solution for this? I seem to be having a similar problem.

Dartum08 commented 3 years ago

Hey everyone, I am also trying to run an experiment with grayscale input, and I have already made the required changes. The problem is that the code can run training for the 'heads' epochs, but it stops at the all-layers training: after printing the layer names there is no further output, there is no error, and GPU memory gets freed, which means the experiment stops when all-layers training starts. I am not able to understand what is causing this. Has this happened to anyone else? Let me know if I need to share anything to make my question clearer. Thanks!

taroko-mooncake commented 2 years ago

> Greetings! I am seeking help. My predicted masks are larger than the ground truth; they do not segment accurately with respect to the ground truths. [...] What settings should I use?

> @nooriahmed did you find a solution for this? I seem to be having a similar problem.

Did you find a solution?