qubvel / segmentation_models

Segmentation models with pretrained backbones. Keras and TensorFlow Keras.
MIT License

model not learning anything #233

Open ghost opened 4 years ago

ghost commented 4 years ago

I've been training my model on images with masks labeling earrings vs. not earrings.

import keras
import segmentation_models as sm

BACKBONE = 'inceptionresnetv2'

# combined loss: Dice + weighted binary focal loss
dice_loss = sm.losses.DiceLoss()
focal_loss = sm.losses.BinaryFocalLoss()
total_loss = dice_loss + (1 * focal_loss)

opt = keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999)

metrics = [sm.metrics.IOUScore(threshold=0.5), sm.metrics.FScore(threshold=0.5)]

model = sm.PSPNet(BACKBONE, encoder_weights='imagenet', classes=1,
                  encoder_freeze=True, activation='sigmoid', downsample_factor=16,
                  input_shape=(960, 960, 3), psp_conv_filters=1024, psp_pooling_type='avg')

The backbone is inceptionresnetv2, and I've set masks and images to the 0...1 range.

The model trains but doesn't learn anything: its output is a matrix full of 0's.

It generated a mask when the input size was 380x380, but it does not in this case.

Could you point out any error I might be making? Thanks in advance.

PS: I can attach the Colab file I'm using if you want to refer to anything else.

qubvel commented 4 years ago

Try to learn something with encoder_freeze=False

ghost commented 4 years ago

In that case it does not make good predictions.

Update: I tried unfreezing it and it's just predicting 0's now, and since the image has few 1's it's getting a high IoU score too.

JordanMakesMaps commented 4 years ago

i've set masks and images to 0...1 range.

Masks have values of either 0 or 1, and images have floating point values between 0 and 1?

ghost commented 4 years ago

Yes @JordanMakesMaps, attaching code.txt. There are segments that aren't currently used in the model, or that I only use for testing at times; they're labeled as test or unused.

code.txt

Please point me at the potential mistakes I might have been making.

JordanMakesMaps commented 4 years ago

Okay @jayayy, I have some input that may or may not be useful but I think it would be worth trying out.

Starting at line 60:

for i in range(l):
    imgg = cv2.imread(path + '/' + lis[i])
    gray = cv2.cvtColor(imgg, cv2.COLOR_BGR2GRAY)  # shape = (H, W)
    gray = cv2.resize(gray, (960, 960))            # shape = (960, 960)
    mask[i] = np.expand_dims(gray, axis=2)         # shape = (960, 960, 1)
    #mask[i] = gray.reshape(960,960,1)
    if i % 50 == 0:
        print(i)

Here you're taking in a mask that is currently in RGB format, converting it to grayscale, then resizing and expanding it so that the shape is correct for what you're trying to do. I think the issue is that the mask needs to be in a binary format, not grayscale. Read this for info on that (specifically, the "Representing the Task" section).

Once your masks are in a binary format, you have to one-hot-encode them. You can do this with your own script or with keras.utils.to_categorical() (see the sketch below).

As a sanity check, the final shape of the masks you pass into training should be (batch_size, height, width, 2); the 2 comes from the fact that you're doing binary segmentation. Even though there is only one class you're looking at, you need to make it clear in your training data that there are really two classes: the one you're interested in, and the background.

Each slice along the one-hot-encoded mask's last axis has shape (960, 960): the 0th slice should have ones where the background is and zeros where the class of interest is, and the 1st slice should be the opposite, with ones where the class of interest is and zeros where the background is (see image below).

[image: one-hot-encoded example]
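A minimal sketch of that sanity check, assuming binary 0/1 masks and illustrative variable names (not from the original code):

import numpy as np
from keras.utils import to_categorical

# illustrative binary mask: 0 = background, 1 = class of interest
binary_mask = np.zeros((960, 960), dtype=np.uint8)
binary_mask[400:500, 400:500] = 1

one_hot = to_categorical(binary_mask, num_classes=2)  # shape: (960, 960, 2)

# channel 0 marks the background, channel 1 the class of interest
assert one_hot.shape == (960, 960, 2)
assert (one_hot[..., 1] == binary_mask).all()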

I would also recommend using the preprocess_input() function provided by @qubvel. Each backbone's preprocess_input() is different, but it preprocesses the input image to mimic how the original backbone was trained (see the sketch after the generator discussion below). So, changing gears and looking at the code starting at line 119:

# data_gen_args = dict(featurewise_center=True,          # consider removing
#                      featurewise_std_normalization=True,     # consider removing
#                      rotation_range=90,
#                      width_shift_range=0.1,
#                      height_shift_range=0.1,
#                      zoom_range=0.2)
# image_datagen = ImageDataGenerator(**data_gen_args)
# mask_datagen = ImageDataGenerator(**data_gen_args)  

You have featurewise_center and featurewise_std_normalization, which alter the pixel values (other alternatives are rescale and zca_whitening) in an attempt to make it easier for the model to learn. The thing is, those two need to be fit on the dataset (or a representative sample of it) to be useful. Because you never fit them, I don't think those lines of code are doing anything for you right now. If you choose to keep those two lines and find a way to make them work properly (see this), DEFINITELY create a different data_gen_args for your mask_datagen, because featurewise_center and featurewise_std_normalization would mess up your mask values and result in bad predictions.
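As a sketch of both points, assuming segmentation_models is imported as sm and x_train is a hypothetical array of raw 0..255 training images (in practice you'd pick a single normalization scheme rather than stacking several):

import segmentation_models as sm
from keras.preprocessing.image import ImageDataGenerator

# backbone-matched preprocessing: get_preprocessing returns the
# preprocess_input function for the given backbone
preprocess_input = sm.get_preprocessing('inceptionresnetv2')
x_train = preprocess_input(x_train)  # expects raw 0..255 images

# featurewise statistics must be fit on (a sample of) the data before use;
# the mask generator gets geometry-only arguments so mask values stay intact
image_datagen = ImageDataGenerator(featurewise_center=True,
                                   featurewise_std_normalization=True,
                                   rotation_range=90)
mask_datagen = ImageDataGenerator(rotation_range=90)
image_datagen.fit(x_train)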

The very last thing is the predictions:

path = 'C:\\Users\\Deepak\\Desktop\\ready\\test\\img'
lis = os.listdir(path)
imager = cv2.resize(cv2.imread(path +'/'+ lis[9]),(960,960))
imager = imager.reshape(1,960,960,3)
raw = model.predict(x=imager)
raw = raw.reshape(960,960)

You can change that into:

path = 'C:\\Users\\Deepak\\Desktop\\ready\\test\\img'
lis = os.listdir(path)
imager = cv2.resize(cv2.imread(path + '/' + lis[9]), (960, 960))
imager = imager.reshape(1, 960, 960, 3)
raw = model.predict(x=imager).squeeze()      # back to (height, width, classes)
pred = np.argmax(raw, axis=-1)               # max index across the class (last) axis
cv2.imshow('prediction', pred.astype(np.uint8) * 255)
cv2.waitKey(0)
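Note that argmax only makes sense once the model outputs two channels; with the classes=1, sigmoid setup from the original post, the prediction has a single channel, so you would threshold it instead. A minimal sketch:

raw = model.predict(x=imager).squeeze()     # shape (960, 960) when classes=1
binary_pred = (raw > 0.5).astype(np.uint8)  # threshold the sigmoid output
cv2.imshow('prediction', binary_pred * 255)
cv2.waitKey(0)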

Hope this helps 👍

Just realized you commented out all of the ImageDataGenerator stuff 😂; either way, it's still useful to know.

ghost commented 4 years ago

Much appreciated. Also, given our frequent communication, you're starting to seem like a pen friend to me. ;-)

The masks contain 0's and 1's; is that okay? Also, according to the binary segmentation example by @qubvel, I'm pretty sure I need only one mask channel showing the object of interest. I previously raised an issue (dataset type for binary segmentation #221) about this very question. Nevertheless, I'll still try one-hot encoding to see if it works.

Also, I was planning to use augmentation anyway and will refer back to your text then, so your effort will still be put to good use.

I'll try the prediction part too, after changing the masks.

Thanks a lot.

JordanMakesMaps commented 4 years ago

@jayayy you're right, I gave you conflicting responses 🤦‍♂️ Sorry about that.

qubvel commented 4 years ago

Images should be in range 0..255

ghost commented 4 years ago

@qubvel your preprocess function makes the images' values negative when I use the inceptionresnetv2 backbone.

@JordanMakesMaps so one-hot encoding isn't needed anywhere here? And despite all this, the problem still isn't solved. What am I doing wrong?

JordanMakesMaps commented 4 years ago

Looking at your code, this is where I would do it. But you'd need to find a way to convert your RGB masks to a binary format first.


path = 'C:\\Users\\Deepak\\Desktop\\ready\\train\\mask'
lis = ns.natsorted(subset_listt)
l = len(lis)
mask = np.zeros((l, 960, 960, 2)).astype('float')
print(path)

for i in range(l):
    imgg = cv2.imread(path + '/' + lis[i])
    gray = cv2.cvtColor(imgg, cv2.COLOR_BGR2GRAY)  # shape = (H, W)
    gray = cv2.resize(gray, (960, 960))            # shape = (960, 960)

    # find a way to convert masks to binary form here
    # you might need a dictionary to convert the RGB/grayscale mask to a binary mask
    # e.g. background == 0, class of interest == 1
    # then one-hot-encode them

    one_hot_encoded = keras.utils.to_categorical(binary_mask, num_classes=2)  # shape = (960, 960, 2)
    mask[i] = one_hot_encoded
    if i % 50 == 0:
        print(i)

See the image below; it's taken from the Cityscapes dataset. The mask is in RGB format, so there are three channels. To one-hot-encode the masks, they first need to be converted to binary/label form. For example, the car class here is dark blue, which corresponds to (0, 0, 142) in RGB; you could use a dictionary to map all pixels with this tuple to a single value in your label mask.

[image: car_scene]


color_converter = {(0, 0, 0): 0,        # background, ignore class
                   (0, 0, 142): 1,      # cars
                   (128, 64, 128): 2,   # road
                   (244, 35, 232): 3}   # sidewalk
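To actually apply a mapping like that, here's a small sketch; rgb_to_labels is a hypothetical helper, not part of any library:

import numpy as np

def rgb_to_labels(rgb_mask, color_converter):
    # converts an (H, W, 3) RGB mask into an (H, W) label mask
    # using a {(r, g, b): class_index} dictionary like the one above;
    # note cv2.imread returns BGR, so convert with cv2.COLOR_BGR2RGB first
    labels = np.zeros(rgb_mask.shape[:2], dtype=np.uint8)
    for color, class_idx in color_converter.items():
        matches = np.all(rgb_mask == np.array(color), axis=-1)
        labels[matches] = class_idx
    return labels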

ghost commented 4 years ago

Thanks for the reply.

[image: Screenshot (30)]

I had PNG files with 3 channels, with 255 for the class of interest and 0 for the background. On converting them to grayscale, they became single-channel with values of 0 or 255 (there's no value other than 0 or 255). Dividing these by 255 gives me binary values, so that part is clear.
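For that exact case, a minimal sketch (assuming the cv2/keras imports from earlier and a hypothetical mask_path; thresholding after the resize guards against the in-between values that interpolation can introduce):

gray = cv2.cvtColor(cv2.imread(mask_path), cv2.COLOR_BGR2GRAY)
gray = cv2.resize(gray, (960, 960))
binary_mask = (gray > 127).astype('uint8')                        # 0/255 -> 0/1
one_hot = keras.utils.to_categorical(binary_mask, num_classes=2)  # (960, 960, 2)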

I'm a tiny bit worried about the ratio of 0's to 1's (the class imbalance). For reference, check issue #221, where I uploaded dataset examples.

You can close this if there's nothing more I might be missing. Thanks for your further in-depth input.