keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Pixelwise labels in Keras #1169

Closed KlaymenGC closed 8 years ago

KlaymenGC commented 8 years ago

Hello,

I'm building an end-to-end network to produce a pixelwise probability map of the input images. The input images (500x300x3) have pixelwise labels (500x300) indicating which class each pixel belongs to. The data and labels are defined as follows:

    img_height = 300
    img_width  = 500
    data = np.empty((num_images, 3, img_height, img_width), dtype="float32")
    label = np.empty((num_images, img_height, img_width), dtype="uint8")

I want to use cross entropy as the loss. Could anyone suggest a way to implement this in Keras? Thanks in advance!

elanmart commented 8 years ago

Can't you pass the labels as a flattened vector and then use the binary_crossentropy loss?

KlaymenGC commented 8 years ago

@elanmart Thanks for the reply. Could you please be more precise? The desired output will be 500x300x21 (21 classes to classify): in each channel of the output, an activation map will be generated showing the probability of each pixel belonging to that class.

elanmart commented 8 years ago

Hey, sorry for the late reply.

I was actually wrong. Here's the correct way: Keras' softmax can be applied to 3D tensors, where the softmax is computed for the last dimension.

Your predictions will be batch_size x 500 x 300 x 21. Flatten them into batch_size x 150000 x 21 (500 x 300 = 150000 pixels) and apply softmax. Now all you have to do is supply flattened labels with shape batch_size x 150000, where each element is a scalar indicating the desired label.

At least that's what I'd do, ping @EderSantana?
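The flattening described in this comment can be sketched in plain NumPy. This is only an illustration under assumptions: channels-last scores, a tiny toy image instead of 500 x 300, and a helper named `softmax_last_axis` that is made up for this sketch and is not a Keras function.

```python
import numpy as np

def softmax_last_axis(x):
    """Softmax over the last axis (hypothetical helper, not a Keras API)."""
    shifted = x - x.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

batch_size, height, width, num_classes = 2, 5, 3, 21  # toy sizes (assumption)
rng = np.random.default_rng(0)

# Random stand-ins for the network's raw scores and the integer label maps.
scores = rng.normal(size=(batch_size, height, width, num_classes))
labels = rng.integers(0, num_classes, size=(batch_size, height, width))

# Flatten the spatial dimensions: one row of class scores per pixel.
flat_scores = scores.reshape(batch_size, height * width, num_classes)
flat_labels = labels.reshape(batch_size, height * width)

# Each pixel gets its own probability distribution over the classes.
probs = softmax_last_axis(flat_scores)
print(probs.shape)        # (2, 15, 21)
print(flat_labels.shape)  # (2, 15)
```

Because the softmax runs over the last axis only, every pixel's 21 probabilities sum to 1 independently of the other pixels.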

KlaymenGC commented 8 years ago

@elanmart Again, thank you so much for your reply! I was thinking the same about getting the probability map using the softmax as you described. It may be dumb but the problem for me now is that I haven't figured out how to train the network using Keras,

    model.compile(loss='binary_crossentropy', optimizer='sgd')

this won't work... could you please shed some light on this one? Thanks in advance!

EderSantana commented 8 years ago

Here is one thing about Keras: it always assumes that your dimensions are (samples, dim) or (samples, time, dim). In the second case, when calculating the cost, your vectors will be reshaped to (samples*time, dim). If you have something like (samples, time, dim1, dim2), they will be reshaped to (samples*time*dim1, dim2), which will average your cost function in the wrong way. So, whenever you can, reshape your output data to something like (samples, time, dim1*dim2*...).

I'll open an issue about that and work on it later.
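As a rough illustration of the reshape recommended in this comment, the NumPy sketch below merges the trailing dimensions of a 4D target into one. The sizes are arbitrary toy values chosen for the example, not anything from the thread.

```python
import numpy as np

samples, time, dim1, dim2 = 4, 3, 5, 2  # toy sizes (assumption)
y = np.arange(samples * time * dim1 * dim2, dtype="float32")
y = y.reshape(samples, time, dim1, dim2)

# Merge the trailing dimensions so the array is (samples, time, dim1*dim2).
y_3d = y.reshape(samples, time, dim1 * dim2)
print(y_3d.shape)  # (4, 3, 10)

# The reshape is lossless: undoing it restores the original array.
restored = y_3d.reshape(samples, time, dim1, dim2)
print(np.array_equal(restored, y))  # True
```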

elanmart commented 8 years ago

@KlaymenGC The following code should work for you. The ugly part is that you actually need to pass your labels as a 3D one-hot matrix. Perhaps this could be fixed in the future?

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense, Reshape, Activation
    from keras.optimizers import Adam

    NUM_LABELS = 21
    NUM_EXAMPLES = 1280
    NUM_PIXELS = 1500  # toy number of pixels per flattened image

    # One-hot encode random labels. randint's upper bound is exclusive,
    # so passing NUM_LABELS covers every class index 0..NUM_LABELS-1.
    inds = np.random.randint(0, NUM_LABELS, size=(NUM_EXAMPLES, NUM_PIXELS))
    labels = np.zeros((NUM_EXAMPLES, NUM_PIXELS, NUM_LABELS))

    i = np.arange(NUM_EXAMPLES)
    j = np.arange(NUM_PIXELS)
    ii, jj = np.ix_(i, j)

    labels[ii, jj, inds] = 1

    # For simplicity we assume the images are flattened.
    X = np.random.randn(NUM_EXAMPLES, NUM_PIXELS)

    mlp = Sequential()
    # Your model goes here.
    # Assume it ends with a fully connected layer.
    mlp.add(Dense(512, input_shape=(X.shape[1],), activation='relu'))

    # Now we predict: one score per (pixel, class), softmax over the class axis.
    mlp.add(Dense(NUM_PIXELS * NUM_LABELS))
    mlp.add(Reshape((NUM_PIXELS, NUM_LABELS)))
    mlp.add(Activation('softmax'))

    mlp.compile(optimizer=Adam(), loss='categorical_crossentropy')

ndor commented 8 years ago

The best way is to use a 2D convolution with 21 kernels of size 1x1 (i.e. a hyper-column) for pixelwise segmentation. However, because of a shape issue with softmax (only), I'm getting: 'Exception: Cannot apply softmax to a tensor that is not 2D or 3D. Here, ndim=4'. I've sent a mail to Francois Chollet (the Keras guy)...

If anyone knows how to tackle this, please let me know... (ndor123@gmail.com)

Till then, I'm using a sigmoid activation on the hyper-column kernel with mean squared error as the loss function... only 90%... (softmax would have tweaked it for sure)

KlaymenGC commented 8 years ago

@ndor The solution has already been mentioned in the posts above: you just need to take the argmax of the predicted probabilities and then reshape the result to (img_col, img_row) to get the pixelwise segmentation.
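A minimal NumPy sketch of that argmax-and-reshape step follows. It assumes the flattened output has shape (pixels, classes) and was flattened in row-major order; the sizes and the random stand-in predictions are made up for illustration.

```python
import numpy as np

img_rows, img_cols, num_classes = 4, 6, 21  # toy sizes (assumption)
rng = np.random.default_rng(0)

# Stand-in for one image's flattened model output: (pixels, classes), rows sum to 1.
probs = rng.random((img_rows * img_cols, num_classes))
probs /= probs.sum(axis=-1, keepdims=True)

# Most likely class per pixel, then back to the image grid.
segmentation = probs.argmax(axis=-1).reshape(img_rows, img_cols)
print(segmentation.shape)  # (4, 6)
```

The reshape has to use the same axis order that was used when the labels were flattened, otherwise the segmentation map comes out scrambled.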

ndor commented 8 years ago

@KlaymenGC, flattening will produce a vector (WxHxClasses) with more than a single appearance of 1, which is not the way to classify with softmax...

ghost commented 8 years ago

Ok,

So I want to do pixelwise prediction and my output is (SAMPLES, H, W, Classes). Before that I apply softmax on (SAMPLESxHxW, Classes), then reshape back to (SAMPLES, H, W, Classes).

The problem is with the loss function, I think. With categorical_crossentropy I get the same prediction for all pixels...

Is that because of what @EderSantana wrote? Is there a fix or an open PR?

10x

ndor commented 8 years ago

@fchollet ... ? HELP please

ndor commented 8 years ago

So... after some good advice from a friend, I solved it with a Lambda() layer, as follows:

    from keras import backend as K

    def depth_softmax(matrix):
        sigmoid = lambda x: 1 / (1 + K.exp(-x))
        sigmoided_matrix = sigmoid(matrix)
        softmax_matrix = sigmoided_matrix / K.sum(sigmoided_matrix, axis=0)
        return softmax_matrix

Then you use it in a Lambda() layer:

    model.add(Convolution2D(23, 1, 1, border_mode='same', W_regularizer=l1l2(l1=0.0001, l2=0.0001), b_regularizer=None, activity_regularizer=activity_l1l2(l1=0.0001, l2=0.0001)))
    model.add(BatchNormalization())
    # model.add(Activation('relu'))
    model.add(Lambda(depth_softmax))

safe parsing y’all!

rawmean commented 6 years ago

I don't think the depth_softmax function as defined by @ndor is correct. I believe this is a correct implementation:

    import numpy as np
    from keras import backend as K

    def depth_softmax(matrix, is_tensor=True):
        # Increase the temperature to make the softmax more sure of itself.
        temp = 5.0

        # Normalization runs over axis=2, so the class scores must be on that axis.
        if is_tensor:
            exp_matrix = K.exp(matrix * temp)
            softmax_matrix = exp_matrix / K.sum(exp_matrix, axis=2, keepdims=True)
        else:
            exp_matrix = np.exp(matrix * temp)
            softmax_matrix = exp_matrix / np.sum(exp_matrix, axis=2, keepdims=True)

        return softmax_matrix

pliu19 commented 6 years ago

@rawmean Hi, thank you for your suggestion!

Can you please provide an example of how to use that? Do you mean it needs to be passed as an activation function?

Also, thank you for the temperature concept!
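To make the temperature idea concrete, here is a small NumPy-only sketch. `depth_softmax_np` is a hypothetical helper written for this illustration (it is not @rawmean's function and not a Keras API); like the code above it multiplies the scores by `temp`, so a larger value gives a sharper distribution. Inside a Keras model, a tensor function of this kind would typically be wrapped in a `Lambda` layer, as in @ndor's comment.

```python
import numpy as np

def depth_softmax_np(scores, temp=1.0, axis=-1):
    """Softmax over the class axis with a temperature multiplier (illustrative helper)."""
    scaled = scores * temp
    scaled = scaled - scaled.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(scaled)
    return e / e.sum(axis=axis, keepdims=True)

# One pixel with three class scores, shape (1, 1, 3).
scores = np.array([[[1.0, 2.0, 3.0]]])

soft = depth_softmax_np(scores, temp=1.0)
sharp = depth_softmax_np(scores, temp=5.0)

# Both are valid distributions; the temp=5.0 one puts more mass on the top class.
print(soft.round(3))
print(sharp.round(3))
```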

pissw2016 commented 5 years ago

I am new to Python, Keras and English, so your advice is welcome. Here is how I process 3D data: I change the label dimensions and the output shape of the network, then compare the results of (sigmoid + binary_crossentropy) and (softmax + categorical_crossentropy). The first script uses (sigmoid + binary_crossentropy); the second uses (softmax + categorical_crossentropy).

-------------------------------------------------first----------------------------------

    import os
    import numpy as np
    import keras
    from keras import Model, Input, optimizers
    from keras.applications import resnet50
    from keras.layers import Conv2D, Reshape

    # Data preprocessing.
    # img and lab are your image and label arrays (not defined here):
    #   img.shape = 2284*480*640*3, lab.shape = 2284*480*640
    # 2284 is the number of pictures, 14 is the number of categories.
    # trainval_list is the list of train and validation indices; I need to
    # filter because the 2284 pictures include more than 900 test pictures.
    # The labels are subsampled to 2284*30*40 and converted to a one-hot
    # encoding of shape 2284*1200*14.
    img_trainval = img[trainval_list, :, :, :]
    mini_lab = lab[:, ::16, ::16]

    onehot_labels = np.zeros(shape=(2284, 1200, 14))
    for i in range(2284):
        pic_lab = mini_lab[i, :, :]
        pic_flatten = np.reshape(pic_lab, (1, 1200))
        pic_onehot = keras.utils.to_categorical(pic_flatten, 14)
        onehot_labels[i] = pic_onehot
    lab_trainval = onehot_labels[trainval_list, :, :]

    # The network structure is extremely simple.
    os.environ['CUDA_VISIBLE_DEVICES'] = '0'
    resnet_model = resnet50.ResNet50(weights='imagenet', include_top=False, input_shape=(480, 640, 3))
    layer_name = 'activation_40'
    res16 = Model(inputs=resnet_model.input, outputs=resnet_model.get_layer(layer_name).output)
    input_real = Input(shape=(480, 640, 3))
    sgd = optimizers.SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
    x = res16(input_real)
    x = Conv2D(14, (1, 1), activation='relu')(x)
    sig_out = Conv2D(14, (1, 1), activation='sigmoid')(x)
    out_reshape = Reshape((1200, 14))(sig_out)

    # Configure the training parameters (still a lot of things to learn).
    model_simple1 = Model(inputs=input_real, outputs=out_reshape)
    model_simple1.summary()
    model_simple1.compile(loss="binary_crossentropy", optimizer=sgd, metrics=['accuracy', 'categorical_accuracy'])
    model_simple1.fit(x=img_trainval, y=lab_trainval, epochs=200, shuffle=True, batch_size=2)

and the structure of the network is:

    warnings.warn('The output shape of ResNet50(include_top=False) '

(OK, I am new to GitHub comments too. Why is it enlarged and bold?)

    Layer (type)            Output Shape            Param #
    input_2 (InputLayer)    (None, 480, 640, 3)     0
    model_1 (Model)         (None, 30, 40, 1024)    8589184
    conv2d_1 (Conv2D)       (None, 30, 40, 14)      14350
    conv2d_2 (Conv2D)       (None, 30, 40, 14)      210
    reshape_1 (Reshape)     (None, 1200, 14)        0

    Total params: 8,603,744
    Trainable params: 8,573,152
    Non-trainable params: 30,592

the accuracy:

    Epoch 1/200
    1370/1370 [==============================] - 224s 164ms/step - loss: 0.2772 - acc: 0.8955 - categorical_accuracy: 0.2184
    Epoch 2/200
    1370/1370 [==============================] - 218s 159ms/step - loss: 0.2113 - acc: 0.9281 - categorical_accuracy: 0.2910

--------------------------------------------------second----------------------------

The second script is the same as the first (imports, data preprocessing, ResNet50 backbone and optimizer) except for the GPU selection, the output activation and the loss:

    os.environ['CUDA_VISIBLE_DEVICES'] = '1'

    x = res16(input_real)
    x = Conv2D(14, (1, 1), activation='relu')(x)
    x = Conv2D(14, (1, 1), activation='softmax')(x)
    out_reshape = Reshape((1200, 14))(x)

    # Parameter configuration (still a lot of things to learn).
    model_simple1 = Model(inputs=input_real, outputs=out_reshape)
    model_simple1.summary()
    model_simple1.compile(loss="categorical_crossentropy", optimizer=sgd, metrics=['accuracy', 'categorical_accuracy'])
    model_simple1.fit(x=img_trainval, y=lab_trainval, epochs=200, shuffle=True, batch_size=2)

the structure of the second network:

    warnings.warn('The output shape of ResNet50(include_top=False) '

    Layer (type)            Output Shape            Param #
    input_2 (InputLayer)    (None, 480, 640, 3)     0
    model_1 (Model)         (None, 30, 40, 1024)    8589184
    conv2d_1 (Conv2D)       (None, 30, 40, 14)      14350
    conv2d_2 (Conv2D)       (None, 30, 40, 14)      210
    reshape_1 (Reshape)     (None, 1200, 14)        0

    Total params: 8,603,744
    Trainable params: 8,573,152
    Non-trainable params: 30,592

the accuracy:

    Epoch 1/200
    1370/1370 [==============================] - 239s 174ms/step - loss: 2.0305 - acc: 0.3117 - categorical_accuracy: 0.3117
    Epoch 2/200

I didn't attach a weight to each class, although class 0 means unlabeled in the GT. I didn't save my weights. I didn't figure out the accuracy. There might be some errors: I am not sure whether NumPy's reshape is consistent with the Reshape layer, and I am not sure whether it matters.

So far the training results show that the first script runs faster but has a low categorical accuracy, while the second is slower but more accurate.
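One possible reading of the gap between `acc` and `categorical_accuracy` in the first log (an interpretation, not something the commenter confirmed): in Keras 2.x the `'accuracy'` metric is resolved according to the loss, and with `binary_crossentropy` it becomes an element-wise binary accuracy. On one-hot targets with 14 classes, 13 of every 14 entries are zero, so even a prediction of "no class anywhere" scores about 13/14. The NumPy sketch below reproduces that effect on made-up data; it does not use the thread's dataset.

```python
import numpy as np

num_pixels, num_classes = 1200, 14  # sizes taken from the scripts above
rng = np.random.default_rng(0)

# One-hot ground truth with random classes (made-up data for illustration).
true_classes = rng.integers(0, num_classes, size=num_pixels)
y_true = np.eye(num_classes)[true_classes]

# A useless prediction: every class probability is close to zero.
y_pred = np.full((num_pixels, num_classes), 0.01)

# Element-wise accuracy after thresholding at 0.5 (how a binary accuracy is computed).
binary_acc = np.mean((y_pred > 0.5) == (y_true > 0.5))

# Per-pixel accuracy: does the argmax match the true class?
categorical_acc = np.mean(y_pred.argmax(axis=-1) == true_classes)

print(binary_acc)       # 13/14, about 0.9286
print(categorical_acc)  # small: only pixels whose true class is 0 count as hits
```

Under this reading, `categorical_accuracy` is the number to compare between the two scripts.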

offchan42 commented 4 years ago

It seems now that you can simply do softmax activation on the last Conv2D layer and then specify categorical_crossentropy loss and train on the image without any reshaping tricks. I've tested with a dummy dataset and it works well. Try it ~ !

    inp = keras.Input(...)
    # define your model here
    out = keras.layers.Conv2D(classes, (1, 1), activation='softmax')(...)
    model = keras.Model(inputs=[inp], outputs=[out], name='unet')
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
    model.fit(tensor4d, tensor4d)

You can also compile using sparse_categorical_crossentropy and then train with output of shape (samples, height, width) where each pixel in the output corresponds to a class label: model.fit(tensor4d, tensor3d)

PS. I use keras from tensorflow.keras (tensorflow 2)
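The relationship between the two label formats mentioned in this comment can be checked without Keras. The sketch below is a NumPy re-implementation written for illustration (the two helper functions are hypothetical, not the Keras losses themselves); it shows that one-hot targets of shape (samples, height, width, classes) and integer targets of shape (samples, height, width) give the same mean cross-entropy.

```python
import numpy as np

def categorical_crossentropy_np(y_true_onehot, probs, eps=1e-7):
    """Mean cross-entropy over all pixels for one-hot targets (illustrative helper)."""
    probs = np.clip(probs, eps, 1.0)
    return float(-np.mean(np.sum(y_true_onehot * np.log(probs), axis=-1)))

def sparse_categorical_crossentropy_np(y_true_ids, probs, eps=1e-7):
    """Same loss for integer class maps of shape (samples, height, width)."""
    probs = np.clip(probs, eps, 1.0)
    # Pick the predicted probability of the true class at every pixel.
    picked = np.take_along_axis(probs, y_true_ids[..., None], axis=-1)[..., 0]
    return float(-np.mean(np.log(picked)))

samples, height, width, classes = 2, 4, 5, 3  # toy sizes (assumption)
rng = np.random.default_rng(0)

probs = rng.random((samples, height, width, classes))
probs /= probs.sum(axis=-1, keepdims=True)  # per-pixel distribution
label_ids = rng.integers(0, classes, size=(samples, height, width))
label_onehot = np.eye(classes)[label_ids]   # (samples, height, width, classes)

dense_loss = categorical_crossentropy_np(label_onehot, probs)
sparse_loss = sparse_categorical_crossentropy_np(label_ids, probs)
print(np.isclose(dense_loss, sparse_loss))  # True
```

The integer format avoids materializing the one-hot array, which matters when the image is large and there are many classes.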