Keras - how to use class_weight with 3D data

bsafacicek commented 8 years ago

Hi,

I am using Keras to segment images into road and background pixels. As you can imagine, the percentage of road pixels is much lower than that of background pixels. Hence, I want to use class_weight={0: 0.05, 1: 0.95} while fitting the model so that the CNN won't predict every pixel as background. But when I do this, I get the following error:

  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 597, in fit
    sample_weight=sample_weight)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1035, in fit
    batch_size=batch_size)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 973, in _standardize_user_data
    in zip(y, sample_weights, class_weights, self.sample_weight_modes)]
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 387, in standardize_weights
    raise Exception('class_weight not supported for '
Exception: class_weight not supported for 3+ dimensional targets.

My training labels are in this form: (number_of_training_samples=10000, number_of_pixels_in_patch=16384, number_of_classes=2). How can I weight the classes in Keras?

Thanks in advance.

fchollet commented 8 years ago

You should use sample_weight instead. class_weight is not supported for 3+ dimensional targets because the concept of class is ambiguous in that case.

uschmidt83 commented 8 years ago

Hi, I have the same problem.

I don't understand how this can be accomplished with sample_weight, since every pixel of a sample requires a different weight based on its class. Or are you suggesting doing this with sample_weight_mode="temporal"?

bsafacicek commented 8 years ago

Hi,

I also could not work out how to use sample_weight for class weighting, because Keras requires the length of sample_weight to match the first dimension of the class labels. But the class labels also have second and third dimensions for image height and width, and to weight the classes I need to weight the individual pixel labels, not just the whole image.

Thanks.

uschmidt83 commented 8 years ago

To follow up on this, I got it to work using sample_weight. It is quite nice if you know what you have to do. Unfortunately, the documentation is not really clear on this, presumably because this feature was originally added for time series data.

You need to reshape your 2D image-sized output as a vector before the loss function when you specify your model.
Use sample_weight_mode="temporal" when you compile the model. This will allow you to pass in a weight matrix for training where each row represents the weight vector for a single sample.

I hope that helps.
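
The two steps above can be sketched with NumPy alone. This is only an illustration under assumptions, not the code used in this thread: the array sizes are toy values, the names labels_flat and weight_matrix are invented, and the class weights are the ones from the original question. On the model side, the last layer would be something like a Reshape to (height * width, n_classes), and the model would be compiled with sample_weight_mode="temporal".

```python
import numpy as np

# Toy stand-in for the training labels: 3 patches of 4x4 pixels, one-hot over
# 2 classes (channel 0 = background, channel 1 = road). Sizes are made up.
n_samples, height, width, n_classes = 3, 4, 4, 2
rng = np.random.default_rng(0)
class_ids = rng.integers(0, n_classes, size=(n_samples, height, width))
labels = np.eye(n_classes)[class_ids]                       # (3, 4, 4, 2)

# Step 1: flatten the image dimensions so every pixel acts like a "timestep".
labels_flat = labels.reshape(n_samples, height * width, n_classes)   # (3, 16, 2)

# Step 2: one weight per pixel, one row per sample.
is_road = labels_flat[:, :, 1] == 1
weight_matrix = np.where(is_road, 0.95, 0.05)               # (3, 16)

# weight_matrix is what would be passed as model.fit(..., sample_weight=weight_matrix).
print(labels_flat.shape, weight_matrix.shape)
```

The weight matrix has no class axis: each pixel carries exactly one number, the weight of the class it belongs to.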

rdelassus commented 7 years ago

Hey @kkog, I got exactly the same issue, did you find any solution?

rdelassus commented 7 years ago

@uschmidt83 you said "This will allow you to pass in a weight matrix for training where each row represents the weight vector for a single sample."

But this is not very clear. How did you build your weight vector? Say I have 3 classes, my weight vector size will be equal to the number of pixels in my image, with values being weight_0, weight_1 and weight_2? Seems like a waste of space, maybe I'm wrong?

uschmidt83 commented 7 years ago

Hi @rdelassus,

Say I have 3 classes, my weight vector size will be equal to the number of pixels in my image, with values being weight_0, weight_1 and weight_2? Seems like a waste of space, maybe I'm wrong?

it seems like a waste of space for your particular use case, although I doubt that this actually matters much in practice. However, it also allows much more fine-grained control, which is probably crucial for other applications/models.

Sorry for the late reply.

mptorr commented 7 years ago

@uschmidt83 I'm having trouble making this work and wonder if you have an insight.

I have 4 classes in a semantic segmentation task, and my class weights are

class_weights = {0: 0.41, 1: 1.87, 2: 1.1, 3: 7.05}

When I put this in class_weight within model.fit I get the same error as you mentioned above.

Exception: class_weight not supported for 3+ dimensional targets.

When I change class_weight to sample_weight within model.fit and add sample_weight_mode='temporal' within model.compile, I get

line 528, in _standardize_weights
    if sample_weight is not None and len(sample_weight.shape) != 2:
AttributeError: 'dict' object has no attribute 'shape'

The shapes in the final portions of the model are

conv2d_19 (Conv2D)           (None, 4, 64, 64)         260       
_________________________________________________________________
reshape_1 (Reshape)          (None, 4, 4096)           0         
_________________________________________________________________
permute_1 (Permute)          (None, 4096, 4)           0         
_________________________________________________________________
activation_1 (Activation)    (None, 4096, 4)           0

Do you have any suggestions to get this to work?

kglspl commented 7 years ago

@mptorr I am facing a similar problem but am stuck elsewhere... However, the way I understand @uschmidt83's suggestion you need to use:

class_weights = np.zeros((4096, 4))
class_weights[:, 0] += 0.41
class_weights[:, 1] += 1.87
class_weights[:, 2] += 1.1
class_weights[:, 3] += 7.05

Hope it helps, please let us know how it goes. And if anyone knows more feel free to chime in. ;-)
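
The AttributeError reported above comes from passing a dict where an array is expected. As later comments in this thread work out, with sample_weight_mode='temporal' the array holds one weight per pixel, i.e. shape (N, 4096) here rather than (4096, 4). One possible conversion from the dict is sketched below. It assumes integer masks of shape (N, 4096); the masks array is random placeholder data and lookup is an invented name.

```python
import numpy as np

class_weights = {0: 0.41, 1: 1.87, 2: 1.1, 3: 7.05}

# Placeholder integer masks: 5 images, 64 * 64 = 4096 pixels, class ids 0..3.
rng = np.random.default_rng(0)
masks = rng.integers(0, 4, size=(5, 4096))

# Turn the dict into a lookup table indexed by class id, then index it with the masks.
lookup = np.array([class_weights[c] for c in sorted(class_weights)])
sample_weight = lookup[masks]                    # shape (5, 4096)
```

If the masks are one-hot with shape (N, 4096, 4), masks.argmax(axis=-1) gives the integer form first.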

mptorr commented 7 years ago

@kglspl by reshaping my layers, I can actually use sample_weight---my issue is now how to do this with data augmentation, if you have time look at #6629 and let me know if you have an insight, thanks

ahundt commented 7 years ago

Figured out where some changes could happen to make progress in this direction. https://github.com/fchollet/keras/issues/6538#issuecomment-302964746

ezisezis commented 7 years ago

@kglspl @mptorr I tried to set the sample weights as suggested. I have a binary pixel-wise classification task that takes in 100x100 images and outputs images of the same resolution. On the final layer I reshape the output so that it matches @mptorr's architecture above. Here is my architecture:

Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 100, 100, 3)       0         
_________________________________________________________________
zero_padding2d_1 (ZeroPaddin (None, 104, 104, 3)       0         
_________________________________________________________________
conv1 (Conv2D)               (None, 102, 102, 32)      896       
_________________________________________________________________
pool1 (MaxPooling2D)         (None, 51, 51, 32)        0         
_________________________________________________________________
conv2 (Conv2D)               (None, 49, 49, 32)        9248      
_________________________________________________________________
pool2 (MaxPooling2D)         (None, 25, 25, 32)        0         
_________________________________________________________________
fc6 (Conv2D)                 (None, 25, 25, 64)        73792     
_________________________________________________________________
dropout_1 (Dropout)          (None, 25, 25, 64)        0         
_________________________________________________________________
fc7 (Conv2D)                 (None, 25, 25, 64)        4160      
_________________________________________________________________
dropout_2 (Dropout)          (None, 25, 25, 64)        0         
_________________________________________________________________
score_fr (Conv2D)            (None, 25, 25, 2)         130       
_________________________________________________________________
up_sampling2d_1 (UpSampling2 (None, 100, 100, 2)       0         
_________________________________________________________________
reshape_1 (Reshape)          (None, 10000, 2)          0         
=================================================================

Then I try setting sample_weight to this (where 13000 is number of training samples):

sample_weight = np.zeros((13000,10000,2))
sample_weight[:, 0] += 1
sample_weight[:, 1] += 10

But I get this error:

ValueError: Found a sample_weight array with shape (13000, 10000, 2). In order to use timestep-wise sample weighting, you should pass a 2D sample_weight array.

Then I also tried doing this:

sample_weight = np.zeros((10000,2))
sample_weight[:, 0] += 1
sample_weight[:, 1] += 10

But got the following error:

ValueError: Found a sample_weight array with shape (10000, 2) for an input with shape (13000, 10000, 2). sample_weight cannot be broadcast.

So now I am confused. But @mptorr, you said you made it work by reshaping the layers. How exactly did you reshape them, and how should I do that in my case?

mptorr commented 7 years ago

@ezisezis Could you try reshaping the model output so it ends up like (None, dimx * dimy, classes)? You can see that happening on my code from above (images are 64*64 = 4096, I have 4 classes):

reshape_1 (Reshape)          (None, 4, 4096)           0         
_________________________________________________________________
permute_1 (Permute)          (None, 4096, 4)           0         
_________________________________________________________________
activation_1 (Activation)    (None, 4096, 4)           0

Then reshape your sample_weight to (N, dimx * dimy). Also make sure your masks are (N, dimx * dimy). Essentially this matches masks and sample_weight as flattened tensors. I believe my training images are not flattened. Give it a try and let me know.

ezisezis commented 7 years ago

@mptorr I figured it out a bit earlier today using the output from my previous comment: from 100x100 images, the model outputs a (10000, 2) shape per image. I have 13000 training images, so the sample_weight dimensions are (13000, 10000), and it works very well.

sebastienbaur commented 7 years ago

I ran into a similar problem, using categorical cross entropy

Maybe we can just use a weighted version of it? What do you think of: -K.sum(K.log(1e-9 + predicted_proba) * true_labels * class_weight, axis=(1, 2)), where predicted_proba, true_labels and class_weight are 3 tensors of shape (batch_size, sequence_length, nb_classes)?

Note that:

class_weight[i, :, :] is the same array whatever the value of i. Let's call that common value x (a 2d array)
x is actually rank 1, each of its lines being equal to its first one (ex: [[1,2,3], [1,2,3], [1,2,3], [1,2,3]] if there are 3 classes and sequences have a length of 4). This is because the weight does not depend on the position (you can change that if you need to)

That way you can give more importance to rare classes

ahundt commented 7 years ago

Could someone post a concise example of this, or perhaps a small pull request in the examples directory? It seems like a number of people would find it very valuable.

sebastienbaur commented 7 years ago

Below is what I meant with some code.

It is a bit specific to my use case but it should be easy to adapt I guess.

Just tell me if there is something wrong in it. I think that it gives more weight to rare classes. I may have misunderstood the problem

import numpy as np
import keras.backend as K

batch_size = 32
all_y = ...  # a list containing all your class vectors, each being an array of a given size, each of its component being an integer representing a given class
# in my case, I have protein sequences, that I represent as arrays of integers. These integers represent the amino acids composing the sequence (+ the padding char), they are in range(0,21)
bincount = np.bincount(np.concatenate(all_y))
n_samples = 20000
length = 500  # my proteins have a length of 500
n_classes = 21  # there are 20 amino acids + the padding character
class_weight = n_samples*1. / (n_classes * bincount)
weights = np.ones((length, n_classes))
for k, x in enumerate(class_weight):
    weights[:, k] *= x
class_weight = K.constant(np.concatenate(batch_size*[np.array(weights).reshape((1, length, n_classes))]))

def cross_entropy(true, pred):
    return - K.sum(K.log(1e-9+pred) * true * class_weight, axis=(1, 2))

ahundt commented 7 years ago

@sebastienbaur Thanks, that looks like an easy way to add it in. Be careful though! The raw formulation of cross-entropy in your code can be numerically unstable, as commented in the TensorFlow MNIST example, so that might affect your results with the code above.
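
One way to address the stability concern is to compute log-probabilities from logits with the log-sum-exp trick, instead of taking the log of a softmax output plus 1e-9. Below is a NumPy sketch of the weighted cross-entropy from the snippet above written that way. It assumes the model exposes logits (pre-softmax scores), which the original snippet does not, and the function names are invented for illustration.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def weighted_cross_entropy(true_onehot, logits, class_weight):
    """Per-sample loss: -sum over positions and classes of w_c * y * log p.

    true_onehot, logits: (batch, length, n_classes); class_weight: (n_classes,),
    broadcast over batch and position, so no per-batch tiling is needed.
    """
    return -np.sum(log_softmax(logits) * true_onehot * class_weight, axis=(1, 2))

# Extreme logits that would overflow a naive exp() followed by log().
logits = np.array([[[1000.0, 0.0, -1000.0], [0.0, 0.0, 0.0]]])   # (1, 2, 3)
true_onehot = np.array([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])
class_weight = np.array([1.0, 2.0, 3.0])
loss = weighted_cross_entropy(true_onehot, logits, class_weight)
```

Because the class weights broadcast from shape (n_classes,), the loss also works when the last batch is smaller than batch_size.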

potis commented 7 years ago

Hi,

I am running into the same problem as @ezisezis, using keras 2.0.5 and theano as backend (python 2.7).

My goal is to use U-Net to perform image segmentation, but the regions I am trying to segment are of different sizes. {0: 75.0, 1: 89.0, 2: 61.0, 3: 56.0, 4: 194.0, 5: 1.0}

I tried to use sample_weight instead of class weight so I compiled the model accordingly: model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=[dice_coef], sample_weight_mode="temporal")

Here is the input data size:

(4717, 1, 256, 256)

The size of labels:

(4717, 65536, 6)

The size of input weights

(4717, 65536, 6)

The last layers of my network:

conv2d_92 (Conv2D) (None, 6, 256, 256) 33 conv2d_91[0][0]

permute_4 (Permute) (None, 256, 256, 6) 0 conv2d_92[0][0]

reshape_4 (Reshape) (None, 65536, 6) 0 permute_4[0][0]

And finally the error i am getting:

ValueError: Found a sample_weight array with shape (4717, 65536, 6). In order to use timestep-wise sample weighting, you should pass a 2D sample_weight array.

Then I tried the suggestion of @mptorr to reshape sample_weight to (N, dimx * dimy) and to make sure the masks are (N, dimx * dimy) (for my task dimx = dimy = 256), but here is the error I got:

ValueError: Found a sample_weight array with shape (6, 65536) for an input with shape (4717, 6, 65536). sample_weight cannot be broadcast.

Please let me know if you have any suggestion or need more information.

ezisezis commented 7 years ago

Dear @potis. If you read carefully what I experienced and how I solved it, then in your case the input weights shape has to be (4717, 65536) or, in general, (number_of_images, number_of_pixels_in_img). So each value in this 2D array is the weight of the class that the pixel belongs to (you don't have to assign 6 values to each pixel, only one: the class's weight). Hope it makes more sense.

potis commented 7 years ago

@ezisezis thanks for the response. I guess I was misinterpreting the N as the number of classes.

joeyearsley commented 7 years ago

For anybody else struggling with this, this details a formula to get class weights for pixels. https://blog.fineighbor.com/tensorflow-dealing-with-imbalanced-data-eb0108b10701

Set your class weightings up as described in the above blog, then set sample_weight_mode="temporal". Start from an array of shape (nb_samples, dim_x * dim_y, nb_classes), i.e. your one-hot labels.

To get your sample weights, multiply each output channel by the corresponding class weight from the above blog (med_freq / freq_c, where c is an element of Classes).

Finally, sample_weights = np.squeeze(np.sum(sample_weights, axis=-1)), which leaves one weight per pixel with shape (nb_samples, dim_x * dim_y). We can do this since the channel axis is one-hot encoded.
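
The recipe above can be sketched in NumPy as follows. Note the simplifying assumption: a class's frequency is taken here as its share of all pixels in the dataset, which may differ from the exact definition in the linked blog. The label values are placeholder data.

```python
import numpy as np

# Placeholder labels: 4 images, 6 pixels each, 3 classes.
class_ids = np.array([[0, 0, 0, 0, 1, 2],
                      [0, 0, 0, 1, 1, 2],
                      [0, 0, 0, 0, 0, 1],
                      [0, 0, 1, 1, 2, 2]])
onehot = np.eye(3)[class_ids]                            # (4, 6, 3)

# Class frequency = share of all pixels carrying the class (simplifying assumption).
freq = onehot.sum(axis=(0, 1)) / onehot.sum()
class_weights = np.median(freq) / freq                   # med_freq / freq_c

# Scale each channel by its class weight, then collapse the one-hot channel axis.
sample_weights = (onehot * class_weights).sum(axis=-1)   # (4, 6)
```

The class with the median frequency ends up with weight 1, rarer classes get weights above 1 and more common ones below 1.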

jmtatsch commented 7 years ago

class_weight is not supported for 3+ dimensional targets because the concept of class is ambiguous in that case.

Can someone elaborate on that, please? I really don't understand why that is ambiguous.

EloyRoura commented 7 years ago

@ezisezis Thanks a lot for your post. I've been digging into the code for hours and this really did the trick. I would still prefer class_weight to do what it is supposed to, but I don't see how we should use it or in which scenario. Anyway, I have another question. If the fit(...) function is used, sample_weight can be the workaround, but what about fit_generator(...)? There is no option for that. Do you have a solution in that case?

Thanks in advance :)

ahundt commented 7 years ago

One approach to resolving the ambiguity would be to add support for a property we can attach to a tensor or numpy array that specifies the type of data each dimension in a tensor represents (batch, width, height, depth, class, etc).

EloyRoura commented 7 years ago

OK! I'll answer my own comment. My bad, I didn't see that the tuple returned by the generator can indeed be (inputs, targets, weights), so this should solve the problem.

JianbingDong commented 7 years ago

@ezisezis Hi, you've done a great job with this error. But after reading your suggestion, I still don't know how to assign weights to my sample_weight array. If my model output is an array of shape (pixels_x * pixels_y, num_class), and I set my sample_weight array to shape (num_samples, pixels_x * pixels_y), could this work? And how should I assign class weights to my sample_weight array? Thanks in advance.

rdelassus commented 7 years ago

Following @JianbingDong's question, does an array with shape (num_class) work, such that classes are weighted, not pixels?

ghost commented 7 years ago

Let's say, for example, I have a data set with a "y" output of 3 classes: "A", "B", "C". I need to convert the output into categorical format, so:

"A" ---> [1,0,0]
"B" ---> [0,1,0]
"C" ---> [0,0,1]

Now, these classes are imbalanced. In order to balance them I need to use "sample_weight" in the "fit" call of my MLP model.

What is the formatting of the command ?

sample_weight = class_weight.compute_sample_weight('balanced', np.unique(y_train), y_train) ??

Thank you

ylmeng commented 7 years ago

@NaderNazemi as someone already suggested, you can use the sample_weight (not class_weight) parameter here. Just make a matrix of (samples, weights), in your case (samples, 3) and pass it in sample_weight. It sounds very counter-intuitive but that is how the code was written.
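
For a plain classifier with 2D targets of shape (samples, classes), Keras documents sample_weight as a flat array with one weight per sample, shape (samples,). Below is a NumPy sketch of the common "balanced" heuristic, n_samples / (n_classes * count_of_class), with placeholder labels. scikit-learn's compute_sample_weight('balanced', y_train), which takes the weighting spec and the labels, should give the same numbers.

```python
import numpy as np

y_train = np.array(["A", "A", "A", "A", "B", "B", "C", "A"])   # placeholder labels

classes, y_index, counts = np.unique(y_train, return_inverse=True, return_counts=True)
class_weights = len(y_train) / (len(classes) * counts)   # "balanced" heuristic
sample_weight = class_weights[y_index]                   # shape (n_samples,)

y_onehot = np.eye(len(classes))[y_index]                 # categorical targets for fit()
```

With these weights every class contributes the same total weight to the loss, regardless of how many samples it has.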

xyl576807077 commented 7 years ago

@ylmeng Hi, if my output is (samples, len(sentence), nb_classes). So sample_weight is (samples, nb_classes)?

ylmeng commented 7 years ago

@xyl576807077 Hmm I found the way I described does not work with keras 2.0. So you have an output for each token in a sentence? I think it is more clear to write a custom loss function, anyway. For example, if you want to use weights to scale losses (tensorflow backend as an example):

import tensorflow as tf

def my_weighted_loss(onehot_labels, logits):
    """Scale the loss based on class weights.

    Expects raw logits, i.e. no softmax activation on the last layer.
    """
    # set your class weights here (e.g. computed from class frequencies)
    class_weights = ....
    # compute per-element weights from the one-hot labels
    weights = tf.reduce_sum(class_weights * onehot_labels, axis=-1)
    # compute the (unweighted) softmax cross-entropy loss
    unweighted_losses = tf.nn.softmax_cross_entropy_with_logits(labels=onehot_labels, logits=logits)
    # apply the weights; both tensors hold one entry per labelled element
    weighted_losses = unweighted_losses * weights
    # reduce the result to get your final loss
    loss = tf.reduce_mean(weighted_losses)
    return loss

Then in your keras code you just need to specify model.compile(loss=my_weighted_loss...) There are other ways to use the weights too, such as the one described by sebastienbaur above.

Kevin-Moon commented 6 years ago

this also seems to be helpful.

https://github.com/keras-team/keras/issues/6261

abdullah693 commented 6 years ago

Is there a working solution to this? I am using fit_generator(), so I guess I can't use the sample_weight argument. My output is (H, W, C), where H = height, W = width, and C = the number of classes.

Boussenna commented 6 years ago

Hello @abdullah693, any big revelations so far ?

mancaldel commented 6 years ago

I've had the same problem using Unet, where classes need to be weighted, not samples. @mptorr @abdullah693 @EloyRoura My second approach might be useful for you ;)

My first approach was to change the labels to a one-hot encoding, but using the weight of the class instead of a 1 (e.g. [0, 1, 0, 0] --> [0, 10, 0, 0] for a weight=10 in the second class). It worked well, although you need to apply this to your input data and you might need to have a double set of labels, one for training and one for displaying.

Later, I tried a second approach, since I wanted data augmentation and duplicating all the labels was too much, so I created a data generator class as described in here (class DataGenerator(keras.utils.Sequence)). Most of the functions are similar to the default ones in the example, but in the __data_generation method I multiply the labels by the weights I provide to the __init__() function: return x, keras.utils.to_categorical(y, num_classes=self.n_classes)*self.class_weights

The problem with these solutions is that you need to store the data as one-hot (which occupies more space and might be infeasible if you have many classes) OR, if you transform it inside the generator, you need two generators (one for training and another one for showing results). You can also perform an np.argmax() to get back the class number, the new problem being processing time if your dataset is very big.
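
Why scaling the labels works can be checked with a few lines of NumPy. For a cross-entropy of the form -sum(y * log(p)), replacing the 1 in a one-hot label by a weight w multiplies that pixel's loss by w. The sketch below uses placeholder predictions and np.eye as a stand-in for to_categorical.

```python
import numpy as np

class_weights = np.array([1.0, 10.0, 1.0, 1.0])    # weight 10 on the second class
y = np.array([1, 0, 3, 1])                         # integer labels for four pixels
onehot = np.eye(4)[y]
weighted_labels = onehot * class_weights           # e.g. [0, 1, 0, 0] -> [0, 10, 0, 0]

# Placeholder predicted probabilities, each row sums to 1.
pred = np.array([[0.1, 0.7, 0.1, 0.1],
                 [0.6, 0.2, 0.1, 0.1],
                 [0.25, 0.25, 0.25, 0.25],
                 [0.3, 0.3, 0.2, 0.2]])

plain_loss = -(onehot * np.log(pred)).sum(axis=-1)
scaled_loss = -(weighted_labels * np.log(pred)).sum(axis=-1)

recovered = weighted_labels.argmax(axis=-1)        # original class ids, for display
```

As long as all weights are positive, argmax still recovers the class id, which is the np.argmax() route mentioned above.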

One last important thing... Is this class_weight problem going to be implemented in Keras??

eong2012 commented 6 years ago

Say I use:

model.compile(optimizer=adam, loss='sparse_categorical_crossentropy', metrics='accuracy', sample_weight_mode="temporal")
model.fit_generator(train_gen, validation_data=val_gen)

Then, is the calculated loss value sample-weighted, or do I need to supply my own pixel-wise weighted custom loss? How do I do it?

janbrrr commented 6 years ago

@eong2012 If you want to use sample weights with a generator, your generator should not just return X, y per batch, the generator has to return X, y, sample_weights (see fit_generator() definition line 181ff.).

Let's say you have a generator that returns the inputs X and the labels y (a single integer per pixel, for the sparse loss), where y is of shape (batch_size, height * width, 1). Assuming you have an array class_weights where the index is the class and the value at each index is the weight, you can simply add the following in the batch generation:

...
sample_weights = numpy.take(class_weights, y[:, :, 0])
return X, y, sample_weights
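
Put together, a generator along these lines might look as follows. This is a sketch with placeholder data; weighted_batches is an invented name, and a keras.utils.Sequence subclass would return the same triple from its __getitem__.

```python
import numpy as np

def weighted_batches(X, y, class_weights, batch_size):
    """Yield (X, y, sample_weights) batches forever, as Keras generators do.

    X: (n, ...) inputs; y: (n, height * width, 1) integer labels;
    class_weights: 1-D array indexed by class id.
    """
    n = len(X)
    while True:
        for start in range(0, n, batch_size):
            X_batch = X[start:start + batch_size]
            y_batch = y[start:start + batch_size]
            sample_weights = np.take(class_weights, y_batch[:, :, 0])
            yield X_batch, y_batch, sample_weights

# Placeholder data: 5 images of 8x8 pixels with 3 channels, 2 classes.
X = np.zeros((5, 8, 8, 3))
y = np.random.default_rng(0).integers(0, 2, size=(5, 64, 1))
gen = weighted_batches(X, y, class_weights=np.array([1.0, 10.0]), batch_size=2)
X_b, y_b, w_b = next(gen)
```

The weights come out with shape (batch_size, height * width), which is the 2D layout that sample_weight_mode="temporal" expects.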

eong2012 commented 6 years ago

@janbrrr Thanks for your help! I have this part working, but I intend to use "sample_weights" = 0.0 to mask out a particular class from training. The script can now train, but the weighted accuracy is not high. I am wondering:

a) Is the loss value computed within Keras sample-weighted? (I know there is a "weighted_metrics" option which gives me sample-weighted metric values, but I don't see a "weighted loss" option.)

b) How can I compute a loss which removes the samples with "sample_weights" = 0.0? That is, I would like a mean loss obtained by dividing the total loss by the number of samples whose "sample_weights" > 0.0 (rather than by the total number of samples). I suppose I need a custom loss function? And how can I access the "sample_weights" from within the custom loss function?

Just to be specific, I would like to achieve something like below (below is non-working code):

def custom_loss(y_true, y_pred, sample_weights):
    xentropy = K.categorical_crossentropy(y_true, y_pred)
    loss = sample_weights * xentropy
    num_non_zero = K.sum(sample_weights > 0.0)
    mean_loss = loss / num_non_zero
    return mean_loss

Then use: model.compile(loss=custom_loss, optimizer=adam, metrics='accuracy', sample_weight_mode="temporal")

but I don't think sample_weights can be passed into "custom_loss()" like that? How to access "sample_weights" from within a "custom_loss" function?

c) I wonder whether above custom_loss() will work because not sure whether it is differentiable?
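
On (c): the number of non-zero weights depends only on the weights, not on the predictions, so dividing by it is just a division by a constant as far as the gradient is concerned. On access: Keras calls a loss as loss(y_true, y_pred), so one workaround sometimes used is to append the weights to y_true as an extra channel and split them off inside the loss. The NumPy sketch below shows only the arithmetic under that packing assumption. The function name and the packing convention are invented here, and a Keras version would swap in backend ops.

```python
import numpy as np

def masked_mean_xent(y_true_packed, y_pred, eps=1e-7):
    """Weighted cross-entropy averaged over pixels with non-zero weight.

    y_true_packed: (batch, pixels, n_classes + 1); the last channel holds the
    per-pixel weight (a packing convention assumed for this sketch).
    y_pred: (batch, pixels, n_classes) probabilities.
    """
    y_true, weights = y_true_packed[..., :-1], y_true_packed[..., -1]
    xent = -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)), axis=-1)
    num_non_zero = np.sum(weights > 0.0)
    return np.sum(weights * xent) / max(num_non_zero, 1)

y_true = np.array([[[1, 0], [0, 1], [1, 0]]], dtype=float)      # (1, 3, 2)
weights = np.array([[1.0, 1.0, 0.0]])                           # third pixel masked out
y_pred = np.array([[[0.5, 0.5], [0.5, 0.5], [0.01, 0.99]]])
packed = np.concatenate([y_true, weights[..., None]], axis=-1)  # (1, 3, 3)
loss = masked_mean_xent(packed, y_pred)
```

The badly predicted third pixel has weight 0, so it neither adds to the total nor counts in the denominator.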

WendyDong commented 6 years ago

@eong2012 have you solved your problem? I met the same problem.

eong2012 commented 6 years ago

@eong2012 have you solved your problem? I met the same problem.

No, no luck on this.

ManuConcepBrito commented 5 years ago

Hi, I am using Keras 2.2.4 and I am trying to implement a loss function for pixel-wise classification as described in here but I am having some of the difficulties presented here. I am doing 3D segmentation, therefore my target vector is (b_size, width_x, width_y, width_z, nb_classes). I implemented the following loss function where weight map is the same shape as target and prediction vector:

def dice_xent_loss(y_true, y_pred, weight_map):
    """Adaptation of https://arxiv.org/pdf/1809.10486.pdf for multilabel
    classification with overlapping pixels between classes. Dec 2018.
    """
    loss_dice = weighted_dice(y_true, y_pred, weight_map)
    loss_xent = weighted_binary_crossentropy(y_true, y_pred, weight_map)

    return loss_dice + loss_xent

def weighted_binary_crossentropy(y_true, y_pred, weight_map):
    return tf.reduce_mean((K.binary_crossentropy(y_true, 
                                                 y_pred)*weight_map)) / (tf.reduce_sum(weight_map) + K.epsilon())

def weighted_dice(y_true, y_pred, weight_map):

    if weight_map is None:
        raise ValueError("Weight map cannot be None")
    if y_true.shape != weight_map.shape:
        raise ValueError("Weight map must be the same size as target vector")

    dice_numerator = 2.0 * K.sum(y_pred * y_true * weight_map, axis=[1,2,3])
    dice_denominator = K.sum(weight_map * y_true, axis=[1,2,3]) + \
                                                             K.sum(y_pred * weight_map, axis=[1,2,3])
    loss_dice = (dice_numerator) / (dice_denominator + K.epsilon())
    h1=tf.square(tf.minimum(0.1,loss_dice)*10-1)
    h2=tf.square(tf.minimum(0.01,loss_dice)*100-1)
    return 1.0 - tf.reduce_mean(loss_dice) + \
            tf.reduce_mean(h1)*10 + \
            tf.reduce_mean(h2)*10

I am compiling the model using sample_weight_mode="temporal" as suggested, and I am passing the weights to model.fit as sample_weight=weights. Still, I get the following error:

  File "overfit_one_case.py", line 153, in <module>
    main()
  File "overfit_one_case.py", line 81, in main
    sample_weight_mode="temporal")
  File "/home/igt/anaconda2/envs/niftynet/lib/python2.7/site-packages/keras/engine/training.py", line 342, in compile
    sample_weight, mask)
  File "/home/igt/anaconda2/envs/niftynet/lib/python2.7/site-packages/keras/engine/training_utils.py", line 404, in weighted
    score_array = fn(y_true, y_pred)
TypeError: dice_xent_loss() takes exactly 3 arguments (2 given)

In training_utils.py Keras is calling my custom loss without any weights. Any idea on how to solve this?

TheUchi commented 5 years ago

I think you need something like this:

def custom_loss(weight_map):
    def dice_xent_loss(y_true, y_pred):
        # do stuff
        return loss_dice + loss_xent
    return dice_xent_loss
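
The closure pattern can be tried out without Keras. In the NumPy stand-in below, the outer function captures the extra argument and returns a loss that takes only (y_true, y_pred). The names and the simple weighted cross-entropy body are assumptions for illustration, not the dice loss from the comment above.

```python
import numpy as np

def make_weighted_loss(class_weights):
    """Return a two-argument loss that closes over fixed class weights."""
    class_weights = np.asarray(class_weights, dtype=float)

    def weighted_xent(y_true, y_pred, eps=1e-7):
        # per-pixel weight picked out by the one-hot label
        pixel_weights = np.sum(y_true * class_weights, axis=-1)
        xent = -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)), axis=-1)
        return np.mean(pixel_weights * xent)

    return weighted_xent

loss_fn = make_weighted_loss([1.0, 3.0])
y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_pred = np.array([[0.5, 0.5], [0.5, 0.5]])
value = loss_fn(y_true, y_pred)      # two arguments, as Keras would call it
```

One limitation to keep in mind: whatever the closure captures is fixed when the loss is built, so this suits weights that are the same for every batch (such as per-class weights) better than a weight map that changes per sample.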

GusRoth commented 5 years ago

model.fit_generator doesn't have the parameter "sample_weight", so if I want to use fit_generator to train my model, how can I use sample_weight or another parameter?

shahriar49 commented 5 years ago

Dear @potis. If you read carefully what I experienced and how I solved it, then in your case the input weights shape has to be (4717, 65536) or, in general, (number_of_images, number_of_pixels_in_img). So each value in this 2D array is the weight of the class that the pixel belongs to (you don't have to assign 6 values to each pixel, only one: the class's weight). Hope it makes more sense.

I followed this guideline to work on my problem: Landsat time-series image classification using the ConvLSTM2D module in Keras. I have a series of 6-band image patches (say of size 30x30 pixels), each for a different observation time. I want to use all spatial and temporal data, so I am planning to use ConvLSTM2D and feed it with series of images. The ground truth will be another map of size 30x30, each pixel having a label among 4 possible values, one-hot encoded in a vector of length 4.

The main problem is that some of the pixels don't have valid labels (for example due to cloudy pixels or other Landsat quality issues). So the map will be a mixture of true labels (values 1-4) and invalid pixels (encoded as a one-hot vector of all zeros). I first thought of using the class_weight option to tackle this, but I get the error saying it is not supported for 3D+ tensors, so I opted for sample_weight. However, I am highly skeptical that my implementation is correct. It runs without error, but the result is far from satisfying.

Here is the network structure: a 3-layer network with 16 filters in each layer and a convolutional kernel size of 3x3. The first two layers pass sequences, but the last one outputs just one 3D tensor of size 30x30x16. Then I use a convolutional layer with 4 filters and softmax activation to generate 4 maps, one per label. To mask the invalid labels, I reshape the output tensor from 30x30x4 to 900x4 (otherwise Keras will not be able to use sample_weight):


inputs = Input(shape=(None, 30, 30, 6))
x = ConvLSTM2D(filters=16, kernel_size=(3,3), padding='same', return_sequences=True, data_format='channels_last')(inputs)
x = ConvLSTM2D(filters=16, kernel_size=(3,3), padding='same', return_sequences=True, data_format='channels_last')(x)
x = ConvLSTM2D(filters=16, kernel_size=(3,3), padding='same', return_sequences=False, data_format='channels_last')(x)
x = Conv2D(filters=4, kernel_size=(3,3), padding='same', data_format='channels_last', activation='softmax')(x)
outputs = Reshape((30*30,4))(x)

I use TensorFlow datasets to feed the network, and my dataset generates the data map sequence, the label map, and a 'mask' map that has 1 for valid label pixels and 0 for invalid ones. I also add sample_weight_mode="temporal" to the model compile statement. I assume that with this arrangement, TensorFlow/Keras will use the weights as expected. But as I mentioned above, the result is not satisfying (low accuracy), and I am not sure the above implementation is correct. I also don't have any idea how to debug this issue.

And just one basic question: do the sample weights weight the output samples before the loss function is calculated, or do they weight the input values from the start? Since everybody said that class_weight can be substituted by sample_weight, I think this means that sample_weight is applied just at the last stage before calculating the loss function (which would justify the last Reshape layer). Am I right? @ezisezis
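
One thing that may be worth checking when debugging is the accuracy figure itself. Metrics listed under metrics in compile are not sample-weighted (the weighted_metrics argument is the weighted variant), so invalid pixels can drag a plain accuracy down even when the loss ignores them. Below is a NumPy sketch of the difference, with placeholder labels and a hypothetical model output.

```python
import numpy as np

# Placeholder labels: 1 patch, 6 pixels, 4 classes; all-zero rows mark invalid pixels.
labels = np.array([[[1, 0, 0, 0],
                    [0, 0, 1, 0],
                    [0, 0, 0, 0],     # invalid
                    [0, 0, 0, 1],
                    [0, 0, 0, 0],     # invalid
                    [0, 1, 0, 0]]], dtype=float)
pred_classes = np.array([[0, 2, 3, 3, 1, 0]])       # hypothetical model output (argmax)

mask = labels.sum(axis=-1)                          # 1 for valid pixels, 0 for invalid
correct = (labels.argmax(axis=-1) == pred_classes).astype(float)

plain_accuracy = correct.mean()                     # invalid pixels distort this
masked_accuracy = (correct * mask).sum() / mask.sum()
```

The same mask array, with shape (samples, 900), is what would serve as the sample weights in this setup.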

graffam commented 5 years ago

Is this something I could reasonably create a pull request for, seems like pixel-wise class weighting would be a fairly common request? Or should people address this in the form of a custom loss function?

TheUchi commented 5 years ago

Is this something I could reasonably create a pull request for, seems like pixel-wise class weighting would be a fairly common request? Or should people address this in the form of a custom loss function?

I think, a PR for a pixel-wise class weighting would be very much appreciated since custom loss functions always have the risk of failing for unknown reasons or they have to rely on backend functions, which may or may not be available.

shahriar49 commented 5 years ago

Anybody have any comment about my post above on Jul 8th? I also read here and there that we may need to write a custom loss function (as @janbrrr mentioned before). Do we necessarily need that? Is it wrong to just supply a properly shaped sample_weight and setting sample_weight_mode to "temporal" with native loss functions in Keras? @ezisezis

anilsathyan7 commented 5 years ago

I am facing a similar issue. How do I set the values for the weights (sample and class) if I use sparse categorical cross-entropy? How can I use class weights or sample weights to ignore (void) labels in a segmentation task by setting their weight to zero, when I use sparse categorical cross-entropy as the loss function and the labels are not one-hot encoded? Say my final output shape is (None, 16384, 2) and the segmentation label has shape (None, 16384, 1) with int values 0 (bg), 1 (fg) and 2 (void). Is this possible only by implementing a custom loss function? I tried the corresponding TensorFlow loss function with weights (a custom loss function with tf), but it became more complex and messier!
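
One possible preprocessing step for this case, sketched in NumPy with placeholder labels: give void pixels a weight of 0 and also remap them to a valid class id, since a label of 2 has no matching channel in a 2-class output and, depending on the backend, an out-of-range sparse label can produce an error or NaN even when its weight is zero. The constant VOID and the choice of remapping to 0 are assumptions for illustration.

```python
import numpy as np

VOID = 2
labels = np.array([[[0], [1], [2], [1], [2], [0]]])      # (batch=1, pixels=6, 1)

sample_weight = (labels[:, :, 0] != VOID).astype("float32")     # (1, 6), 0 on void
labels_clean = np.where(labels == VOID, 0, labels)              # any valid id works
```

The remapped value does not matter for training because those pixels carry zero weight.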

shahriar49 commented 5 years ago

I am skeptical that we need a custom loss function, because Keras inherently embeds weighting and masking in its loss function prototypes (see https://github.com/keras-team/keras/blob/58fd1f0589d33aeb33c4129cfedfb7737495efc0/keras/engine/training.py#L309). I feel that if we supply it with (data, label, sample_weight), and provide class_weight if we need it, there should be no reason to define a custom loss function, but I am not sure. @anilsathyan7

keras-team / keras

Keras - how to use class_weight with 3D data #3653