zhangzhao156 / Human-Activity-Recognition-Codes-Datasets

Code for the comparison methods
11 stars 4 forks

U-net implementation #1

Open GeorgiosSopi opened 4 years ago

GeorgiosSopi commented 4 years ago

Dear Zhang Zhao,

Hope you're having a good day!

I am working on a human activity recognition project, more specifically trying to detect micro-activities, e.g. screwing with tools. I was very impressed by your paper "Human Activity Recognition Based on Motion Sensor Using U-Net", and I would like to implement it and check whether semantic segmentation can be valuable in my work.

But I'm having a problem: the model doesn't learn, and I'm trying to find where I'm making a mistake. I was wondering if you could provide some help.

c1 = conv2d_block( n_filters * 1, k_s_conv , padding = 'same' )(target)
c1 = Activation(activation_fun)(c1)
c1 = BatchNormalization()(c1)

c1 = conv2d_block( n_filters * 1, k_s_conv , padding = 'same' )(c1)
c1 = Activation(activation_fun)(c1)
c1 = BatchNormalization()(c1)
p1 = MaxPooling2D(p_size_12)(c1)
p1 = Dropout(dropout)(p1)

c2 = conv2d_block( n_filters * 2,  k_s_conv, padding = 'same',activation = activation_fun)(p1)
c2 = BatchNormalization()(c2)
c2 = conv2d_block( n_filters * 2,  k_s_conv, padding = 'same',activation = activation_fun)(c2)
c2 = BatchNormalization()(c2)
p2 = MaxPooling2D(p_size_12)(c2)
p2 = Dropout(dropout)(p2)

c3 = conv2d_block( n_filters * 4, k_s_conv, padding = 'same',activation = activation_fun)(p2)
c3 = BatchNormalization()(c3)
c3 = conv2d_block( n_filters * 4, k_s_conv, padding = 'same',activation = activation_fun)(c3)
c3 = BatchNormalization()(c3)
p3 = MaxPooling2D(p_size_12)(c3)
p3 = Dropout(dropout)(p3)

c4 = conv2d_block( n_filters * 8, k_s_conv, padding = 'same',activation = activation_fun)(p3)
c4 = BatchNormalization()(c4)
c4 = conv2d_block( n_filters * 8, k_s_conv, padding = 'same',activation = activation_fun)(c4)
c4 = BatchNormalization()(c4)
p4 = MaxPooling2D(p_size_12)(c4)
p4 = Dropout(dropout)(p4)

c5 = conv2d_block( n_filters * 16, k_s_conv, padding = 'same',activation = activation_fun)(p4)
c5 = BatchNormalization()(c5)
c5 = conv2d_block( n_filters * 16, k_s_conv, padding = 'same',activation = activation_fun)(c5)
c5 = BatchNormalization()(c5)

# Expansive Path
u6 = Conv2DTranspose(n_filters * 8, (1, 3), strides = (1, 2), padding = 'same')(c5)
u6 = concatenate([u6, c4])
u6 = Dropout(dropout)(u6)
c6 = conv2d_block( n_filters * 8, k_s_conv, padding = 'same',activation = activation_fun)(u6)
c6 = BatchNormalization()(c6)
c6 = conv2d_block( n_filters * 8, k_s_conv, padding = 'same',activation = activation_fun)(c6)
c6 = BatchNormalization()(c6)

u7 = Conv2DTranspose(n_filters * 4, (1, 3), strides = (1, 2), padding = 'same')(c6)
u7 = concatenate([u7, c3])
u7 = Dropout(dropout)(u7)
c7 = conv2d_block( n_filters * 4, k_s_conv, padding = 'same',activation = activation_fun)(u7)
c7 = BatchNormalization()(c7)
c7 = conv2d_block( n_filters * 4, k_s_conv, padding = 'same',activation = activation_fun)(c7)
c7 = BatchNormalization()(c7)

u8 = Conv2DTranspose(n_filters * 2, (1, 3), strides = (1, 2), padding = 'same')(c7)
u8 = concatenate([u8, c2])
u8 = Dropout(dropout)(u8)
c8 = conv2d_block( n_filters * 2, k_s_conv, padding = 'same',activation = activation_fun)(u8)
c8 = BatchNormalization()(c8)
c8 = conv2d_block( n_filters * 2, k_s_conv, padding = 'same',activation = activation_fun)(c8)
c8 = BatchNormalization()(c8)

u9 = Conv2DTranspose(n_filters * 1, (1, 3), strides = (1, 2), padding = 'same')(c8)
u9 = concatenate([u9, c1])
u9 = Dropout(dropout)(u9)
c9 = conv2d_block( n_filters * 1, k_s_conv, padding = 'same',activation = activation_fun)(u9)
c9 = BatchNormalization()(c9)
c9 = conv2d_block( n_filters * 1, k_s_conv, padding = 'same',activation = activation_fun)(c9)
c9 = BatchNormalization()(c9)

conv10 = Conv2D(n_classes, (1, 1))(c9)

where n_classes = 11.

Also, maybe batch normalization is not necessary, but I'm testing what works and what doesn't. After conv10, do I have to reshape in order to get a "flattened" result, and then add a dense layer? Because my "y" is one-hot encoded, the output has to have shape (?, 11):

shape = conv10.get_shape().as_list()
fcl1 = tf.reshape(conv10, [-1, shape[1] * shape[2] * shape[3]])
return tf.layers.dense(fcl1, n_classes, activation='softmax', name="output_1")

#############################################

Tensor("input:0", shape=(?, 60, 36), dtype=float32)  # original input

The input [None, 60, 36] is reshaped to [None, 60, 12, 3] to fit Conv2D, and the input to c1 (the first conv block) is the target below:

Tensor("zero_padding2d/Pad:0", shape=(?, 96, 48, 3), dtype=float32)  # target
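For clarity, the reshape and padding I apply can be sketched in NumPy like this (a minimal sketch; the padding amounts, 18 on each side of both axes, are my choice to get from (60, 12) to (96, 48)):

```python
import numpy as np

# A batch of windows: (batch, 60, 36) = 60 time steps, 12 sensors x 3 axes
batch = np.random.randn(4, 60, 36).astype(np.float32)

# Reshape so Conv2D can treat the sensor axis spatially: (batch, 60, 12, 3)
reshaped = batch.reshape(4, 60, 12, 3)

# Zero-pad 60 -> 96 and 12 -> 48 (18 on each side of both axes),
# matching the zero_padding2d output shape (?, 96, 48, 3)
padded = np.pad(reshaped, ((0, 0), (18, 18), (18, 18), (0, 0)))

print(padded.shape)  # (4, 96, 48, 3)
```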

[U-Net architecture image]

Any idea what I am doing wrong?

I have also prepared the training and test data by assigning a label to each sample, using 1000 ms windows with 0% overlap.
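The windowing step looks roughly like this (a minimal sketch, not my exact pipeline; I assume a 60 Hz sampling rate here, so a 1000 ms window is 60 samples, and I take a majority vote over the window to get one label):

```python
import numpy as np

def make_windows(signal, labels, win_len, hop=None):
    """Cut a (T, C) signal into windows; label each window with the
    majority label of its samples. hop == win_len means 0 overlap."""
    hop = hop or win_len
    xs, ys = [], []
    for start in range(0, len(signal) - win_len + 1, hop):
        xs.append(signal[start:start + win_len])
        # majority vote over the per-sample labels in this window
        ys.append(np.bincount(labels[start:start + win_len]).argmax())
    return np.stack(xs), np.array(ys)

# e.g. 60 Hz -> a 1000 ms window is 60 samples, 36 channels (12 sensors x 3 axes)
sig = np.random.randn(600, 36)
lab = np.random.randint(0, 11, size=600)
X, y = make_windows(sig, lab, win_len=60)
print(X.shape, y.shape)  # (10, 60, 36) (10,)
```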

Thank you very much in advance!

Best Georgios Sopidis

zhangzhao156 commented 4 years ago


Hi, the final 1x1 convolution producing conv10 is actually equivalent to a position-wise fully connected classification layer, so I think there is no need to reshape conv10. According to your confusion matrix, all samples are classified into the first class, yet your accuracy is still high (90.12%). I wonder whether the training data is imbalanced: U-Net is not very suitable for imbalanced data, since it leads to overfitting on the majority class. In addition, I know that when the training loss is NaN, the model cannot learn and will easily collapse to predicting one class.
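To see the equivalence concretely, here is a small NumPy sketch (not code from this repository): a 1x1 convolution applies the same weight matrix at every spatial position, which is exactly a per-position fully connected layer.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, c_in = 11, 16
feat = rng.standard_normal((2, 96, 48, c_in))  # like c9: (batch, H, W, channels)
W = rng.standard_normal((c_in, n_classes))     # 1x1 conv kernel == dense weights
b = rng.standard_normal(n_classes)

# 1x1 convolution: multiply the channel axis by W at every (h, w) position
conv_out = feat @ W + b                        # (2, 96, 48, n_classes)

# Per-position fully connected: flatten positions, apply dense, reshape back
dense_out = (feat.reshape(-1, c_in) @ W + b).reshape(2, 96, 48, n_classes)

print(np.allclose(conv_out, dense_out))  # True
```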

GeorgiosSopi commented 4 years ago

Thank you very much for your reply!

Yesterday, after sending you the message, I made some more changes and it started working! Yes, the dataset is highly imbalanced, since the null class makes up almost 80-90% of it. The reshaping is needed because otherwise I get a shape error when the model computes the loss (at a later stage) and during prediction, where the classes are one-hot encoded.
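Since the null class dominates, one mitigation I am experimenting with (just a common sketch, not code from this repository) is to weight the loss inversely to class frequency:

```python
import numpy as np

def inverse_frequency_weights(y, n_classes):
    """Weight each class inversely to its frequency; when all classes are
    present, the average weight over all samples is 1."""
    counts = np.bincount(y, minlength=n_classes).astype(float)
    counts[counts == 0] = 1.0  # avoid division by zero for absent classes
    return len(y) / (n_classes * counts)

# e.g. a null-class-heavy label vector: 85% class 0, the rest spread over 1..10
y = np.concatenate([np.zeros(850, dtype=int), np.random.randint(1, 11, 150)])
w = inverse_frequency_weights(y, n_classes=11)
print(w[0], w[1:].mean())  # null class weighted < 1, rare classes > 1
```

These per-class weights can then be passed, e.g., as the `class_weight` argument of Keras `Model.fit`.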

Thank you very much again for taking the time to answer! Have a good day!

Best, Georgios