lim-anggun / FgSegNet_v2

FgSegNet_v2: "Learning Multi-scale Features for Foreground Segmentation.” by Long Ang LIM and Hacer YALIM KELES
https://arxiv.org/abs/1808.01477

How can I train with my own video sequence? #3

Closed: Wisgon closed this issue 6 years ago

Wisgon commented 6 years ago

As far as I know, if I want to train on my own video sequence, I have to manually configure FgSegNetModule.py. But I'm a newbie with Keras, and even with deep learning in general. I found that I should modify the code below to fit my input video:

if dataset_name == 'CDnet':
    if self.scene == 'tramCrossroad_1fps':
        x = MyUpSampling2D(size=(1, 1), num_pixels=(2, 0), method_name=self.method_name)(x)
    elif self.scene == 'bridgeEntry':
        x = MyUpSampling2D(size=(1, 1), num_pixels=(2, 2), method_name=self.method_name)(x)
    elif self.scene == 'fluidHighway':
        x = MyUpSampling2D(size=(1, 1), num_pixels=(2, 0), method_name=self.method_name)(x)
    elif self.scene == 'streetCornerAtNight':
        x = MyUpSampling2D(size=(1, 1), num_pixels=(1, 0), method_name=self.method_name)(x)
        x = Cropping2D(cropping=((0, 0), (0, 1)))(x)
    elif self.scene == 'tramStation':
        x = Cropping2D(cropping=((1, 0), (0, 0)))(x)
    elif self.scene == 'twoPositionPTZCam':
        x = MyUpSampling2D(size=(1, 1), num_pixels=(0, 2), method_name=self.method_name)(x)
    elif self.scene == 'turbulence2':
        x = Cropping2D(cropping=((1, 0), (0, 0)))(x)
        x = MyUpSampling2D(size=(1, 1), num_pixels=(0, 1), method_name=self.method_name)(x)
    elif self.scene == 'turbulence3':
        x = MyUpSampling2D(size=(1, 1), num_pixels=(2, 0), method_name=self.method_name)(x)

But I don't know what num_pixels I should pass to it... How can I work out the num_pixels corresponding to my video sequence? And in which situations should I use Cropping2D()? Is there anything else I should modify? Thank you very much for replying.

lim-anggun commented 6 years ago

Hi @Wisgon, num_pixels = (pixel_height, pixel_width), where pixel_height and pixel_width are the numbers of pixels by which you want to pad your output dimensions after downsampling twice (the two VGG-16 max-pooling layers). In my experiments I preferred upscaling the feature maps over zero-padding them, but you can skip MyUpSampling2D and use Keras's ZeroPadding2D instead.
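For example, a minimal sketch of that ZeroPadding2D alternative (the ((top, bottom), (left, right)) argument layout is standard Keras; the one-row pad here is purely illustrative, not code from this repository):

    from keras.layers import ZeroPadding2D

    # Illustration: grow the feature map by one pixel of height using zeros,
    # the same net size change as MyUpSampling2D(num_pixels=(1, 0), ...),
    # but zero-padded rather than interpolated.
    # ZeroPadding2D takes padding=((top_pad, bottom_pad), (left_pad, right_pad)).
    x = ZeroPadding2D(padding=((0, 1), (0, 0)))(x)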

In more detail: if your input dimensions are, say, 240x320, then after the encoder downsamples twice your output will be 60x80, and after the decoder upsamples twice it will be 240x320 again. In that case you don't need ZeroPadding or MyUpSampling. But if the output from the encoder is, say, 60x79, you need to pad 1 width-pixel, e.g. x = MyUpSampling2D(num_pixels=(0, 1), method_name=self.method_name)(x)
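To make that arithmetic concrete, here is a rough helper (not from the repository; it assumes the two max-pooling layers floor each dimension, Keras's default padding='valid', so the decoder output comes back at (dim // 4) * 4 along each axis):

    def pixels_to_pad(height, width):
        # How many pixels num_pixels must add along each axis so the
        # decoder output matches the original frame size again.
        pad_h = height - (height // 4) * 4  # i.e. height % 4
        pad_w = width - (width // 4) * 4    # i.e. width % 4
        return (pad_h, pad_w)

    print(pixels_to_pad(240, 320))  # (0, 0): sizes line up, nothing to do
    print(pixels_to_pad(240, 317))  # (0, 1): the 60x79 encoder-output case

When the network output instead overshoots the target size, the Cropping2D calls in the scene-specific code above trim the extra rows or columns.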

Wisgon commented 6 years ago

OK, I will try it later, thank you very much.

albertchristianto commented 4 years ago

Hello @lim-anggun, thanks for sharing this great work. I have a question about your architecture design / problem definition: do you use features across time (temporal features) for segmenting the background and the foreground? From your paper, the problem definition looks like semantic segmentation to me. Thank you very much. Best regards, Albert Christianto