haqishen / MFNet-pytorch

MFNet-pytorch, image semantic segmentation using RGB-Thermal images
114 stars 25 forks source link

About the dataset #4

Open hexunjie opened 3 years ago

hexunjie commented 3 years ago

I want to ask about the dataset 'images'. I noticed that the input images all have four channels, and I would like to know how they are made from the RGB input and the thermal input.

temi92 commented 2 years ago

I am also curious about the dataset here. Are the image pairs aligned in space and time, or just in time?

hexunjie commented 2 years ago

They are aligned, and they work well with the element-wise summation in the fusion part.

temi92 commented 2 years ago

@hexunjie thanks for the clarification. I am a bit curious about your alignment process: was this a stereo calibration routine? In addition, is alignment really crucial? I imagine it would matter more for networks that expect a 4-channel input (early fusion techniques). But in architectures that use late fusion, perhaps alignment is not that crucial, especially if the encoder parts of the network process the RGB and IR images separately. What are your thoughts on this?

hexunjie commented 2 years ago

1. The dataset was aligned by its creators.
2. The early-fusion 4-channel input is concatenated from the 3-channel RGB image and the 1-channel thermal image. It can be used for comparative results with 4-channel input, or it simply makes the invocation more uniform.
3. Alignment is crucial because late fusion requires feature maps of the same size whose pixels are homologous; this is a prerequisite for element-wise fusion. The dataset 'images' contains the 4-channel input after alignment and concatenation.
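
To make point 3 concrete, here is a minimal NumPy sketch of late fusion by element-wise summation (the feature maps and their shape are hypothetical placeholders, not the actual MFNet encoder outputs):

```python
import numpy as np

# Hypothetical encoder outputs in (channels, H, W) layout.
# Element-wise summation only makes sense if both feature maps have
# identical shapes and spatially corresponding (homologous) pixels,
# which is why the input images must be aligned beforehand.
rgb_features = np.random.rand(64, 60, 80)      # from the RGB encoder branch
thermal_features = np.random.rand(64, 60, 80)  # from the thermal encoder branch

fused = rgb_features + thermal_features        # element-wise summation fusion
print(fused.shape)  # (64, 60, 80) -- shape is unchanged by the fusion
```

If the two modalities were misaligned, thermal activations would be added to the wrong spatial positions of the RGB feature map, which is exactly the failure mode discussed below.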

dholukeval commented 2 years ago

@hexunjie is correct. This can be done with the following code. It is assumed that the RGB and thermal images are already aligned (the technical term is image registration). @temi92 It is important that they are aligned; otherwise the thermal encoder would contribute information to the wrong part of the RGB encoder's feature map, causing incorrect training and incorrect results.

```python
import numpy as np
import PIL.Image

# Resize both modalities to a common resolution so the pixels stay homologous.
rgb_image = np.asarray(PIL.Image.fromarray(rgb_image).resize((640, 480)))          # (480, 640, 3)
thermal_image = np.asarray(PIL.Image.fromarray(thermal_image).resize((640, 480)))  # (480, 640)

# Stack along the channel axis to build the 4-channel early-fusion input.
fused_image = np.dstack((rgb_image, thermal_image))
print("Fused image shape after stacking:", fused_image.shape)  # (480, 640, 4)
```