wgcban / ChangeFormer

[IGARSS'22]: A Transformer-Based Siamese Network for Change Detection
https://www.wgcban.com/research#h.e51z61ujhqim
MIT License

Is it feasible to replace the three-channel images used for training with six-channel images in this model? #84

Closed FENGJIANJUN99 closed 11 months ago

FENGJIANJUN99 commented 11 months ago

Dear respected author,

Hello! Thank you very much for open-sourcing the paper code. I have a question and would like to seek your advice. I am currently attempting to concatenate two 256x256x3 images of the same scene, captured under different lighting conditions, into a single 256x256x6 image for change detection, following the "LEVIR" image format. Is this approach feasible? I encountered an error during execution at the line `imgs = [TF.to_pil_image(img) for img in imgs]`:

> File "H:\Anaconda\envs\success\lib\site-packages\torchvision\transforms\functional.py", line 274, in to_pil_image
> ValueError: pic should not have > 4 channels. Got 6 channels.

I look forward to your reply.

wgcban commented 11 months ago

@FENGJIANJUN99 Thanks!

I do not think `to_pil_image` accepts inputs with more than 4 channels. What if you apply the transformations separately to both images, as we have done in this repo, instead of concatenating them into a 6-channel image?
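
For concreteness, here is a minimal sketch of that approach. The variable names and random tensors are illustrative stand-ins, not code from this repo:

```python
import torch
import torchvision.transforms.functional as TF

# Two 3-channel images of the same scene under different lighting
# (random tensors used only as stand-ins for the real LEVIR-style inputs).
img_day = torch.rand(3, 256, 256)
img_night = torch.rand(3, 256, 256)

# to_pil_image rejects tensors with more than 4 channels, so convert each
# 3-channel image separately instead of a concatenated 6-channel tensor.
pil_day = TF.to_pil_image(img_day)
pil_night = TF.to_pil_image(img_night)

# Apply identical (e.g. geometric) transforms to both so they stay aligned.
pil_day, pil_night = TF.hflip(pil_day), TF.hflip(pil_night)
```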

FENGJIANJUN99 commented 11 months ago

> @FENGJIANJUN99 Thanks!
>
> I do not think `to_pil_image` accepts inputs with more than 4 channels. What if you apply the transformations separately to both images, as we have done in this repo, instead of concatenating them into a 6-channel image?

Thank you very much for your response. My work involves defect detection. To capture the complete characteristics of defects, I have set up two lighting environments (similar to day and night; different types of defects are more prominent under specific lighting conditions) to photograph the same workpiece. I want to stitch the photos captured under these two lighting conditions together so that the network can learn rich features from both, which is why I hoped to convert the two 3-channel images directly into a single six-channel image. Is there a way to implement this in your network?

wgcban commented 11 months ago

@FENGJIANJUN99 Thanks for clarifying your problem statement.

This seems like an interesting idea. A possible workaround for your case is:

  • Pass x_1_d (image 1 taken in daylight) and x_1_n (image 1 taken at night) separately through the encoder and obtain their representations: F_1_d and F_1_n.
  • Similarly, pass x_2_d and x_2_n through the encoder and obtain the feature representations: F_2_d and F_2_n.
  • Next, obtain the feature differences for the day and night images separately: Diff_d = F_1_d - F_2_d and Diff_n = F_1_n - F_2_n.
  • Concatenate the difference features and pass them through the decoder to obtain the change map.
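
A minimal PyTorch sketch of that workaround, with hypothetical `encoder`/`decoder` modules standing in for ChangeFormer's actual ones (the real encoder returns multi-scale features, so in practice the differencing and concatenation would be repeated at each scale):

```python
import torch
import torch.nn as nn

class DualLightChangeNet(nn.Module):
    """Hypothetical wrapper: encode the day and night views separately,
    difference the features per lighting condition, concatenate the
    differences, and decode them into a change map."""

    def __init__(self, encoder: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder  # shared (Siamese) feature extractor
        self.decoder = decoder  # maps concatenated differences to a change map

    def forward(self, x1_d, x1_n, x2_d, x2_n):
        # Feature representations of image 1 (daylight / night lighting)
        f1_d, f1_n = self.encoder(x1_d), self.encoder(x1_n)
        # Feature representations of image 2 (daylight / night lighting)
        f2_d, f2_n = self.encoder(x2_d), self.encoder(x2_n)

        # Feature differences, computed per lighting condition
        diff_d = f1_d - f2_d
        diff_n = f1_n - f2_n

        # Concatenate along the channel dimension and decode
        return self.decoder(torch.cat([diff_d, diff_n], dim=1))


# Toy usage with placeholder conv encoder/decoder (single-scale features).
encoder = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
decoder = nn.Conv2d(64, 1, 1)  # 64 = 2 x 32 concatenated difference channels
net = DualLightChangeNet(encoder, decoder)

x1_d, x1_n, x2_d, x2_n = (torch.rand(1, 3, 256, 256) for _ in range(4))
change_map = net(x1_d, x1_n, x2_d, x2_n)  # shape: (1, 1, 256, 256)
```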

FENGJIANJUN99 commented 11 months ago

> @FENGJIANJUN99 Thanks for clarifying your problem statement.
>
> This seems like an interesting idea. A possible workaround for your case is:
>
> • Pass x_1_d (image 1 taken in daylight) and x_1_n (image 1 taken at night) separately through the encoder and obtain their representations: F_1_d and F_1_n.
> • Similarly, pass x_2_d and x_2_n through the encoder and obtain the feature representations: F_2_d and F_2_n.
> • Next, obtain the feature differences for the day and night images separately: Diff_d = F_1_d - F_2_d and Diff_n = F_1_n - F_2_n.
> • Concatenate the difference features and pass them through the decoder to obtain the change map.

Thank you very much for your reply. I have successfully resolved the relevant issue!