jinyeying / DeS3_Deshadow

[AAAI'2024] DeS3: Adaptive Attention-driven Self and Soft Shadow Removal using ViT Similarity. The first diffusion-based shadow removal method that performs robustly on hard, soft, and self shadows. https://arxiv.org/abs/2211.08089

Maybe a wrong version #3

Open ZeweiLi999 opened 3 months ago

ZeweiLi999 commented 3 months ago

Hi Jin Yeying,

Thank you for your wonderful work. However, I can't find the attention map loss and ViT loss in the code. Perhaps you uploaded the wrong version to GitHub.

Looking forward to receiving the correct version of the code.

jinyeying commented 3 months ago

Yes, you are right. I have not yet cleaned up the attention map loss and ViT loss code.

atsatvik commented 1 month ago

Hi.

I was wondering if you have updated the code yet?

jinyeying commented 1 month ago

The AISTD version is a basic version without those losses. You can simply use it for training and inference.

atsatvik commented 1 month ago

Hey, I'm trying to build upon your work, and I would greatly appreciate it if you could provide the additions (such as the ViT similarity loss and the attention loss) that I need to reproduce the results from the paper. Thank you!

atsatvik commented 1 month ago

I have a question. As I understand it, the pipeline is: take a noisy shadow image -> feed it into the diffusion network -> get a denoised, shadow-removed image.

This image should ideally contain no shadows. If it is perfect, the classification label will be 1 (no shadows); otherwise it will be 0 (shadow present). Is my understanding correct?

If so, how do we decide whether the image has no shadow at all? Do we compare the diffusion model's output with the shadow image, see how close they are, and set the classification label accordingly? Is that what is happening here?