pfriedri / wdm-3d

PyTorch implementation for "WDM: 3D Wavelet Diffusion Models for High-Resolution Medical Image Synthesis" (DGM4MICCAI 2024)
https://pfriedri.github.io/wdm-3d-io
MIT License

FID and MS-SSIM #6

Closed Jason-u closed 1 week ago

Jason-u commented 1 week ago


Dear Author, I used the pretrained weights file 'brats_unet_128_1200k.pt' to infer 156 images and then computed the FID and MS-SSIM metrics. The FID is 176.4371 and the MS-SSIM is 0.8892, which differ significantly from the metrics reported in the paper. I ran the inference twice and the results stayed within this range. Additionally, I tried training on the BRATS2021 dataset: at 100,000 iterations the FID of the inferred samples is 141.5412 and the MS-SSIM is 0.8929; at 210,000 iterations the FID is 156.3899 and the MS-SSIM is 0.8747. Could you please tell me why there is such a discrepancy?

pfriedri commented 1 week ago

@Jason-u can you specify what you did exactly? As far as I understood, you inferred 156 samples (in our paper we report scores over 1000 samples) and then computed FID and MS-SSIM. How many reference images of the real dataset did you use to compute these scores? Also just 156?

Jason-u commented 1 week ago

No, I used the whole dataset.


pfriedri commented 1 week ago

@Jason-u I think this is the problem. If you did not change the variable sets.num_samples to another value (the default is 1000), you initialize the array that holds the activations of the feature extraction network with 1000 entries (pred_arr = np.empty((sets.num_samples, sets.dims))). Since np.empty fills this array with arbitrary values and you only overwrite 156 of the 1000 entries with real activations, the computed FID scores are meaningless. You would either need to reduce this number to the number of generated samples (which makes the results less robust) or generate more samples (in this case 1000).
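For readers hitting the same issue, here is a minimal sketch of the pitfall; the 2048-dimensional features and the random stand-in activations are illustrative assumptions, not values taken from the repo:

```python
import numpy as np

# Illustrative numbers only: the 2048-dim features and random stand-in
# activations are assumptions, not values taken from the repo.
num_samples, dims = 1000, 2048   # sets.num_samples default vs. feature dimension
n_generated = 156                # samples that were actually inferred

# np.empty allocates WITHOUT initializing, so unfilled rows contain
# arbitrary memory contents rather than zeros.
pred_arr = np.empty((num_samples, dims))
pred_arr[:n_generated] = np.random.rand(n_generated, dims)  # stand-in for real activations

# FID statistics are taken over ALL 1000 rows, so the 844 uninitialized rows
# corrupt the mean and covariance and render the score meaningless.
mu, sigma = pred_arr.mean(axis=0), np.cov(pred_arr, rowvar=False)

# Fix: either size the array to the number of generated samples ...
pred_arr = np.empty((n_generated, dims))
# ... or generate the full 1000 samples before computing FID.
```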

Jason-u commented 1 week ago

Thank you for your reply. I ran it again and the results are very good. Do you think there is a way to improve the MS-SSIM score? Is the low MS-SSIM metric caused by the wavelet transform? I would like to seek your advice; thank you for your help.

pfriedri commented 1 week ago

@Jason-u good to hear. I'm not sure I understand the question. We use MS-SSIM to measure the diversity of the generated images. A low MS-SSIM score in this case indicates that the generated images are not similar to each other, i.e. they are diverse. So a low MS-SSIM score is what we actually want.
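As a rough illustration of MS-SSIM as a diversity metric, the sketch below averages MS-SSIM over random pairs of generated volumes. It assumes the pytorch_msssim package, volumes scaled to [0, 1], and a reduced window size so 128-voxel inputs pass the multi-scale size check; the evaluation code actually used in this repo may differ.

```python
import itertools
import torch
# Assumption: pytorch_msssim is available and accepts the input shape below;
# this is not necessarily the implementation used by the repo's evaluation code.
from pytorch_msssim import ms_ssim

def average_pairwise_msssim(volumes: torch.Tensor, max_pairs: int = 100) -> float:
    """Average MS-SSIM over random pairs of generated samples.

    volumes: (N, 1, D, H, W) tensor of generated volumes, scaled to [0, 1].
    A LOWER average means the samples are less similar to each other,
    i.e. the model is MORE diverse.
    """
    pairs = list(itertools.combinations(range(volumes.shape[0]), 2))
    idx = torch.randperm(len(pairs))[:max_pairs]
    scores = []
    for k in idx:
        i, j = pairs[k]
        # win_size=7 keeps the default multi-scale pyramid applicable to
        # 128-voxel volumes (an assumption about this implementation).
        scores.append(ms_ssim(volumes[i:i + 1], volumes[j:j + 1],
                              data_range=1.0, win_size=7))
    return torch.stack(scores).mean().item()
```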

Jason-u commented 1 week ago

Haha, I'm sorry for such a mistake. Thank you for your help!