jleuschn / dival

Deep Inversion Validation Library
MIT License

A few questions related to the referenced fbp reconstructor #25

Closed. liutianlin0121 closed this issue 3 years ago.

liutianlin0121 commented 3 years ago

Hi!

I would like to ask a few questions related to the reference FBP reconstructor.

First, for some reason, the FBP reconstructor's performance on LoDoPaB in my experiments (code attached below) seems to be much better than what is reported in the dataset paper. For example, the PSNR I get on the test set is 30.51, whereas the corresponding number in the dataset paper is 24.43. The FBP results in Table 1 of this paper, on the other hand, seem consistent with my experiments. Should the results in the dataset paper be considered outdated? Below is the code I used for the FBP reconstructor experiment:

import dival
import numpy as np
from dival.measure import PSNR
from tqdm import tqdm

dataset = dival.get_standard_dataset('lodopab')
data_dict = {x: dataset.get_data_pairs(x) for x in ['test', 'validation']}

# reference FBP reconstructor with dival's optimized hyper parameters
reconstructor = dival.get_reference_reconstructor('fbp', 'lodopab')

phase = 'test'

#%% evaluate
psnrs = []
running_psnr = 0
running_size = 0 

with tqdm(data_dict[phase]) as pbar:
    for obs, gt in pbar:
        reco = reconstructor.reconstruct(obs)
        current_psnr = PSNR(reco, gt) 
        psnrs.append(current_psnr)
        running_psnr += current_psnr
        running_size += 1
        pbar.set_postfix({'phase': phase,
                          'psnr': running_psnr/running_size})

print('mean psnr: {:f}'.format(np.mean(psnrs)))

Second, it is a bit strange to me that there is a small but noticeable gap between the FBP reconstructor's performance and that of the cached FBP results. As mentioned above, the PSNR of the FBP reconstructor on the test data is 30.51, but if I directly use the cached FBP results to compute the PSNR, it drops to 29.80. I wonder if this is normal. The code I used for the cached FBP dataset is as follows:

import os

import dival
import numpy as np
from dival.measure import PSNR
from dival.datasets.fbp_dataset import get_cached_fbp_dataset
from tqdm import tqdm

IMPL = 'astra_cuda'
CACHE_DIR = '/home/liu0003/Desktop/datasets/cache_lodopad/'
CACHE_FILES = {
    'test':
        (os.path.join(CACHE_DIR, 'cache_lodopab_test_fbp.npy'), None)}

dataset = dival.get_standard_dataset('lodopab', impl=IMPL)

cached_fbp_dataset = get_cached_fbp_dataset(
    dataset, dataset.get_ray_trafo(impl=IMPL), CACHE_FILES)

# create PyTorch datasets (only the 'test' part is backed by the cache file)
pytorch_dataset = {x: cached_fbp_dataset.create_torch_dataset(
                       part=x,
                       reshape=((1,) + cached_fbp_dataset.space[0].shape,
                                (1,) + cached_fbp_dataset.space[1].shape))
                   for x in ['train', 'validation', 'test']}

phase = 'test'

#%% evaluate
psnrs = []
running_psnr = 0
running_size = 0 

with tqdm(pytorch_dataset[phase]) as pbar:
    for fbp, gt in pbar:
        current_psnr = PSNR(fbp, gt) 
        psnrs.append(current_psnr)
        running_psnr += current_psnr
        running_size += 1
        pbar.set_postfix({'phase': phase,
                          'psnr': running_psnr/running_size})

print('mean psnr: {:f}'.format(np.mean(psnrs)))

where the cache_lodopab_test_fbp.npy file was generated in the following way:

from os import path

from dival import get_standard_dataset
from dival.datasets.fbp_dataset import generate_fbp_cache_files

IMPL = 'astra_cuda'
CACHE_DIR = '/home/liu0003/Desktop/datasets/cache_lodopad/'

CACHE_FILES = {
    'test': (path.join(CACHE_DIR, 'cache_lodopab_test_fbp.npy'), None)}

dataset = get_standard_dataset('lodopab', impl=IMPL)
ray_trafo = dataset.get_ray_trafo(impl=IMPL)

# precompute the FBPs for the test part and store them in the cache file
generate_fbp_cache_files(dataset, ray_trafo, CACHE_FILES)

Another naive question: it seems that none of the experiments in the example folder set PSNR.data_range = 1. But if we do that, the PSNR scores will be higher. It also seems legitimate to me, because the pixel intensities are bounded between 0 and 1. Is there a particular reason that PSNR.data_range = 1 was not used?
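
For concreteness, what I have in mind is something along these lines (just a sketch, assuming the data_range attribute can simply be overridden on the PSNR measure instance before evaluating):

from dival.measure import PSNR

# fixed data range of 1 instead of the per-image max-min of the ground truth
PSNR.data_range = 1

# ... then evaluate exactly as in the loops above:
# current_psnr = PSNR(reco, gt)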

Thanks a million!

Cheers, Tianlin

jleuschn commented 3 years ago

Hi!

The differences in the FBP reconstructions all stem from different filter hyper parameters. The documentation could be clearer on this, especially for the cached FBPs. These are the different scenarios:

In the dataset paper, we did not use any regularizing filter, i.e. the hyper parameters filter_type='Ram-Lak', frequency_scaling=1.0. This is suboptimal, since the measurements contain noise, so the results from this paper are not state-of-the-art. The post-processing U-Net trained in that paper also used these (poor quality) FBPs.

The reference FBP reconstructor has hyper parameters optimized for the dataset: filter_type='Hann', frequency_scaling=0.641. All the currently included reference reconstructors are documented in this paper. It served as the source for the baseline methods in the paper you linked.

Regarding the cached FBPs, they are now by default created with filter_type='Hann', frequency_scaling=1.0 (see here). Using frequency_scaling=1.0 instead of the optimized frequency_scaling=0.641 makes sense for their use as input to the post-processing U-Net, because we would like to make the information from the high frequencies available to the network as well.
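
For completeness, the three settings above can be reproduced by overriding the hyper parameters of the reference reconstructor, roughly like this (a sketch, assuming hyper_params is modified in place before calling reconstruct):

from dival import get_reference_reconstructor

reconstructor = get_reference_reconstructor('fbp', 'lodopab')
# as shipped (scenario 2): {'filter_type': 'Hann', 'frequency_scaling': 0.641}
print(reconstructor.hyper_params)

# scenario 1, dataset paper: no regularizing filter
reconstructor.hyper_params['filter_type'] = 'Ram-Lak'
reconstructor.hyper_params['frequency_scaling'] = 1.0

# scenario 3, cached FBPs: Hann filter over the full frequency range
reconstructor.hyper_params['filter_type'] = 'Hann'
reconstructor.hyper_params['frequency_scaling'] = 1.0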

About the PSNR data_range, we decided to compute it with the max-min range of the individual ground truth. As you mention, this lowers the values, and depending on the point of view they may be considered too pessimistic. While a value of 1 (or 3071 HU) is possible, it only occurs for very high density materials like metal, so data_range=1 could be argued to be optimistic. We see that data_range=1 is more commonly used, and that a fixed value for all images makes sense. Based on previous feedback we therefore included both the originally proposed variable max-min data range and data_range=1 in the LoDoPaB challenge leaderboard.
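
To make the two conventions explicit, here is a minimal numpy sketch of the formula (for illustration only, not the library code):

import numpy as np

def psnr(reco, gt, data_range=None):
    """PSNR in dB, with either the per-image max-min range of the ground
    truth (data_range=None) or a fixed range such as data_range=1."""
    reco, gt = np.asarray(reco), np.asarray(gt)
    if data_range is None:
        # per-image max-min range of the ground truth (our original choice)
        data_range = gt.max() - gt.min()
    mse = np.mean((reco - gt) ** 2)
    return 10. * np.log10(data_range ** 2 / mse)

# psnr(reco, gt)                 -> variable max-min range (more pessimistic)
# psnr(reco, gt, data_range=1.)  -> fixed range of 1 (typically higher values)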

Hope this answers your questions, otherwise feel free to ask!

Kind regards, Johannes

liutianlin0121 commented 3 years ago

Thanks a lot!!

liutianlin0121 commented 3 years ago

BTW, thanks for pointing out the LoDoPaB-CT challenge -- I am wondering if it is still possible to create a submission for the challenge?

jleuschn commented 3 years ago

Sure, the challenge is kept open. We would be happy to receive your submissions!