MaoFuyou opened this issue 4 months ago
Hi, can you show some visual examples so we can see what is going on in the generated PET?
I have found an issue: because of my limited GPU memory, img_size can only be set very small, such as (64,64,4) or (24,24,24), and with batch_size only 2. Obviously, this makes the final image quality very bad.
Look! This is obviously 64x64x4, and it is blurry because the pixel resolution is too low.
Good to know. Don't forget the patch_size (not the img_size) is there for you to adjust, which can help avoid the GPU memory issue. Try maybe patch_size = (256,256,4) or something similar.
Hi! I don't get how to adjust the patch_size. I can only see this relevant parameter!
Ah, I just committed a new version to the Jupyter notebook; it's in the second block:
The thing is, I automatically pad or crop all images to the same size img_size, then randomly crop several patches of patch_size for training. In testing, the generation also automatically slides through the whole image volume, from top to bottom and from left to right.
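Roughly, the pipeline looks like this (a minimal sketch using MONAI transforms, not the exact notebook code; img_size, patch_size, patch_num, and the "image"/"label" keys here are placeholder assumptions):

```python
# Sketch only: pad/crop every volume to img_size, randomly crop patch_num patches
# of patch_size for training, and slide a window over the whole volume at test time.
from monai.transforms import (Compose, LoadImaged, EnsureChannelFirstd,
                              ResizeWithPadOrCropd, RandSpatialCropSamplesd)
from monai.inferers import SlidingWindowInferer

img_size = (128, 128, 128)   # every volume is padded/cropped to this size
patch_size = (64, 64, 8)     # the patches actually fed to the network
patch_num = 2                # random patches per volume per iteration

train_transforms = Compose([
    LoadImaged(keys=["image", "label"]),
    EnsureChannelFirstd(keys=["image", "label"]),
    ResizeWithPadOrCropd(keys=["image", "label"], spatial_size=img_size),
    # returns a list of patch_num cropped dictionaries
    RandSpatialCropSamplesd(keys=["image", "label"], roi_size=patch_size,
                            num_samples=patch_num, random_size=False),
])

test_transforms = Compose([
    LoadImaged(keys=["image", "label"]),
    EnsureChannelFirstd(keys=["image", "label"]),
    ResizeWithPadOrCropd(keys=["image", "label"], spatial_size=img_size),
])

# At test time, generation slides patch-by-patch across the whole volume.
inferer = SlidingWindowInferer(roi_size=patch_size, sw_batch_size=1, overlap=0.25)
```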
Hi, did you only modify these two parts, patch_size and the transforms? I don't know how to fix it. By the way, my input data is 64x64x64.
Traceback (most recent call last):
File "/root/commonfiles/Maoeyu/MRI-PET-diffu/main.py", line 509, in
Oh, I think maybe the commit needs some time? Try it again. I checked the code; it should have no problem. Just put 64x64x64 as img_size, and set patch_size to whatever you want (make it as large as your GPU can handle). Btw, PET with size 64x64x64? Looks too low haha.
I don't know what can lead to this issue. Yes, my PET data is 64x64x64, and in the past this MRI-PET dataset has been used to produce results in some papers.
Can you send me a screenshot of this part?
```python
def __len__(self):
    return len(self.data)

def __getitem__(self, idx):
    img_path, class_name = self.data[idx]
    label_path, class_name = self.label[idx]
    cao = {"image": img_path, 'label': label_path}
    if not self.train_flag:
        # Testing: transform the whole volume
        affined_data_dict = self.test_transforms(cao)
        img_tensor = affined_data_dict['image'].to(torch.float)
        label_tensor = affined_data_dict['label'].to(torch.float)
    else:
        # Training: the random-crop transform returns a list of patch dictionaries
        affined_data_dict = self.train_transforms(cao)
        img = np.zeros([patch_num, img_size[0], img_size[1], img_size[2]])
        label = np.zeros([patch_num, img_size[0], img_size[1], img_size[2]])
        for i, after_l in enumerate(affined_data_dict):
            img[i, :, :, :] = after_l['image']
            label[i, :, :, :] = after_l['label']
        img_tensor = torch.unsqueeze(torch.from_numpy(img.copy()), 1).to(torch.float)
        label_tensor = torch.unsqueeze(torch.from_numpy(label.copy()), 1).to(torch.float)
    return img_tensor, label_tensor
```
And by the way, my input data is nii.
Well, you see, my version and your version differ in the nii reader.
Ok! I see it and fixed it, and now it runs successfully in the training phase, but in the evaluation phase:
Traceback (most recent call last):
File "/root/commonfiles/Maoeyu/MRI-PET-diffu/main.py", line 517, in
I can deal with this issue!
I can confirm the training phase is correct, but something is wrong in the evaluation phase:
Exception occurred: AssertionError
input feature has wrong size
File "/root/commonfiles/Maoeyu/MRI-PET-diffu/network/nnFormer.py", line 350, in forward
assert L == S * H * W, "input feature has wrong size"
File "/root/commonfiles/Maoeyu/MRI-PET-diffu/network/Diffusion_model_transformer.py", line 274, in _forward
h = blk(h)
File "/root/commonfiles/Maoeyu/MRI-PET-diffu/network/util_network.py", line 138, in checkpoint
return func(*inputs)
File "/root/commonfiles/Maoeyu/MRI-PET-diffu/network/Diffusion_model_transformer.py", line 251, in forward
return checkpoint(
File "/root/commonfiles/Maoeyu/MRI-PET-diffu/network/Diffusion_model_transformer.py", line 44, in forward
x = layer(x, emb)
File "/root/commonfiles/Maoeyu/MRI-PET-diffu/network/Diffusion_model_transformer.py", line 598, in forward
h = module(h, emb)
File "/root/commonfiles/Maoeyu/MRI-PET-diffu/diffusion/respace.py", line 123, in call
return self.model(x, new_ts, kwargs)
File "/root/commonfiles/Maoeyu/MRI-PET-diffu/diffusion/GaussianDiffusion.py", line 322, in p_mean_variance
model_output = model(x_input, self._scale_timesteps(t), **model_kwargs)
File "/root/commonfiles/Maoeyu/MRI-PET-diffu/diffusion/respace.py", line 92, in p_mean_variance
return super().p_mean_variance(self._wrap_model(model), *args, **kwargs)
File "/root/commonfiles/Maoeyu/MRI-PET-diffu/diffusion/GaussianDiffusion.py", line 481, in p_sample
out = self.p_mean_variance(
File "/root/commonfiles/Maoeyu/MRI-PET-diffu/diffusion/GaussianDiffusion.py", line 586, in p_sample_loop_progressive
out = self.p_sample(
File "/root/commonfiles/Maoeyu/MRI-PET-diffu/diffusion/GaussianDiffusion.py", line 533, in p_sample_loop
for sample in self.p_sample_loop_progressive(
File "/root/commonfiles/Maoeyu/MRI-PET-diffu/main.py", line 406, in diffusion_sampling
sampled_images = diffusion.p_sample_loop(model,(condition.shape[0], 1,
File "/root/commonfiles/Maoeyu/MRI-PET-diffu/main.py", line 429, in evaluate
sampled_images = inferer(condition,diffusion_sampling,model)
File "/root/commonfiles/Maoeyu/MRI-PET-diffu/main.py", line 517, in
Weird, this error occurs when your input image size (patch_size) is not large enough. But if training works, testing should also work. Can you try patch_size = (64,64,8)?
Ok, I'll try it!
Oh no! 64x64x8 with batch_size 2 runs out of CUDA memory. Is there any other option?
So 64x64x8 doesn't fit? Then I suggest you use batch_size = 1, and you can also try patch_size = (32,32,8).
Ok, I am trying!
Oh no, something is still wrong!
I know what happened. The network is too deep, so downsampling produces feature sizes that are no longer divisible. In the network-building block in the Jupyter notebook, try:
num_channels=64
attention_resolutions="16,8"
channel_mult = (1, 2, 3)
num_heads=[4,4,8]
window_size = [[4,4,4],[4,4,4],[4,4,2]]
num_res_blocks = [2,2,2]
sample_kernel=([2,2,2],[2,2,1],[2,2,1]),
Here I reduce the network depth from 4 to 3. Try this with a 32x32x4 patch size. When you have a larger GPU, increase the depth again.
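As a quick sanity check (a sketch, not code from the repo; the helper name is made up), you can verify that a candidate patch_size stays integer-divisible through every downsampling stage of sample_kernel:

```python
# Sketch: does patch_size survive all downsampling stages without fractional sizes?
def check_patch_size(patch_size, sample_kernel):
    size = list(patch_size)
    for stage, kernel in enumerate(sample_kernel):
        for axis, k in enumerate(kernel):
            if size[axis] % k != 0:
                return False, f"stage {stage}: {tuple(size)} not divisible by {kernel}"
            size[axis] //= k
    return True, f"final feature size {tuple(size)}"

print(check_patch_size((32, 32, 4), ([2, 2, 2], [2, 2, 1], [2, 2, 1])))  # ok, ends at (4, 4, 2)
print(check_patch_size((32, 32, 4), ([2, 2, 2], [2, 2, 2], [2, 2, 2])))  # fails in the depth axis
```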
It is sad! I have reduced the network depth from 4 to 3, but it still fails.
Maybe, if you have time, I can send my code and part of the data to you!
How about trying this one? Let's do one last try.
num_channels=64
attention_resolutions="16,8"
channel_mult = (1, 2, 3)
num_heads=[4,4,8]
window_size = [[4,4,4],[4,4,4],[4,4,2]]
num_res_blocks = [2,2,2]
sample_kernel=([1,1,1],[2,2,1],[2,2,1]),
Oh, it still failed.
Let me do it tomorrow haha. I will make one for patch size = 32x32x8. Btw, have you tried batch_size = 1 for 64x64x8?
When I use batch_size = 1 and 64x64x8, a new error occurs:
x1 and y1 are the outputs from my debugging. I cannot figure out why they have different dimensions.
I see. The error tells you that your network output is 1x1x84x102x84 (I think because your MRI has that size), but your ground-truth PET image is 1x1x91x102x91, so the MAE cannot be calculated. That should be a data error.
Another thing: I suggest you enable this code in your reader, to make sure your images have a consistent size that is divisible by 2^4 (since you have four layers).
But be careful: the above error may also be telling you that your MRI and PET have different voxel spacing. That is not good. You should resample them to the same spacing first, then pad/crop them.
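Something like this would do it (a minimal MONAI sketch, not the repo's reader; the 2 mm spacing, the 128x128x128 target size, and the "image"/"label" keys are example assumptions):

```python
# Sketch: resample MRI ("image") and PET ("label") to the same voxel spacing,
# then pad/crop both to one fixed size that is divisible by 2^4 = 16.
from monai.transforms import (Compose, LoadImaged, EnsureChannelFirstd,
                              Spacingd, ResizeWithPadOrCropd)

preprocess = Compose([
    LoadImaged(keys=["image", "label"]),
    EnsureChannelFirstd(keys=["image", "label"]),
    Spacingd(keys=["image", "label"], pixdim=(2.0, 2.0, 2.0),
             mode=("bilinear", "bilinear")),             # same spacing for both volumes
    ResizeWithPadOrCropd(keys=["image", "label"],
                         spatial_size=(128, 128, 128)),  # 128 = 16 * 8
])

sample = preprocess({"image": "subject01_mri.nii", "label": "subject01_pet.nii"})
print(sample["image"].shape, sample["label"].shape)  # both (1, 128, 128, 128)
```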
Yes, it may need some time to run. Also, could I get your Telegram or X account so we can communicate in more detail?
Since I enabled ResizeWithPadOrCropd, the phase seems to hang and the GPU memory is full. Is that normal?
It is still holding the GPU. I want to ask: is this ResizeWithPadOrCropd suited for training, testing, or both?
You should use this resize function in both training and testing. Well, it is normal for the GPU to be full. Is it still running and showing the loss and evaluation MAE at different epochs?
Ok! It is still running, but it stays at epoch 0.
You mean it is still in epoch 0 and still evaluating?
Yes;
/root/userfolder/tonggan/anaconda3/envs/Xiong_poj/lib/python3.8/site-packages/torch/nn/functional.py:3657: UserWarning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. warnings.warn(
optimization time: 5.730208873748779
[ 0/ 147 ( 0%)] A_to_B_Loss: 1.0015516
optimization time: 0.9895610809326172
[ 30/ 147 ( 20%)] A_to_B_Loss: 1.5312134
optimization time: 1.0017342567443848
[ 60/ 147 ( 41%)] A_to_B_Loss: 1.2967397
optimization time: 1.0088763236999512
[ 90/ 147 ( 61%)] A_to_B_Loss: 1.0642425
optimization time: 1.0104715824127197
[ 120/ 147 ( 82%)] A_to_B_Loss: 0.9033197
Total time per sample is: 151.69771456718445
Averaged loss is: 0.7906994
Execution time: 169.63 seconds
This code has been running for nearly 4 hours with the GPU full.
Can you try commenting out the training and running the evaluation using the untrained DDPM? Just need to confirm it is working.
Ok I will try it. It's already late at night in China. I'll tell you about the test tomorrow morning.
Unfortunately, when I comment out the training part, the GPU memory immediately maxes out. The relevant parameter settings and the evaluation code are:

```python
def evaluate(model, epoch, path, data_loader1, best_loss):
    model.eval()
    prediction = []
    true = []
    img = []
    loss_all = []
    mae_all = []
    psnr_all = []
    ssim_all = []
    ncc_all = []
    aa = time.time()
    with torch.no_grad():
        for i, (x1, y1) in enumerate(data_loader1):
            target = y1.to(device)
            condition = x1.to(device)
            with torch.cuda.amp.autocast():
                sampled_images = inferer(condition, diffusion_sampling, model)
            # x1:             [1, 1, 84, 102, 84]
            # y1:             [1, 1, 91, 108, 91]
            # sampled_images: [1, 1, 84, 102, 84]
            # target:         [1, 1, 91, 108, 91]
            loss = metric(sampled_images, target)
            # print('L1 loss: ' + str(loss))
            img.append(x1.cpu().numpy())
            true.append(target.cpu().numpy())
            prediction.append(sampled_images.cpu().numpy())
            loss_all.append(loss.cpu().numpy())
            mae = compute_mae(true[-1], prediction[-1])
            psnr = compute_psnr(true[-1], prediction[-1])
            ssim = compute_ssim(true[-1], prediction[-1])
            ncc = compute_ncc(true[-1], prediction[-1])
            mae_all.append(mae)
            psnr_all.append(psnr)
            ssim_all.append(ssim)
            ncc_all.append(ncc)
            print('optimization time: ' + str(1 * (time.time() - aa)))
    # The save code, you can replace it by your code for other files, e.g. nii or dicom
    data = {"img": img,
            'label': true,
            'prediction': prediction,
            'loss': np.mean(loss_all),
            'mae': np.mean(mae_all),
            'psnr': np.mean(psnr_all),
            'ssim': np.mean(ssim_all),
            'ncc': np.mean(ncc_all)}
    scipy.io.savemat(path + 'test_example_epoch' + str(epoch) + '.mat', data)
    if np.mean(loss_all) < best_loss:
        scipy.io.savemat(path + 'all_final_test_another.mat', data)
    return np.mean(loss_all), np.mean(mae_all), np.mean(psnr_all), np.mean(ssim_all), np.mean(ncc_all)


num_channels = 64
attention_resolutions = "16,8"
channel_mult = (1, 2, 3)
num_heads = [4, 4, 8]
window_size = [[4, 4, 4], [4, 4, 4], [4, 4, 2]]
num_res_blocks = [2, 2, 2]
sample_kernel = ([1, 1, 1], [2, 2, 1], [2, 2, 1]),
BATCH_SIZE_TRAIN = 1 * 1
img_size = (128, 128, 128)
patch_size = (64, 64, 8)
spacing = (2, 2, 2)
patch_num = 2
channels = 1
metric = torch.nn.L1Loss()
```
I think the problem is in the evaluation: it keeps occupying the GPU but never actually progresses.
I don't have a Chinese input method on the lab computer, so please let me do it in English. The code works fine on my computer. Let's debug it step by step: add the code "print(target.shape)".
Let's see if the error happens in the data loader.
Ok, I will try it!
Yes, the target shape is as you see. But the code is still running.
OK. The reader is correct. Then let's move to the next step:
Comment out this line, unindent the next line, and put print('aaaaaaaaaaaaa') below the line "sampled_images = ......".
Yes, it seems to run smoothly.
OK. Now make your testing dataset only have 1 MRI and 1 PET. Run the whole evaluation.
Ok. So it is just too slow?
Haha yes. I can see your network takes 171 seconds to run 147 samples for one timestep. Let's say about 1 second per patch per timestep, and each patch needs 50 timesteps. If one patient has size 128x128x128, then we have 2x2x32 = 128 patches (with patch size 64x64x4), so each patient should need about 2 hours. It looks like your first patient is small, so it only takes 500 seconds.
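Spelled out as a quick calculation (same assumptions as above, nothing new):

```python
# Back-of-the-envelope runtime estimate for one patient.
img_size = (128, 128, 128)
patch_size = (64, 64, 4)
sec_per_patch_step = 1.0   # ~1 second per patch per diffusion timestep
timesteps = 50             # sampling timesteps per patch

patches = 1
for full, patch in zip(img_size, patch_size):
    patches *= full // patch            # 2 * 2 * 32 = 128 patches

total_sec = patches * timesteps * sec_per_patch_step
print(f"{patches} patches -> ~{total_sec / 3600:.1f} hours per patient")  # ~1.8 h
```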
I used your model to train on paired MRI-PET nii data, but I get an extremely bad output! I want to know why and how to deal with it. training_log.csv