mikgroup / UFLoss

MIT License

Train the DL-based reconstruction with UFLoss #6

Open Aristot1e opened 2 years ago

Aristot1e commented 2 years ago

  Traceback (most recent call last):
    File "../train_ufloss.py", line 803, in <module>
      main(args)
    File "../train_ufloss.py", line 562, in main
      model_re.load_state_dict(
    File "/home/img/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict
      raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
  RuntimeError: Error(s) in loading state_dict for Model:
    size mismatch for memory_bank: copying a param with shape torch.Size([256, 1]) from checkpoint, the shape in current model is torch.Size([256, 2457600]).

This is the error I get when running launch_training_MoDL_traditional_UFLoss_256_demo.sh. The shape of memory_bank in the checkpoint does not match the shape in the model. Why is that? I can't work out how to fix it.

The other problem is in train_ufloss.py, lines 193/194:

  if args.loss_normalized == False:
      output = output * std + mean
      target = target * std + mean

Neither std nor mean is defined. What should I do?

KeWang0622 commented 2 years ago

Please set args.loss_normalized = True and try again. I will fix this issue. Thanks!

Aristot1e commented 2 years ago

> Please set args.loss_normalized = True and try again, and I will solve this issue, Thanks!

  Namespace(accelerations=[10, 15], batch_size=1, checkpoint=None, circular_pad=True, data_parallel=False, data_path='/home/img/Desktop/lff/Dataset/pre-processed/multicoil', device='cuda', device_num='0', drop_prob=0.0, efficient_ufloss=False, exp_dir='/home/img/Desktop/lff/Dataset/summary/train-3D_MELD_4steps_MoDLflag0_shared_CGsteps_6date_20210929_ufloss0_ufloss_weight_10_dimension_256_debug', fix_step_size=True, ge_mask=None, kernel_size=3, loss_normalized='True', loss_type=2, loss_uflossdir='/data/train_ufloss/train_UFLoss_feature_256_features_date_202104283_temperature_1_lr1e-5/checkpoints/ckpt200.pth', lr=0.0002, lr_gamma=0.5, lr_step_size=20, meld_cp=False, meld_flag=False, modl_flag=True, modl_lamda=0.05, num_cg_steps=6, num_emaps=1, num_epochs=2000, num_features=256, num_grad_steps=4, num_resblocks=2, patch_size=64, report_interval=10, resume=False, sample_rate=1.0, seed=42, share_weights=True, slwin_init=True, ufloss3d=False, ufloss_weight=10.0, uflossfreq=8, weight_decay=0.0)
  Using parameters: Temperature: 1.0 2
  Traceback (most recent call last):
    File "../train_ufloss.py", line 803, in <module>
      main(args)
    File "../train_ufloss.py", line 562, in main
      model_re.load_state_dict(
    File "/home/img/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in load_state_dict
      raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
  RuntimeError: Error(s) in loading state_dict for Model:
    size mismatch for memory_bank: copying a param with shape torch.Size([256, 1]) from checkpoint, the shape in current model is torch.Size([256, 2457600]).

loss_normalized is now set to True, but that does not help.

KeWang0622 commented 2 years ago

I see. How did you train the UFLoss? The error is not about the normalization; it's about the checkpoint loading. How many patches did you use to train the UFLoss feature mapping network?

Aristot1e commented 2 years ago

> I see, how did you train the UFLoss? the error is not about the normalization. It's about the checkpoint loading, how many patches did you use to train the UFLoss feature mapping network

I trained the UFLoss using launch_training_patch_learning.sh. The total patch_extraction number should be 15568.

Aristot1e commented 2 years ago

> I see, how did you train the UFLoss? the error is not about the normalization. It's about the checkpoint loading, how many patches did you use to train the UFLoss feature mapping network

The total patch_data number I used is 311360. The multicoil knee dataset I downloaded has 973 .h5 files. That number becomes 15568 after running data_preprocessing.py, and 311360 after running patch_extraction.py. But the error says the shape in the current model is torch.Size([256, 2457600]). I don't know why it is so large.

Another question: when training the UFLoss, the loss is still very high (11.3+) after 200 epochs. How can I make it smaller?
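One way to narrow down a size mismatch like this is to compare the shapes stored in the checkpoint with the shapes the freshly built model expects, before calling load_state_dict. The sketch below is not from the UFLoss repository: `find_shape_mismatches` is a hypothetical helper, and the two dictionaries are filled in by hand from the error message. With PyTorch, such dictionaries could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for entries whose shapes differ.

    Hypothetical helper for illustration; both arguments map parameter or
    buffer names to shape tuples.
    """
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        # Only report names present on both sides with different shapes.
        if model_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes copied from the RuntimeError in this thread.
checkpoint_shapes = {"memory_bank": (256, 1)}
model_shapes = {"memory_bank": (256, 2457600)}

for name, (ckpt, model) in find_shape_mismatches(checkpoint_shapes, model_shapes).items():
    print(f"{name}: checkpoint {ckpt} vs model {model}")
```

Here the second dimension of memory_bank is what differs, so the value that sizes the memory bank when the model is constructed is the first thing to compare against the value used when the checkpoint was saved.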

Aristot1e commented 2 years ago

I've run into a new problem. After the message "Successfully loaded UFLoss model (Traditional)", the following error appeared.

  Traceback (most recent call last):
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 95, in apply
      output = self._apply(input)
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 1330, in _apply
      return block.array_to_blocks(input, self.blk_shape,
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/block.py", line 103, in array_to_blocks
      raise ValueError('Only support ndim=1, 2, or 3, got {}'.format(ndim))
  ValueError: Only support ndim=1, 2, or 3, got 4

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 95, in apply
      output = self._apply(input)
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 362, in _apply
      output = linop(output)
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 122, in __call__
      return self.__mul__(input)
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 131, in __mul__
      return self.apply(input)
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 98, in apply
      raise RuntimeError('Exceptions from {}.'.format(self)) from e
  RuntimeError: Exceptions from <[1, 1, 73, 40, 1, 2, 60, 60]x[1, 2, 640, 372]> ArrayToBlocks Linop>.

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    File "../train_ufloss.py", line 785, in <module>
      main(args)
    File "../train_ufloss.py", line 568, in main
      train_loss, train_l2, train_ufloss, train_time = train_epoch(args, epoch, model, train_loader, optimizer, writer, model_ufloss)
    File "../train_ufloss.py", line 273, in train_epoch
      ) = compute_metrics(args, model, data, model_ufloss)
    File "../train_ufloss.py", line 223, in compute_metrics
      output_patch = Fa2b(output_roll)
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/pytorch.py", line 118, in forward
      return to_pytorch(linop(from_pytorch(
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 122, in __call__
      return self.__mul__(input)
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 131, in __mul__
      return self.apply(input)
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 98, in apply
      raise RuntimeError('Exceptions from {}.'.format(self)) from e
  RuntimeError: Exceptions from <[2920, 2, 60, 60]x[1, 2, 640, 372]> Reshape * ArrayToBlocks Linop>.

The error comes from compute_metrics in train_ufloss.py, lines 204 to 228. I don't understand this code. Could you explain it? Thank you very much.

                arraytoblock = sp.linop.ArrayToBlocks(
                    ishape=list(
                        (
                            output_roll.shape[0],
                            2,
                            output_roll.shape[2],
                            output_roll.shape[3],
                        )
                    ),
                    blk_shape=list((output_roll.shape[0], 2, 60, 60)),
                    blk_strides=list((1, 1, n_featuresq, n_featuresq)),
                )

                reshape = sp.linop.Reshape(
                    ishape=arraytoblock.oshape,
                    oshape=(arraytoblock.oshape[2] * arraytoblock.oshape[3], 2, 60, 60),
                )

                Fa2b = sp.to_pytorch_function(reshape * arraytoblock).apply
                output_patch = Fa2b(output_roll)
                target_patch = Fa2b(target_roll)

                output_features = model_ufloss(output_patch)
                target_features = model_ufloss(target_patch)
                ufloss = nn.MSELoss()(output_features[0], target_features[0])
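
The shapes in the RuntimeError message can be checked with a little arithmetic. The sketch below assumes that the block stride n_featuresq equals 8 (the uflossfreq=8 value in the Namespace printed earlier in the thread). That is an inference, not something the snippet states, but it is consistent with the 73 and 40 in the linop shape. `num_blocks` is a hypothetical helper name.

```python
def num_blocks(size, blk, stride):
    """Number of blocks of length blk, spaced by stride, along an axis of length size."""
    return (size - blk) // stride + 1


height, width = 640, 372  # spatial size from the linop input shape [1, 2, 640, 372]
blk = 60                  # block size from blk_shape in the snippet above
stride = 8                # assumption: n_featuresq == uflossfreq == 8

n_h = num_blocks(height, blk, stride)
n_w = num_blocks(width, blk, stride)
print(n_h, n_w, n_h * n_w)  # 73 40 2920
```

So the snippet cuts each 640 x 372 image into a 73 x 40 grid of overlapping 60 x 60 patches, and the Reshape linop flattens that grid into a batch of 2920 two-channel patches for model_ufloss.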

Aristot1e commented 2 years ago

    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 1330, in _apply
      return block.array_to_blocks(input, self.blk_shape,
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/block.py", line 103, in array_to_blocks
      raise ValueError('Only support ndim=1, 2, or 3, got {}'.format(ndim))
  ValueError: Only support ndim=1, 2, or 3, got 4

In sigpy.block.array_to_blocks, ndim must be at most 3. From the source code: "blk_shape (tuple): block shape of length ndim, with ndim={1, 2, 3}." But the blk_shape you pass has 4 dimensions, which leads to this problem. Which dimension should be removed, or is there another fix? I have tried my best to solve it, but nothing has worked. Could you give me some advice?
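
One possible way to avoid the ndim limit is to slide the window over the two spatial axes only and leave the batch and channel axes alone. This is a sketch under assumptions, not the repository's code. It uses NumPy's sliding_window_view (NumPy 1.20 or later) purely to show the indexing, and `extract_patches` with its default arguments is a hypothetical helper. NumPy operations are not differentiable, so inside training a differentiable equivalent would be needed, for example torch.Tensor.unfold applied to dimensions 2 and 3.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def extract_patches(x, blk=60, stride=8):
    """Cut (batch, channels, H, W) into (batch * n_h * n_w, channels, blk, blk) patches.

    Hypothetical helper; blk=60 follows the snippet above, stride=8 is an assumption.
    """
    # Slide a blk x blk window over the two spatial axes only.
    windows = sliding_window_view(x, (blk, blk), axis=(2, 3))
    # windows has shape (batch, channels, H-blk+1, W-blk+1, blk, blk);
    # keep every stride-th window along both spatial axes.
    windows = windows[:, :, ::stride, ::stride]
    batch, channels, n_h, n_w = windows.shape[:4]
    # Put the patch grid before the channel axis, then merge batch and grid.
    patches = windows.transpose(0, 2, 3, 1, 4, 5)
    return patches.reshape(batch * n_h * n_w, channels, blk, blk)


# Small demonstration with a toy array instead of a 640 x 372 image.
x = np.arange(1 * 2 * 12 * 10).reshape(1, 2, 12, 10)
patches = extract_patches(x, blk=4, stride=2)
print(patches.shape)  # (20, 2, 4, 4)
```

For an input of shape (1, 2, 640, 372) with the defaults, the same function yields 2920 patches of shape (2, 60, 60), which matches the output shape the Reshape linop in the original snippet was aiming for.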

KeWang0622 commented 2 years ago

Hi, I believe it is a sigpy version mismatch! Maybe we can schedule a quick chat to address these issues? I will update the repo accordingly. Apologies for the bugs; the project is at an early stage of development. Thanks for your feedback. What would be the best way to contact you? Ke

Aristot1e commented 2 years ago

Thank you for replying. We can get in touch by email or GitHub. My email is l1i_fan@qq.com. You can email me anytime.