Open Aristot1e opened 2 years ago
Please set args.loss_normalized = True and try again, and I will solve this issue. Thanks!
Namespace(accelerations=[10, 15], batch_size=1, checkpoint=None, circular_pad=True, data_parallel=False, data_path='/home/img/Desktop/lff/Dataset/pre-processed/multicoil', device='cuda', device_num='0', drop_prob=0.0, efficient_ufloss=False, exp_dir='/home/img/Desktop/lff/Dataset/summary/train-3D_MELD_4steps_MoDLflag0_shared_CGsteps_6date_20210929_ufloss0_ufloss_weight_10_dimension_256_debug', fix_step_size=True, ge_mask=None, kernel_size=3, loss_normalized='True', loss_type=2, loss_uflossdir='/data/train_ufloss/train_UFLoss_feature_256_features_date_202104283_temperature_1_lr1e-5/checkpoints/ckpt200.pth', lr=0.0002, lr_gamma=0.5, lr_step_size=20, meld_cp=False, meld_flag=False, modl_flag=True, modl_lamda=0.05, num_cg_steps=6, num_emaps=1, num_epochs=2000, num_features=256, num_grad_steps=4, num_resblocks=2, patch_size=64, report_interval=10, resume=False, sample_rate=1.0, seed=42, share_weights=True, slwin_init=True, ufloss3d=False, ufloss_weight=10.0, uflossfreq=8, weight_decay=0.0)
Using parameters: Temperature: 1.0 2
Traceback (most recent call last):
File "../train_ufloss.py", line 803, in <module>
main(args)
File "../train_ufloss.py", line 562, in main
model_re.load_state_dict(
File "/home/img/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Model:
size mismatch for memory_bank: copying a param with shape torch.Size([256, 1]) from checkpoint, the shape in current model is torch.Size([256, 2457600]).
loss_normalized is already set to True, but that doesn't help.
I see. How did you train the UFLoss? The error is not about the normalization; it's about the checkpoint loading. How many patches did you use to train the UFLoss feature mapping network?
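A checkpoint-loading failure like this one can be narrowed down by comparing parameter shapes before calling load_state_dict. The snippet below is a minimal sketch and not code from the repository: shape_mismatches is a hypothetical helper, and it works on plain name-to-shape dictionaries so that it runs without PyTorch installed. The shapes are the ones reported in the error message.

```python
def shape_mismatches(ckpt_shapes, model_shapes):
    """Return {name: (ckpt_shape, model_shape)} for parameters whose shapes differ."""
    mismatches = {}
    for name, ckpt_shape in ckpt_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes copied from the RuntimeError in this thread.
# With PyTorch objects, such dictionaries could be built as
#   {k: tuple(v.shape) for k, v in state_dict.items()}
ckpt_shapes = {"memory_bank": (256, 1)}
model_shapes = {"memory_bank": (256, 2457600)}

for name, (saved, expected) in shape_mismatches(ckpt_shapes, model_shapes).items():
    print(f"{name}: checkpoint {saved} vs. current model {expected}")
```

Listing the mismatches this way shows which buffer differs, so the model can be constructed with the size the checkpoint was saved with.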
I trained the UFLoss using launch_training_patch_learning.sh, and the total patch_extraction number should be 15568.
The total number of patches I used is 311360. The multicoil knee dataset I downloaded has 973 .h5 files, which become 15568 items after data_preprocessing.py and then 311360 patches after patch_extraction.py. But the error says the shape in the current model is torch.Size([256, 2457600]), and I don't know why it is so large. Another question: when training the UFLoss, the loss is still large (11.3+) after 200 epochs. How can I make it smaller?
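The counts quoted above can be checked with a few lines of arithmetic. This is only a sketch. It assumes, as the maintainer's question suggests but the thread does not confirm, that the second dimension of memory_bank is tied to the number of training patches.

```python
files = 973          # .h5 files in the downloaded multicoil knee dataset
slices = 15568       # items after data_preprocessing.py
patches = 311360     # patches after patch_extraction.py

ckpt_bank = 1            # second dimension of memory_bank in the checkpoint
model_bank = 2457600     # second dimension of memory_bank in the current model

print(slices // files, slices % files)      # 16 0  -> 16 items per file
print(patches // slices, patches % slices)  # 20 0  -> 20 patches per item

# Assumption: the memory bank should hold one column per training patch.
print(ckpt_bank == patches, model_bank == patches)  # False False
```

Neither size matches the 311360 patches, so under that assumption the value 2457600 comes from somewhere other than this dataset, for example a default setting.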
I've run into a new problem. After the message "Successfully loaded UFLoss model (Traditional)", the following error appeared.
Traceback (most recent call last):
File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 95, in apply
output = self._apply(input)
File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 1330, in _apply
return block.array_to_blocks(input, self.blk_shape,
File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/block.py", line 103, in array_to_blocks
raise ValueError('Only support ndim=1, 2, or 3, got {}'.format(ndim))
ValueError: Only support ndim=1, 2, or 3, got 4
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 95, in apply
output = self._apply(input)
File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 362, in _apply
output = linop(output)
File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 122, in __call__
return self.__mul__(input)
File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 131, in __mul__
return self.apply(input)
File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 98, in apply
raise RuntimeError('Exceptions from {}.'.format(self)) from e
RuntimeError: Exceptions from <[1, 1, 73, 40, 1, 2, 60, 60]x[1, 2, 640, 372]> ArrayToBlocks Linop>.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "../train_ufloss.py", line 785, in <module>
main(args)
File "../train_ufloss.py", line 568, in main
train_loss, train_l2, train_ufloss, train_time = train_epoch(args, epoch, model, train_loader, optimizer, writer, model_ufloss)
File "../train_ufloss.py", line 273, in train_epoch
) = compute_metrics(args, model, data, model_ufloss)
File "../train_ufloss.py", line 223, in compute_metrics
output_patch = Fa2b(output_roll)
File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/pytorch.py", line 118, in forward
return to_pytorch(linop(from_pytorch(
File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 122, in __call__
return self.__mul__(input)
File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 131, in __mul__
return self.apply(input)
File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 98, in apply
raise RuntimeError('Exceptions from {}.'.format(self)) from e
RuntimeError: Exceptions from <[2920, 2, 60, 60]x[1, 2, 640, 372]> Reshape * ArrayToBlocks Linop>.
The error comes from compute_metrics in train_ufloss.py, lines 204 to 228. I don't understand this code. Could you explain it? Thank you so much.
arraytoblock = sp.linop.ArrayToBlocks(
    ishape=list(
        (
            output_roll.shape[0],
            2,
            output_roll.shape[2],
            output_roll.shape[3],
        )
    ),
    blk_shape=list((output_roll.shape[0], 2, 60, 60)),
    blk_strides=list((1, 1, n_featuresq, n_featuresq)),
)
reshape = sp.linop.Reshape(
    ishape=arraytoblock.oshape,
    oshape=(arraytoblock.oshape[2] * arraytoblock.oshape[3], 2, 60, 60),
)
Fa2b = sp.to_pytorch_function(reshape * arraytoblock).apply
output_patch = Fa2b(output_roll)
target_patch = Fa2b(target_roll)
output_features = model_ufloss(output_patch)
target_features = model_ufloss(target_patch)
ufloss = nn.MSELoss()(output_features[0], target_features[0])
File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 1330, in _apply
return block.array_to_blocks(input, self.blk_shape,
File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/block.py", line 103, in array_to_blocks
raise ValueError('Only support ndim=1, 2, or 3, got {}'.format(ndim))
ValueError: Only support ndim=1, 2, or 3, got 4
In sigpy.block.array_to_blocks, ndim must be at most 3. The source code says: "blk_shape (tuple): block shape of length ndim, with ndim={1, 2, 3}." But the blk_shape you pass has length 4, which leads to this problem. Which dimension should be removed, or is there another fix? I have tried my best to deal with it, but it doesn't work. Could you give me some advice?
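The shapes in the traceback can be reproduced with ordinary sliding-window arithmetic. This helps in reading what the ArrayToBlocks and Reshape steps are meant to produce. The snippet is a sketch and makes one assumption: n_featuresq is not defined in the excerpt, so a stride of 8 is used here because it reproduces the reported shapes.

```python
def num_blocks(size, blk, stride):
    """Number of sliding-window positions along one axis."""
    return (size - blk) // stride + 1


height, width = 640, 372   # spatial size from the traceback: [1, 2, 640, 372]
blk = 60                   # patch size used in blk_shape
stride = 8                 # assumption: stands in for n_featuresq

rows = num_blocks(height, blk, stride)
cols = num_blocks(width, blk, stride)
print(rows, cols, rows * cols)   # 73 40 2920

# blk_shape as written in train_ufloss.py has four entries, while the installed
# array_to_blocks accepts a blk_shape of length 1, 2, or 3 only.
blk_shape = [1, 2, 60, 60]
print(len(blk_shape))            # 4
```

The 73 x 40 grid matches the ArrayToBlocks output shape [1, 1, 73, 40, 1, 2, 60, 60], and the product 2920 matches the first dimension of the Reshape output [2920, 2, 60, 60].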
Hi, I believe it is a sigpy version mismatch! Maybe we can schedule a quick chat to address these issues? I will update the repo accordingly. Apologies for the bugs; the project is at an early development stage, and thanks for your feedback. What would be the best way to contact you? Ke
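To check for a version mismatch, it helps to report the installed package versions alongside the traceback. The snippet below is a small sketch that uses only the standard library (importlib.metadata, available from Python 3.8). installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("sigpy", "torch"):
    print(name, installed_version(name))
```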
Thank you for replying. We can get in touch by email or GitHub. My email is l1i_fan@qq.com. You can email me anytime.
Traceback (most recent call last):
File "../train_ufloss.py", line 803, in <module>
main(args)
File "../train_ufloss.py", line 562, in main
model_re.load_state_dict(
File "/home/img/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Model:
size mismatch for memory_bank: copying a param with shape torch.Size([256, 1]) from checkpoint, the shape in current model is torch.Size([256, 2457600]).
This is the error I get when running launch_training_MoDL_traditional_UFLoss_256_demo.sh. The memory_bank shape in the checkpoint does not match the shape in the model. Why is that? I can't work out how to fix it. The other problem is in train_ufloss.py, lines 193/194: if args.loss_normalized == False: output = output * std + mean; target = target * std + mean. Neither std nor mean is defined. What should I do?
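The expression output * std + mean is the usual inverse of a normalization (x - mean) / std. The sketch below illustrates that round trip on plain Python lists. It assumes that mean and std are the statistics used to normalize the data earlier in the pipeline. Where they should come from in train_ufloss.py is not shown in this thread, so the names here are only illustrative.

```python
import statistics


def normalize(values, mean, std):
    """Shift and scale values to zero mean and unit standard deviation."""
    return [(v - mean) / std for v in values]


def denormalize(values, mean, std):
    """Undo normalize(): corresponds to `output * std + mean`."""
    return [v * std + mean for v in values]


target = [0.5, 1.5, 2.0, 4.0]
# Illustrative statistics; in training they would be saved when normalizing.
mean = statistics.fmean(target)
std = statistics.pstdev(target)

restored = denormalize(normalize(target, mean, std), mean, std)
print(restored)
```

For the de-normalization branch to run, both values must be computed or loaded before that line is reached.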