wurenkai / H-vmunet

[arXiv] The official code for "H-vmunet: High-order Vision Mamba UNet for Medical Image Segmentation".

Comparison Experiment #5

Ystartff opened this issue 4 months ago

Ystartff commented 4 months ago

I ran into a few problems when running some of your comparison experiments. Take running META_Unet as an example. May I ask what the problem is, and could you help?

```
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [90,0,0], thread: [123,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [90,0,0], thread: [124,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [90,0,0], thread: [125,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [90,0,0], thread: [126,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [90,0,0], thread: [127,0,0] Assertion `input_val >= zero && input_val <= one` failed.
Traceback (most recent call last):
  File "train.py", line 224, in <module>
    main(config)
  File "train.py", line 167, in main
    train_one_epoch(
  File "/mnt/data/linda/yyf/H-vmunet-main/engine.py", line 41, in train_one_epoch
    loss.backward()
  File "/home/linda/anaconda3/envs/vmunet/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/linda/anaconda3/envs/vmunet/lib/python3.8/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
```

You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.

```python
import torch
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([8, 32, 256, 256], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(32, 1, kernel_size=[3, 3], padding=[1, 1], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()
```

```
ConvolutionParams
    memory_format = Contiguous
    data_type = CUDNN_DATA_FLOAT
    padding = [1, 1, 0]
    stride = [1, 1, 0]
    dilation = [1, 1, 0]
    groups = 1
    deterministic = true
    allow_tf32 = true
input: TensorDescriptor 0x89e818a0
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 8, 32, 256, 256,
    strideA = 2097152, 65536, 256, 1,
output: TensorDescriptor 0x89efcbc0
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 8, 1, 256, 256,
    strideA = 65536, 65536, 256, 1,
weight: FilterDescriptor 0x7fa474042520
    type = CUDNN_DATA_FLOAT
    tensor_format = CUDNN_TENSOR_NCHW
    nbDims = 4
    dimA = 1, 32, 3, 3,
output: 0x7fa517d00000
weight: 0x7fa5809fa400
```

wurenkai commented 4 months ago

Hi, according to your error message, if this is binary segmentation, you should check the output of META_UNet to see whether a Sigmoid activation is applied.
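For reference, a minimal sketch of what that fix might look like, assuming a hypothetical `BinarySegHead` wrapper around the model and that the training loop uses `nn.BCELoss` (which is what triggers the `input_val >= zero && input_val <= one` assertion when it receives raw logits):

```python
import torch
import torch.nn as nn

class BinarySegHead(nn.Module):
    """Hypothetical wrapper: squashes raw logits into [0, 1] so BCELoss is valid."""
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone

    def forward(self, x):
        logits = self.backbone(x)      # raw, unbounded outputs
        return torch.sigmoid(logits)   # probabilities in [0, 1]

# Quick check with a dummy backbone in place of META_UNet:
net = BinarySegHead(nn.Conv2d(3, 1, kernel_size=3, padding=1))
probs = net(torch.randn(1, 3, 64, 64))
assert probs.min() >= 0 and probs.max() <= 1

# Alternative: keep the raw logits and switch the criterion to
# nn.BCEWithLogitsLoss(), which applies the sigmoid internally and is
# numerically more stable than sigmoid + nn.BCELoss().
```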

Ystartff commented 4 months ago

Your help was very effective for me, thank you.

Ystartff commented 4 months ago

Hello, I'm here again. I noticed that your experiment is configured with

print_interval = 20
val_interval = 30
save_interval = 100

I see that you save the best weights as the checkpoint with the smallest loss, but I found that there are actually better checkpoints during training, so your experiments will not report the highest scores because of this. Did you modify these parameters to save the weights one by one and then pick the maximum value?

wurenkai commented 4 months ago

Hi, we did not modify the above parameters. Our experiments were tested with the checkpoint that achieved the lowest loss when running validation during training at each epoch.
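As a rough illustration of that selection rule (not the repository's exact code; the model, data, and loss below are toy stand-ins), the checkpointing logic amounts to keeping only the weights with the lowest validation loss seen so far:

```python
import torch
import torch.nn as nn

# Toy stand-ins so the selection logic is runnable; the real project uses its
# own model, data loaders, and engine functions.
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
data = [(torch.randn(2, 3, 64, 64), torch.randint(0, 2, (2, 1, 64, 64)).float())
        for _ in range(4)]

best_val_loss = float('inf')
for epoch in range(3):
    # --- training ---
    model.train()
    for x, y in data:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

    # --- validation ---
    model.eval()
    with torch.no_grad():
        val_loss = sum(criterion(model(x), y).item() for x, y in data) / len(data)

    # Keep only the checkpoint with the lowest validation loss seen so far;
    # that weight file is the one later used at test time.
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(model.state_dict(), 'best.pth')
```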

Ystartff commented 4 months ago

Hi author, your work is excellent. One question I have is: why do you use a standard convolution with a 3x3 kernel as a layer at the very beginning of the encoder and before predicting the final outputs?

wurenkai commented 4 months ago

Hi, in H-vmunet this is done to increase the initial number of channels, so that there are enough channels for high-order interactions when H-SS2D is used later.
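As a rough sketch of that design point (the channel counts and layer choices here are illustrative, not the repository's exact values), the stem simply lifts the 3-channel input to a wider feature map before any high-order SS2D block sees it:

```python
import torch
import torch.nn as nn

# Illustrative stem: a plain 3x3 convolution that expands the channel count so
# the later high-order interaction blocks have enough channels to split across
# their orders. The exact width (3 -> 32) is just an example.
stem = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
)

x = torch.randn(1, 3, 256, 256)   # dummy RGB input
print(stem(x).shape)              # torch.Size([1, 32, 256, 256])
```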

Ystartff commented 4 months ago

Then, in the last two layers of the final decoder output