smhassanerfani / atlantis

A Benchmark for Semantic Segmentation of Waterbody Images

Problem with pre-trained model of AQUAnet #12

Closed PraviMyl closed 1 year ago

PraviMyl commented 2 years ago

I want to test the AQUAnet model on my own working area. I got an error at this line: RESTORE_FROM = '../../atlantis_file/snapshots/'+NAME+'/epoch29.pth' Could you please share your pre-trained AQUAnet model (trained on your ATLANTIS dataset)? My email is pravina.m@eng.pdn.ac.lk

smhassanerfani commented 2 years ago

Hi Pravina, give me some time. I'll upload all the .pth files to Google Drive and make them publicly available for download.

Thank you,
Mohammad

PraviMyl commented 2 years ago

Thank you for your concern.

smhassanerfani commented 2 years ago

Pravina, here is the link to all the models trained in this study: https://drive.google.com/drive/folders/1UqkXVoeCK6bEZ51YmXr1zdFjH5kO9foP?usp=sharing Wish you luck!

PraviMyl commented 2 years ago

I'm grateful to you, Mohammad Erfani!

PraviMyl commented 2 years ago

When I tested your AQUAnet model with the trained weights, I got this error:

    RuntimeError: Error(s) in loading state_dict for Aquanet:
        Missing key(s) in state_dict: "backbone.conv1.weight", "backbone.bn1.weight",..
        size mismatch for context1.2.context.0.weight: copying a param with shape torch.Size([128, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).

I think the model code on your GitHub does not match the trained weights you uploaded to Google Drive. If my guess is correct, could you please send me the AQUAnet model that fits the trained weights (epoch29.pth)?

smhassanerfani commented 2 years ago

The following snippet shows how to load the model:

import torch

from aquanet import Aquanet

model = Aquanet(num_classes=56)
model_dict = model.state_dict()

# map_location="cpu" lets the checkpoint load on machines without a GPU.
saved_state_dict = torch.load("epoch29.pth", map_location="cpu")

# Backbone weights are looked up in the checkpoint without the "backbone." prefix.
for key, value in model_dict.items():
    key_parts_lst = key.split(".")
    if key_parts_lst[0] == "backbone":
        ckpt_key = ".".join(key_parts_lst[1:])
        model_dict[key] = saved_state_dict[ckpt_key]
        # print(f"{key}: {value.shape}, {saved_state_dict[ckpt_key].shape}")
    else:
        model_dict[key] = saved_state_dict[key]
        # print(f"{key}: {value.shape}, {saved_state_dict[key].shape}")

# Editing model_dict alone does not change the model; copy the weights in.
model.load_state_dict(model_dict)

If you uncomment the print statements, they will show the shape of each weight tensor in the model and in the saved state dictionary. The two shapes have to be the same. I put some sample output below.

Good luck!

backbone.conv1.weight: torch.Size([64, 3, 3, 3]), torch.Size([64, 3, 3, 3])
backbone.bn1.weight: torch.Size([64]), torch.Size([64])
backbone.bn1.bias: torch.Size([64]), torch.Size([64])
backbone.bn1.running_mean: torch.Size([64]), torch.Size([64])
backbone.bn1.running_var: torch.Size([64]), torch.Size([64])
backbone.bn1.num_batches_tracked: torch.Size([]), torch.Size([])
backbone.conv2.weight: torch.Size([64, 64, 3, 3]), torch.Size([64, 64, 3, 3])
backbone.bn2.weight: torch.Size([64]), torch.Size([64])
backbone.bn2.bias: torch.Size([64]), torch.Size([64])
backbone.bn2.running_mean: torch.Size([64]), torch.Size([64])
backbone.bn2.running_var: torch.Size([64]), torch.Size([64])
backbone.bn2.num_batches_tracked: torch.Size([]), torch.Size([])
backbone.conv3.weight: torch.Size([128, 64, 3, 3]), torch.Size([128, 64, 3, 3])
.
.
.
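
The shape comparison described above can also be collected into a report rather than read off line by line. The following is only a rough sketch, not code from the repository: `compare_shapes` is a hypothetical helper, the "backbone." prefix handling mirrors the snippet in this comment, and `head.weight` is an invented key used purely for illustration. It works on plain name-to-shape mappings, so it runs without the checkpoint.

```python
def compare_shapes(model_shapes, ckpt_shapes, prefix="backbone."):
    """Compare two name -> shape mappings.

    Model keys that start with `prefix` are looked up in the checkpoint
    with the prefix removed. Returns (missing, mismatched).
    """
    missing, mismatched = [], []
    for key, shape in model_shapes.items():
        ckpt_key = key[len(prefix):] if key.startswith(prefix) else key
        if ckpt_key not in ckpt_shapes:
            missing.append(key)
        elif tuple(ckpt_shapes[ckpt_key]) != tuple(shape):
            mismatched.append((key, tuple(shape), tuple(ckpt_shapes[ckpt_key])))
    return missing, mismatched


# Toy shapes: the first two keys appear in this thread, "head.weight" is hypothetical.
model_shapes = {
    "backbone.conv1.weight": (64, 3, 3, 3),
    "context1.2.context.0.weight": (256, 256, 3, 3),
    "head.weight": (56, 256, 1, 1),
}
ckpt_shapes = {
    "conv1.weight": (64, 3, 3, 3),
    "context1.2.context.0.weight": (128, 256, 3, 3),
}

missing, mismatched = compare_shapes(model_shapes, ckpt_shapes)
print("missing:", missing)
print("mismatched:", mismatched)
```

With PyTorch, the two mappings could be built as `{k: tuple(v.shape) for k, v in model.state_dict().items()}` and the same expression over the dictionary returned by `torch.load`. An empty report would suggest that the model definition and the checkpoint agree.
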
PraviMyl commented 2 years ago

Now the code runs, but the results are wrong. When I save the outputs, they contain no predictions (the predicted image is entirely black). What could be the reason for that? I have attached a screenshot of the outputs for your reference.

[Screenshot 2022-09-28 at 11 08 57]

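One possible reason for an all-black result, offered as a guess and not as a diagnosis of this case: if predicted class indices (0 to 55 for 56 classes) are written directly as 8-bit gray levels, every pixel looks very dark even when the prediction is reasonable. Below is a minimal sketch for checking which class IDs a prediction contains and for mapping them to visible colors. The helper names and the toy palette are hypothetical and not part of the repository.

```python
from collections import Counter


def class_histogram(mask):
    """Count how many pixels carry each class ID in a 2-D mask (list of rows)."""
    return Counter(value for row in mask for value in row)


def colorize(mask, palette):
    """Replace each class ID with its (R, G, B) color from `palette`."""
    return [[palette[value] for value in row] for row in mask]


# Toy 2x3 prediction. With PyTorch it could come from something like
# logits.argmax(dim=1)[0].tolist(), assuming logits of shape (N, C, H, W).
mask = [[0, 0, 3],
        [3, 55, 0]]
palette = {0: (0, 0, 0), 3: (0, 0, 255), 55: (255, 255, 0)}  # hypothetical colors

counts = class_histogram(mask)
rgb = colorize(mask, palette)
print(dict(counts))
print(rgb[1][1])
```

If the histogram shows only a single class ID, that would point toward the weights or the input preprocessing. Several IDs would point toward how the output image is saved.
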
smhassanerfani commented 2 years ago

Hi, so sorry for the late response. Let me check what the problem is. The first thing that comes to mind is that the trained weights are wrong for the model. I'll let you know as soon as I find the problem.

PraviMyl commented 2 years ago

Hi, thanks for your support. I tried several times with changes to test.py, but I still got blank output as before. Here is the summary of the model:

[Screenshot 2022-10-04 at 06 32 21]

smhassanerfani commented 2 years ago

Hi, sorry for such a late response. I retrained AQUANet on ATLANTIS and updated the .pth file on Google Drive. The link is: https://drive.google.com/file/d/1fSg3UOh2qBj6AfLKtJFVj2PhbVNsKjfg/view?usp=share_link I also modified train.py and made it compatible with Windows.

Good Luck!

PraviMyl commented 2 years ago

Hello Mohammad Erfani,

I am really grateful to you, and thank you for your support. I hope this will work.

Thanks again

gppcaputo commented 1 year ago

I'm trying to run the test using the DeepLabV3 weights. I followed the snippet you provided in this issue, but I got the same problems. Can you please help me? Maybe the DeepLabV3 .pth file is also incorrect?