zxcqlf / MonoViT

Self-supervised monocular depth estimation with a vision transformer
MIT License
157 stars 18 forks

Unable to reproduce MPViT-base correctly. #14

Open CeciliaYao opened 1 year ago

CeciliaYao commented 1 year ago

Dear author,

Thank you for your fantastic contribution! However, I had some problems reproducing the results of MPViT-base. I'd really appreciate it if you could help me check what the problem is :-) @zxcqlf

I evaluated my MPViT-base model on KITTI and got the following results:

[screenshot: KITTI evaluation metrics, 2023-04-04]

I think it may be because I set num_ch_enc or ch_enc incorrectly in depth_decoder; could you help me confirm what the correct values should be? (A small sketch for checking the encoder's actual channel widths follows the three steps below.)

  1. Firstly, I modified the DepthDecoder class in hr_decoder.py by changing self.num_ch_dec to np.array([64, 64, 128, 256, 512]) as shown below.
class DepthDecoder(nn.Module):
    def __init__(self, ch_enc = [64,128,216,288,288], scales=range(4),num_ch_enc = [ 64, 64, 128, 256, 512 ], num_output_channels=1):
        super(DepthDecoder, self).__init__()
        self.num_output_channels = num_output_channels
        self.num_ch_enc = num_ch_enc
        self.ch_enc = ch_enc
        self.scales = scales
        # self.num_ch_dec = np.array([16, 32, 64, 128, 256])  # mpvit_small
        self.num_ch_dec = np.array([64, 64, 128, 256, 512])  # mpvit_base
        ... ...
  2. Secondly, in trainer.py, I reassigned the ch_enc and num_ch_enc arguments passed to DepthDecoder. It looks like this:

    class Trainer:
        def __init__(self, options, ngpus_per_node=None):
            ... ...
            self.models["encoder"] = networks.mpvit_base()
            self.models["encoder"].to(self.device)
            # self.parameters_to_train += list(self.models["encoder"].parameters())

            self.models["depth"] = networks.DepthDecoder(ch_enc=[128, 224, 368, 480, 480],
                                                          num_ch_enc=[128, 128, 256, 512, 1024])
            self.models["depth"].to(self.device)
            self.parameters_to_train += list(self.models["depth"].parameters())
            ... ...
  3. Finally, in evaluate_depth.py, I changed the parameters of the encoder and decoder:

    def evaluate(opt, ngpus_per_node=None):
        ... ...
        encoder = networks.mpvit_base().to(device)  # networks.ResnetEncoder(opt.num_layers, False)
        encoder.num_ch_enc = [128, 224, 368, 480, 480]  # = networks.ResnetEncoder(opt.num_layers, False)

        depth_decoder = networks.DepthDecoder(ch_enc=[128, 224, 368, 480, 480],
                                              num_ch_enc=[128, 128, 256, 512, 1024]).to(device)
        ... ...
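Since the whole question is which channel widths are correct, it may be easiest to read them off the encoder itself. Below is a minimal debugging sketch (mine, not from the repo), assuming networks.mpvit_base() follows the same interface as MonoViT's other encoders and returns a list of per-stage feature maps for a single image tensor:

    import torch
    import networks  # MonoViT's networks package

    encoder = networks.mpvit_base()
    encoder.eval()

    # Run one dummy image through the encoder and inspect each stage.
    with torch.no_grad():
        dummy = torch.randn(1, 3, 192, 640)  # KITTI training resolution
        features = encoder(dummy)

    for i, f in enumerate(features):
        print(f"stage {i}: channels={f.shape[1]}, spatial={tuple(f.shape[2:])}")

Whatever this prints per stage is what ch_enc has to match in DepthDecoder.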

As a supplement, my training loss looks like this:

[screenshot: training loss curves, 2023-04-04]

Thank you for your time and assistance!

CeciliaYao commented 1 year ago

Or is it because the checkpoint mpvit_base.pth is not currently provided? Would you release this checkpoint to us for learning and training? Thank you so much!

reolILoveYou commented 1 year ago

Hello, classmate. Could you please send me a copy of the complete code? I have been studying this paper recently, and my email is 1131806140@qq.com

reolILoveYou commented 1 year ago

thank you !!!!!!

zxcqlf commented 1 year ago

So did you test the suggested network (mpvit_small)? For the mpvit_base part, you only need to define an attention layer like:

    self.convs["f0"] = Attention_Module(self.ch_enc[0], num_ch_enc[0])

and process feature 0 as:

    feat[0] = self.convs["f0"](input_features[0])

There is no need to change the num_ch_enc parameter.
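Roughly, a sketch of where these two lines could sit in hr_decoder.py (my reading of the suggestion, not the released code; the import path for Attention_Module and the self.convs container are assumed from the rest of the repo, and everything elided stays as in the mpvit_small decoder):

    import torch.nn as nn
    from hr_layers import Attention_Module  # repo module; import path assumed

    class DepthDecoder(nn.Module):
        def __init__(self, ch_enc=[128, 224, 368, 480, 480], scales=range(4),
                     num_ch_enc=[64, 64, 128, 256, 512], num_output_channels=1):
            super(DepthDecoder, self).__init__()
            self.ch_enc = ch_enc          # channel widths produced by mpvit_base
            self.num_ch_enc = num_ch_enc  # left unchanged, per the advice above
            self.convs = nn.ModuleDict()  # stands in for the decoder's existing conv dict
            # The one addition for mpvit_base: an attention layer mapping the
            # encoder's stage-0 width onto the width the decoder already expects.
            self.convs["f0"] = Attention_Module(self.ch_enc[0], num_ch_enc[0])
            # ... rest of __init__ unchanged ...

        def forward(self, input_features):
            feat = [None] * len(input_features)
            # Project the first encoder feature; the later stages stay as released.
            feat[0] = self.convs["f0"](input_features[0])
            # ... remaining decoder stages unchanged ...
            return feat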

wasup07 commented 5 months ago

Could you send me the code as well? I'm having problems adjusting the train.py and trainer.py files. My email is: amin.ayechi.2001@gmail.com