noahzn / Lite-Mono

[CVPR2023] Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation
MIT License

About the pre-training #112

Closed MrBelly closed 9 months ago

MrBelly commented 9 months ago

Dear Author,

I hope you are having a good day. I would like to ask a question about pre-training. The for loop should be replaced by the Lite-Mono encoder, and the 192×640 depth map should be given to the head, right? Thanks in advance.

noahzn commented 9 months ago

Hi, you don't need to change the for loop. Please see this line; it should load your model, i.e., the encoder of Lite-Mono (args.model).

The encoder should output 1000 classes, which are the categories defined in ImageNet. For the pretraining you don't need any depth maps; the pretraining is a 1000-class classification task.
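For illustration, here is a minimal sketch of such a pretraining setup (the wrapper class, the feat_dim argument, and the forward_features interface are assumptions for this sketch, not the actual training code):

import torch.nn as nn

class LiteMonoClassifier(nn.Module):
    # Hypothetical wrapper: Lite-Mono encoder + 1000-way linear head.
    def __init__(self, encoder, feat_dim, num_classes=1000):
        super().__init__()
        self.encoder = encoder                        # the Lite-Mono encoder
        self.head = nn.Linear(feat_dim, num_classes)  # ImageNet categories

    def forward(self, x):
        feats = self.encoder.forward_features(x)  # pooled features, shape (B, feat_dim)
        return self.head(feats)                   # class logits, shape (B, 1000)

The head is trained with a standard cross-entropy loss on ImageNet labels; no depth maps are involved anywhere in the pretraining.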

MrBelly commented 9 months ago

Dear Author,

Thanks for your reply. I replaced the linked line with:

model = LiteMono(model="lite-mono", width=256, height=256,
                 layer_scale_init_value=args.layer_scale_init_value)
model.to(device)

Also, I replaced the for loop with:

def forward_features(self, x):
    features = []

    x = (x - 0.45) / 0.225

    x_down = []
    for i in range(4):
        x_down.append(self.input_downsample[i](x))

    tmp_x = []
    x = self.downsample_layers[0](x)
    x = self.stem2(torch.cat((x, x_down[0]), dim=1))
    tmp_x.append(x)

    for s in range(len(self.stages[0])-1):
        x = self.stages[0][s](x)
    x = self.stages[0][-1](x)
    tmp_x.append(x)
    features.append(x)

    for i in range(1, 3):
        tmp_x.append(x_down[i])
        x = torch.cat(tmp_x, dim=1)
        x = self.downsample_layers[i](x)

        tmp_x = [x]
        for s in range(len(self.stages[i]) - 1):
            x = self.stages[i][s](x)
        x = self.stages[i][-1](x)
        tmp_x.append(x)

    return self.norm(x.mean([-2, -1]))

I used a single GPU and I can train the model with this pre-trained checkpoint. My problem now is training the model on ImageNet with multiple GPUs. I am getting the error below:

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel; (2) making sure all forward function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable).

noahzn commented 9 months ago

Hi, if there are some variables defined in the network class but never used, please delete them. For example, if you defined self.head = xxx but never used it, just delete that line.
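Alternatively, as the error message itself suggests, you can tell DDP to tolerate parameters that receive no gradient, at the cost of some extra overhead per iteration. A minimal sketch (model and local_rank are placeholders):

import torch

model = torch.nn.parallel.DistributedDataParallel(
    model,                        # the network being pretrained
    device_ids=[local_rank],      # placeholder: this process's GPU index
    find_unused_parameters=True,  # scan the graph for unused parameters
)

Deleting the genuinely unused modules is the cleaner fix, since it avoids that per-iteration scan.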

MrBelly commented 9 months ago

Thanks for your reply. I have solved my issue by replacing the loss term in engine.py with loss = loss + 0 * sum(p.sum() for p in model.parameters()), which uses all the parameters. It adds some latency, but I think it is not significant. Thanks for your time and advice.
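For reference, a minimal sketch of where such a term can sit in a typical training step (the surrounding structure and variable names are assumed here, not copied from engine.py):

outputs = model(samples)
loss = criterion(outputs, targets)
# Touch every parameter: the zero multiplier leaves all gradients
# unchanged, but DDP's reducer now sees each parameter as used.
loss = loss + 0 * sum(p.sum() for p in model.parameters())

optimizer.zero_grad()
loss.backward()
optimizer.step()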

noahzn commented 9 months ago

Good that you solved this!