Hi, you don't need to change the for loop. Please see this line; it should load your model, i.e., the encoder of Lite-Mono (`args.model`).
The encoder should output 1000 classes, which are the categories defined in ImageNet. For the pretraining you don't need any depth maps; the pretraining is a 1000-class classification task.
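A minimal sketch of what this means in code, assuming the encoder's `forward_features` returns a pooled `(N, C)` feature vector (as in the modified version further down); `LiteMonoClassifier` and `feat_dim` are illustrative names, not the repo's actual API:

```python
import torch.nn as nn

# Hypothetical wrapper (not the repo's API): attach a 1000-way linear head
# to the Lite-Mono encoder for ImageNet classification pretraining.
class LiteMonoClassifier(nn.Module):
    def __init__(self, encoder, feat_dim, num_classes=1000):
        super().__init__()
        self.encoder = encoder  # expected to return (N, feat_dim) pooled features
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        feats = self.encoder.forward_features(x)  # (N, feat_dim)
        return self.head(feats)                   # (N, 1000) class logits
```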
Dear Author,
Thanks for your reply. I replaced the linked line with

```python
model = LiteMono(model="lite-mono", width=256, height=256,
                 layer_scale_init_value=args.layer_scale_init_value)
model.to(device)
```
Also, I replaced the for loop with:

```python
def forward_features(self, x):
    features = []  # kept from the depth encoder; unused for classification
    x_down = []
    for i in range(4):
        x_down.append(self.input_downsample[i](x))

    tmp_x = []
    x = self.downsample_layers[0](x)
    x = self.stem2(torch.cat((x, x_down[0]), dim=1))
    tmp_x.append(x)
    for s in range(len(self.stages[0]) - 1):
        x = self.stages[0][s](x)
    x = self.stages[0][-1](x)
    tmp_x.append(x)
    features.append(x)

    for i in range(1, 3):
        tmp_x.append(x_down[i])
        x = torch.cat(tmp_x, dim=1)
        x = self.downsample_layers[i](x)
        tmp_x = [x]
        for s in range(len(self.stages[i]) - 1):
            x = self.stages[i][s](x)
        x = self.stages[i][-1](x)
        tmp_x.append(x)

    # global average pooling over H and W, then norm: (N, C, H, W) -> (N, C)
    return self.norm(x.mean([-2, -1]))
```
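A quick sanity check for this modification (a sketch; the 256×256 resolution matches the constructor call above, and the printed channel width `C` depends on the chosen variant):

```python
import torch

# The modified forward_features should return pooled (N, C) features rather
# than the multi-scale feature list used for depth estimation.
model.eval()
with torch.no_grad():
    out = model.forward_features(torch.randn(2, 3, 256, 256))
print(out.shape)  # expected: torch.Size([2, C]), C = last-stage channel width
```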
I used a single GPU and I can train the model with this pretrained checkpoint. My problem is currently training the model with multiple GPUs on ImageNet. I am getting the error below:
```
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2) making sure all `forward` function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
```
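For reference, fix (1) from the message is a one-line change where the model is wrapped for distributed training; `local_rank` is assumed to come from your launcher (e.g. `torchrun`):

```python
from torch.nn.parallel import DistributedDataParallel as DDP

# Ask DDP's reducer to detect parameters that receive no gradient in an
# iteration instead of erroring out; this adds a small per-step overhead.
model = DDP(model, device_ids=[local_rank], find_unused_parameters=True)
```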
Hi, if there are some variables defined in the network class but never used, please delete them. For example, you may have defined `self.head = xxx` but never used it; then just delete that line.
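One way to find such leftovers (a hedged sketch, not from the repo): run a single forward/backward pass without DDP and list every trainable parameter that never received a gradient:

```python
# After loss.backward() on one non-DDP step, any trainable parameter whose
# .grad is still None did not participate in the loss; these are the
# parameters DDP's reducer will complain about.
for name, p in model.named_parameters():
    if p.requires_grad and p.grad is None:
        print("unused parameter:", name)
```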
Thanks for your reply. I have solved my issue by replacing the loss term in engine.py with `loss = loss + 0 * sum(p.sum() for p in model.parameters())`, which touches all the parameters. It causes some latency, but I think it is not that significant. Thanks for your time and advice.
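The zero-weighted term works because it puts every parameter into the autograd graph, so DDP's reducer sees a (numerically zero) gradient for all of them. A rough sketch of the change in context; `criterion`, `outputs`, and `targets` are placeholder names, not necessarily those in engine.py:

```python
loss = criterion(outputs, targets)
# Zero-weighted sum over all parameters: contributes nothing to the loss
# value, but routes a zero gradient to every parameter so DDP is satisfied.
loss = loss + 0.0 * sum(p.sum() for p in model.parameters())
loss.backward()
```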
Good that you solved this!
Dear Author,
I hope you are having a good day. I would like to ask a question related to pre-training: the for loop should be replaced by the Lite-Mono encoder, and the 192×640 depth map should be given to the head, right? Thanks in advance.