rwightman / efficientdet-pytorch

A PyTorch impl of EfficientDet faithful to the original Google impl w/ ported weights
Apache License 2.0

Another forward() takes 3 positional arguments question #106

klock18 closed this issue 4 years ago

klock18 commented 4 years ago

Hi!

I am having the same issue as: https://github.com/rwightman/efficientdet-pytorch/issues/42

I tried your fix, but I don't think I applied it correctly:

        boxes = [target['boxes'].to(self.device).float() for target in targets]
        labels = [target['labels'].to(self.device).float() for target in targets]

        combined =  {"bbox": boxes, "cls": labels}
        loss, _, _ = self.model(images, combined)

Because I am getting this error:

    Traceback (most recent call last):
      File "train_baseline.py", line 473, in <module>
        run_training()
      File "train_baseline.py", line 456, in run_training
        fitter.fit(train_loader, val_loader)
      File "train_baseline.py", line 294, in fit
        summary_loss = self.train_one_epoch(train_loader)
      File "train_baseline.py", line 363, in train_one_epoch
        with amp.scale_loss(loss, self.optimizer) as scaled_loss:
      File "/home/loc103/anaconda3/envs/stac/lib/python3.7/contextlib.py", line 112, in __enter__
        return next(self.gen)
      File "/home/loc103/anaconda3/envs/stac/lib/python3.7/site-packages/apex/amp/handle.py", line 113, in scale_loss
        yield (loss.float())*loss_scale
    AttributeError: 'str' object has no attribute 'float'
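From the last frame it looks like apex is being handed a plain string instead of a loss tensor, which I suspect comes from tuple-unpacking the model output; a quick sketch of what I think is happening (the key names are my assumption about what the bench returns):

    import torch

    # What I assume the training bench hands back: a dict of named losses,
    # not a tuple.
    outputs = {'loss': torch.tensor(1.0),
               'class_loss': torch.tensor(0.4),
               'box_loss': torch.tensor(0.6)}

    # Tuple-unpacking a dict iterates over its keys, so this "works" but
    # leaves loss holding the string 'loss', which is what apex then tries
    # to call .float() on.
    loss, _, _ = outputs
    print(type(loss))        # <class 'str'>

    # Indexing the dict would give the actual loss tensor instead.
    loss = outputs['loss']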

I appreciate any help you could give me!

klock18 commented 4 years ago

UPDATE: I found this correction on Kaggle, but now I can only get through ~7 training steps before CUDA runs out of memory. Any advice for that?

        target_res = {}

        boxes = [target['boxes'].to(self.device).float() for target in targets]
        labels = [target['labels'].to(self.device).float() for target in targets]

        target_res['bbox'] = boxes
        target_res['cls'] = labels 

        self.optimizer.zero_grad()

        outputs = self.model(images, target_res)
        loss = outputs['loss']  # the bench returns a dict of losses; take the combined loss

        loss.backward()

        # AMP loss scaling disabled for now:
        # with amp.scale_loss(loss, self.optimizer) as scaled_loss:
        #     scaled_loss.backward()

        summary_loss.update(loss.detach().item(), batch_size)
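In case it helps anyone else, this is roughly how I am building target_res for a batch of two images; the lists hold one tensor per image, and the box coordinate ordering noted in the comment is my assumption from the Kaggle kernel, so double-check it against your own data:

    import torch

    # Hypothetical batch of two images, just to show the structure the bench
    # consumes; the coordinate ordering (I believe ymin, xmin, ymax, xmax) is
    # an assumption worth verifying.
    boxes_img0 = torch.tensor([[10., 20., 110., 220.]])
    boxes_img1 = torch.tensor([[30., 40., 130., 240.],
                               [50., 60., 150., 260.]])
    labels_img0 = torch.tensor([1.])       # one box, class 1
    labels_img1 = torch.tensor([1., 2.])   # two boxes

    target_res = {
        'bbox': [boxes_img0, boxes_img1],   # one box tensor per image
        'cls': [labels_img0, labels_img1],  # matching label tensor per image
    }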
rwightman commented 4 years ago

@klock18 training these networks is quite GPU memory intensive, and the memory consumption takes a number of steps to stabilize. The only advice I have for that is: definitely don't disable AMP, since it roughly halves the memory use, and then reduce the batch size or possibly lower the resolution of your model config.
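A rough sketch of that advice, reusing the apex AMP path from the commented-out lines above; the model name, opt_level, batch size, and image size below are placeholders rather than recommendations, and the config/bench helper signatures assume the version of this package from around that time:

    import torch
    from apex import amp
    from effdet import get_efficientdet_config, EfficientDet, DetBenchTrain

    # Smaller compound coefficient and lower input resolution to cut memory.
    config = get_efficientdet_config('tf_efficientdet_d0')
    config.image_size = 384   # assumed to be a scalar field in this version
    model = DetBenchTrain(EfficientDet(config), config).cuda()

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

    # Keep mixed precision enabled; it roughly halves activation memory.
    model, optimizer = amp.initialize(model, optimizer, opt_level='O1')

    # Inside the training loop, scale the loss instead of calling
    # loss.backward() directly, and use a smaller batch size (e.g. 2-4):
    #   outputs = model(images, target_res)
    #   loss = outputs['loss']
    #   optimizer.zero_grad()
    #   with amp.scale_loss(loss, optimizer) as scaled_loss:
    #       scaled_loss.backward()
    #   optimizer.step()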