Closed klock18 closed 4 years ago
UPDATE: I found this correction on Kaggle, but now I can only get through ~7 training steps before CUDA runs out of memory. Any advice for that?
```python
# Build the target dict the model expects ('bbox' / 'cls' keys)
target_res = {}
boxes = [target['boxes'].to(self.device).float() for target in targets]
labels = [target['labels'].to(self.device).float() for target in targets]
target_res['bbox'] = boxes
target_res['cls'] = labels

self.optimizer.zero_grad()
outputs = self.model(images, target_res)
loss = outputs['loss']
loss.backward()
# with amp.scale_loss(loss, self.optimizer) as scaled_loss:
#     scaled_loss.backward()
summary_loss.update(loss.detach().item(), batch_size)
```
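One generic way to trade memory for steps when CUDA runs out after a few batches is gradient accumulation: split each batch into micro-batches and accumulate gradients before the optimizer step. A toy, framework-free sketch (all names hypothetical, not from the trainer above) showing that accumulated micro-batch gradients equal the full-batch gradient:

```python
# Fit y = w*x with loss L = mean((w*x - y)^2); compare one full-batch
# gradient against the same gradient accumulated over micro-batches.

def grad(w, xs, ys):
    # dL/dw over the given samples
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def accumulated_grad(w, xs, ys, micro_batch):
    # Sum per-micro-batch gradients weighted by micro-batch size, then
    # normalize -- the same bookkeeping loss.backward() does between
    # optimizer steps when you scale each micro-batch loss.
    total = 0.0
    for i in range(0, len(xs), micro_batch):
        xb, yb = xs[i:i + micro_batch], ys[i:i + micro_batch]
        total += grad(w, xb, yb) * len(xb)
    return total / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # exact fit at w = 2
full = grad(0.5, xs, ys)
accum = accumulated_grad(0.5, xs, ys, micro_batch=2)
# full and accum are identical, so smaller micro-batches give the same
# update while only ever holding micro_batch samples' activations at once.
```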
@klock18 training these networks is quite GPU-memory intensive, and the memory consumption takes a number of steps to stabilize. The only advice I have is: definitely don't disable AMP, since it roughly halves memory use, and then reduce the batch size or possibly lower the resolution in your model config.
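For reference, the apex `amp.scale_loss` context in the traceback below has a native-PyTorch equivalent in `torch.cuda.amp`. A minimal sketch of that pattern, assuming a stand-in model and data rather than the EfficientDet trainer from this thread (it falls back to plain fp32 on CPU):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(8, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# GradScaler guards against fp16 gradient underflow; enabled=False makes
# it a pass-through so the same code runs unchanged on CPU.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

images = torch.randn(4, 8, device=device)    # small batch to limit memory
targets = torch.randn(4, 1, device=device)

optimizer.zero_grad()
with torch.autocast(device_type=device, enabled=(device == "cuda")):
    # Forward pass runs in mixed precision inside autocast
    loss = torch.nn.functional.mse_loss(model(images), targets)
scaler.scale(loss).backward()   # replaces the amp.scale_loss(...) context
scaler.step(optimizer)
scaler.update()
```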
Hi!
I am having the same issue as: https://github.com/rwightman/efficientdet-pytorch/issues/42
I tried your fix, but I don't think I applied it correctly, because I'm getting this error:

```
Traceback (most recent call last):
  File "train_baseline.py", line 473, in <module>
    run_training()
  File "train_baseline.py", line 456, in run_training
    fitter.fit(train_loader, val_loader)
  File "train_baseline.py", line 294, in fit
    summary_loss = self.train_one_epoch(train_loader)
  File "train_baseline.py", line 363, in train_one_epoch
    with amp.scale_loss(loss, self.optimizer) as scaled_loss:
  File "/home/loc103/anaconda3/envs/stac/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/home/loc103/anaconda3/envs/stac/lib/python3.7/site-packages/apex/amp/handle.py", line 113, in scale_loss
    yield (loss.float())*loss_scale
AttributeError: 'str' object has no attribute 'float'
```
I appreciate any help you could give me!