tensorflow / models

Models and examples built with TensorFlow

GPU Out of memory error from commit 451906e4e82f19712455066c1b27e2a6ba71b1dd #8601

Open swapniel99 opened 4 years ago

swapniel99 commented 4 years ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/master/research/object_detection/model_main.py

https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/faster_rcnn_resnet101_coco.config

2. Describe the bug

I tried transfer learning with Faster-RCNN-resnet101-coco. Execution reaches step 0 and then crashes with a GPU out-of-memory error. This started with commit 451906e4e82f19712455066c1b27e2a6ba71b1dd; none of the commits before it produce the error. It looks like an issue with TF-Slim.

3. Steps to reproduce

Check out the latest master branch, then attempt transfer learning on Faster-RCNN-Resnet101-COCO.

4. Expected behavior

Transfer learning steps run successfully.

5. Additional context

Out of memory error.

6. System information

Environment: tf_env.txt

tmlabonte commented 4 years ago

I'm having the same issue: an OOM error while transfer learning from Faster-RCNN-resnet101-coco using TF v1.15 and Python 3.6 on a Colab GPU. I rolled back to commit 73b5be67f8b9b70b46c5cfb7b6b69b0106b1b94c and it worked.
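
For anyone who wants to try the same rollback, here is a minimal sketch of pinning a local clone to the commit mentioned above. It assumes the repository is already cloned and that `git` is on the PATH; the helper names `checkout_command` and `pin_models_repo` and the `models` directory name are hypothetical, chosen only for illustration.

```python
import subprocess

# Commit reported in this thread as working (taken from the comment above).
KNOWN_GOOD_COMMIT = "73b5be67f8b9b70b46c5cfb7b6b69b0106b1b94c"


def checkout_command(repo_dir, commit=KNOWN_GOOD_COMMIT):
    """Build the git command that checks out `commit` inside `repo_dir`.

    Hypothetical helper: it only builds the argument list, so it can be
    inspected without touching the repository.
    """
    return ["git", "-C", repo_dir, "checkout", commit]


def pin_models_repo(repo_dir, commit=KNOWN_GOOD_COMMIT):
    """Run the checkout; raises CalledProcessError if git exits non-zero."""
    subprocess.run(checkout_command(repo_dir, commit), check=True)
```

Calling `pin_models_repo("models")` would leave the clone in a detached-HEAD state at that commit. This only sidesteps the regression and does not address whatever changed in 451906e4e82f19712455066c1b27e2a6ba71b1dd.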