tensorflow / models

Models and examples built with TensorFlow

training stops proceeding #11012

Open ftnabil97 opened 1 year ago

ftnabil97 commented 1 year ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/master/research/object_detection/model_main_tf2.py

2. Describe the bug

I used the ssd_resnet50_v1_fpn_640x640_coco17_tpu model for training, but training stops after printing the following output:

    Use fn_output_signature instead
    I0604 08:48:51.263917 140179947910912 api.py:459] feature_map_spatial_dims: [(80, 80), (40, 40), (20, 20), (10, 10), (5, 5)]
    I0604 08:49:02.712822 140179947910912 api.py:459] feature_map_spatial_dims: [(80, 80), (40, 40), (20, 20), (10, 10), (5, 5)]
    I0604 08:49:10.045338 140179947910912 api.py:459] feature_map_spatial_dims: [(80, 80), (40, 40), (20, 20), (10, 10), (5, 5)]
    I0604 08:49:20.309416 140179947910912 api.py:459] feature_map_spatial_dims: [(80, 80), (40, 40), (20, 20), (10, 10), (5, 5)]

3. Steps to reproduce

  1. Install TensorFlow
  2. Clone the GitHub models repository
  3. Compile the protos with protoc
  4. Install cocoapi
  5. Compile
  6. Follow the Object Detection API installation steps
  7. Extract the model to train
  8. Create the .pbtxt and .record files
  9. Update pipeline.config
  10. Run training
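For reference, the "run training" step is usually a call to model_main_tf2.py. The sketch below only builds that command line; it assumes the repository was cloned into ./models and uses hypothetical paths (training/pipeline.config, training/) for the config and output directory. Only the standard --pipeline_config_path, --model_dir and --alsologtostderr flags are used.

```python
import shlex
import sys


def build_train_command(pipeline_config_path: str, model_dir: str) -> list[str]:
    """Return the argv for launching model_main_tf2.py (paths are assumptions)."""
    return [
        sys.executable,
        # Assumes tensorflow/models was cloned into ./models
        "models/research/object_detection/model_main_tf2.py",
        f"--pipeline_config_path={pipeline_config_path}",
        f"--model_dir={model_dir}",
        "--alsologtostderr",  # send absl logs to stderr as well as log files
    ]


if __name__ == "__main__":
    # Hypothetical paths; replace with your own.
    cmd = build_train_command("training/pipeline.config", "training/")
    print(shlex.join(cmd))
    # To launch it: subprocess.run(cmd, check=True)
```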

4. Expected behavior

Training completes successfully and produces a trained model.

5. Additional context

Include any logs that would be helpful to diagnose the problem.

6. System information

matheusschreiber commented 1 year ago

I'm facing this problem too; apparently it's a RAM limitation. In my case I'm using the Colab free tier, and training only started working when I lowered batch_size to 32 in the model's config file.

M1Z8N commented 1 year ago

Hey, yeah, I had to drop my batch_size from 64 to 4 for my model to start training! I'm probably going to upgrade to Colab Pro soon!
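To show why batch_size matters so much here, the back-of-envelope arithmetic below covers only the float32 input images for one batch, assuming 640x640 RGB inputs at 4 bytes per value. Real memory use is much higher (activations, gradients, optimizer state, input pipeline buffers), but activation memory also grows with batch size, so the relative savings from 64 to 32 to 4 are indicative.

```python
def input_batch_mib(batch_size: int, height: int = 640, width: int = 640,
                    channels: int = 3, bytes_per_value: int = 4) -> float:
    """Size in MiB of one float32 input batch (images only, no activations)."""
    return batch_size * height * width * channels * bytes_per_value / 2**20


for bs in (64, 32, 4):
    print(f"batch_size={bs:>2}: {input_batch_mib(bs):.2f} MiB")
# batch_size=64: 300.00 MiB
# batch_size=32: 150.00 MiB
# batch_size= 4: 18.75 MiB
```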