ftnabil97 commented 1 year ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

[y] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
[y] I am reporting the issue to the correct repository. (Model Garden official or research directory)
[y] I checked to make sure that this issue has not already been filed.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/master/research/object_detection/model_main_tf2.py

2. Describe the bug

I used the ssd_resnet50_v1_fpn_640x640_coco17_tpu model for training, but training stops making progress after printing the following output:

Use fn_output_signature instead
I0604 08:48:51.263917 140179947910912 api.py:459] feature_map_spatial_dims: [(80, 80), (40, 40), (20, 20), (10, 10), (5, 5)]
I0604 08:49:02.712822 140179947910912 api.py:459] feature_map_spatial_dims: [(80, 80), (40, 40), (20, 20), (10, 10), (5, 5)]
I0604 08:49:10.045338 140179947910912 api.py:459] feature_map_spatial_dims: [(80, 80), (40, 40), (20, 20), (10, 10), (5, 5)]
I0604 08:49:20.309416 140179947910912 api.py:459] feature_map_spatial_dims: [(80, 80), (40, 40), (20, 20), (10, 10), (5, 5)]

3. Steps to reproduce

installing tensorflow
cloning github model
protoc
cocoapi
compile
object detection api steps
training model extract
.pbtxt, .record file creation
pipeline.config update
run training

4. Expected behavior

A successfully trained model.

5. Additional context

Include any logs that would be helpful to diagnose the problem.

6. System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows
Mobile device name if the issue happens on a mobile device:
TensorFlow installed from (source or binary):
TensorFlow version (use command below): 2.12.0
Python version: 3
Bazel version (if compiling from source):
GCC/Compiler version (if compiling from source):
CUDA/cuDNN version:
GPU model and memory: no GPU; ran it on Colab

matheusschreiber commented 1 year ago

I'm facing this problem too; apparently it's a RAM limitation. In my case I'm using the Colab free tier, and training only started working when I lowered batch_size to 32 in the model's config file.

M1Z8N commented 1 year ago

Hey, yeah, I had to drop my batch_size from 64 to 4 for my model to start training! I am probably going to upgrade to Colab Pro soon!
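
For anyone who wants to script the workaround described in the comments above, here is a minimal sketch of lowering batch_size inside the train_config block of a pipeline.config file. It is plain text manipulation, not the Object Detection API's own tooling. The helper name set_train_batch_size and the SAMPLE config are hypothetical, made up for illustration, and the code assumes the block is written as "train_config { ... }" with no braces inside quoted strings and with the top-level batch_size appearing before any nested one. Which value fits in memory depends on the runtime; the 32 and 4 mentioned above are what worked for those commenters.

```python
import re


def set_train_batch_size(config_text: str, new_batch_size: int) -> str:
    """Return config_text with the first batch_size in train_config replaced.

    Hypothetical helper: edits the text directly and leaves every other
    block (for example eval_config) untouched.
    """
    start = re.search(r"\btrain_config\s*:?\s*\{", config_text)
    if start is None:
        raise ValueError("no train_config block found")

    # Walk forward from the opening brace to the brace that closes the block.
    depth = 0
    end = None
    for i in range(start.end() - 1, len(config_text)):
        ch = config_text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                end = i
                break
    if end is None:
        raise ValueError("unbalanced braces in train_config block")

    block = config_text[start.start():end + 1]
    new_block, count = re.subn(
        r"(\bbatch_size\s*:\s*)\d+",
        rf"\g<1>{new_batch_size}",
        block,
        count=1,
    )
    if count == 0:
        raise ValueError("train_config has no batch_size field")
    return config_text[:start.start()] + new_block + config_text[end + 1:]


# Made-up fragment that mimics the layout of a pipeline.config file.
SAMPLE = """\
train_config {
  batch_size: 64
  num_steps: 25000
}
eval_config {
  batch_size: 1
}
"""

updated = set_train_batch_size(SAMPLE, 4)
print(updated)
```

To apply this to a real file, read pipeline.config into a string, pass it through the helper, and write the result back before starting training. If training still stalls, try a smaller value.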

tensorflow / models

training stops proceeding #11012