Open NAEE09 opened 4 years ago
In addition, as I mentioned in the issue, I ran this example https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/eager_few_shot_od_training_tf2_colab.ipynb (not in Colab) and it worked. When I try to do the same with my own data and the Mask R-CNN Inception ResNet V2 1024x1024 model, the same error appears when I convert the dataset into an iterator. I can load the model configuration and build the model, but the error appears in the part below, even though I am limiting the memory at the beginning of the program.
train_input = inputs.train_input(
    train_config=train_config,
    train_input_config=train_input_config,
    model_config=model_config,
    model=detection_model)
train_input = train_input.repeat()
input_iter = iter(train_input)
features, labels = next(input_iter)
Please take a look at this issue here and let me know if it helps. Thanks!
@gowthamkpr I've looked at that issue, but the problem is that I have a different config file and I can't change the parameters they recommend. I solved the problem by reducing the image size to 300x600 and the mask size to 35x35 in the model config, but the results are pretty bad. Any advice? Or how can I optimize the memory usage as in the issue you mentioned?
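To put the resize in perspective, here is a rough back-of-the-envelope sketch. It assumes that activation memory in the convolutional backbone grows roughly in proportion to the number of input pixels. That is only a first-order approximation, not a measurement of this model. The intermediate sizes in `candidate_sizes` are hypothetical values chosen for illustration and are not taken from any config in this thread.

```python
# Rough estimate only: assumes memory cost scales with the number of input pixels.

def pixel_ratio(full_hw, reduced_hw):
    """Return how many times more pixels full_hw has than reduced_hw."""
    full_h, full_w = full_hw
    red_h, red_w = reduced_hw
    return (full_h * full_w) / (red_h * red_w)


# Sizes mentioned in this thread: the model default and the reduced setting.
image_ratio = pixel_ratio((1024, 1024), (300, 600))
print(f"1024x1024 has about {image_ratio:.1f}x the pixels of 300x600")

# Hypothetical intermediate sizes (illustration only), relative to 1024x1024.
candidate_sizes = [(512, 512), (640, 640), (768, 768)]
relative_cost = {
    size: 1.0 / pixel_ratio((1024, 1024), size) for size in candidate_sizes
}
for (height, width), cost in relative_cost.items():
    print(f"{height}x{width}: about {cost:.2f} of the 1024x1024 pixel count")
```

If this approximation holds, an intermediate size might fit in memory while losing less detail than 300x600, but that would need to be checked on the actual GPU.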
Prerequisites
Please answer the following questions for yourself before submitting an issue.
1. The entire URL of the file you are using
https://github.com/tensorflow/models/blob/master/research/object_detection/model_main_tf2.py
2. Describe the bug
I want to train the Mask R-CNN Inception ResNet V2 1024x1024 model. My dataset is converted to a .record file, the model pipeline is configured, and the GPU works when training other models. I tried to limit the GPU memory (which also works when training other models), but the error still appears.
Error:
3. Steps to reproduce
from ~/models/research
python object_detection/model_main_tf2.py --pipeline_config_path=/home/robotronics/Projects/blm_Mask_RCNN/model_MaskRCNN/mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8/model.config --model_dir=/home/robotronics/Projects/blm_Mask_RCNN/blm/models/model --num_train_steps=5000 --sample_1_of_n_eval_examples=10 --alsologtostderr
4. Expected behavior
The model trains to completion.
5. Additional context
I tried to limit the memory in model_main_tf2.py and model_lib_v2.py.
I also ran the examples from the documentation https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/auto_examples/plot_object_detection_checkpoint.html and they work as well.
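For context, a minimal sketch of what limiting GPU memory at the start of a TensorFlow 2 program can look like. This is an illustration under assumptions, not the exact code used in this issue: the helper name `configure_gpu_memory` and its `memory_limit_mb` parameter are hypothetical, and the `tf.config` calls assume a reasonably recent TensorFlow 2 release.

```python
def configure_gpu_memory(memory_limit_mb=None):
    """Configure every visible GPU and return how many were found.

    Hypothetical helper for illustration. With memory_limit_mb=None it enables
    on-demand memory growth; otherwise it caps each GPU at that many megabytes.
    """
    # Imported here so the helper can be defined without TensorFlow installed.
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        if memory_limit_mb is None:
            # Allocate GPU memory on demand instead of reserving it all upfront.
            tf.config.experimental.set_memory_growth(gpu, True)
        else:
            # Expose a single logical device with a hard memory cap.
            tf.config.set_logical_device_configuration(
                gpu,
                [tf.config.LogicalDeviceConfiguration(memory_limit=memory_limit_mb)],
            )
    return len(gpus)
```

The helper has to be called before any other TensorFlow operation touches the GPU, for example as the first statement of the script; otherwise TensorFlow raises a RuntimeError. Note that both options only control how memory is allocated. They do not reduce what the model itself needs, so an out-of-memory error can persist if a single training step does not fit on the device.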
6. System information