IvanGarcia7 closed this issue 2 years ago
@IvanGarcia7 Hello. Did you solve this problem? I have encountered similar issue in video classification task
Good morning @birdman9391
Yes, I was able to fix the issue. However, I was following the process described in this issue for object detection. In my case, the problem was caused by the file required for fine-tuning named label_map.txt. You have to make sure that the ids of the classes you want to detect when re-training start from 1. For example, adapting this to your case, if you want to detect person and car, the label map should look like this:
item {
  id: 1
  name: 'person'
}
item {
  id: 2
  name: 'car'
}
At least that solved my problem and the model after re-training detected the objects correctly.
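A quick sanity check for the label map ids could look like the sketch below. It is only an illustration: `parse_label_map` and `check_label_map` are hypothetical helper names, and the regex-based parsing assumes the simple `item { id: ... name: '...' }` layout shown above rather than the full text-proto format.

```python
import re


def parse_label_map(text):
    """Return a list of (id, name) pairs found in label-map text."""
    items = []
    # Each entry is assumed to look like: item { id: 1 name: 'person' }
    for block in re.findall(r"item\s*\{(.*?)\}", text, flags=re.DOTALL):
        id_match = re.search(r"\bid\s*:\s*(\d+)", block)
        name_match = re.search(r"\bname\s*:\s*['\"]([^'\"]*)['\"]", block)
        if id_match and name_match:
            items.append((int(id_match.group(1)), name_match.group(1)))
    return items


def check_label_map(text):
    """Return a list of problems; an empty list means the ids look fine."""
    items = parse_label_map(text)
    if not items:
        return ["no items found"]
    ids = [item_id for item_id, _ in items]
    problems = []
    if min(ids) != 1:
        problems.append(f"ids start at {min(ids)}, expected 1")
    if len(set(ids)) != len(ids):
        problems.append("duplicate ids")
    return problems


LABEL_MAP = """
item { id: 1 name: 'person' }
item { id: 2 name: 'car' }
"""

print(check_label_map(LABEL_MAP))
```

An empty list means the ids start at 1 and are unique. A label map whose first id is 0 would be reported as a problem.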
Best regards. Iván García
Thanks @IvanGarcia7
Sadly, my issue may come from something else T.T My model gets about 30% accuracy in the validation stage. But when I evaluate the model with the same script after restoring the checkpoint, it gives a broken result, like 0% accuracy.
For example like this:
Since I'm using the exact same tfrecord for validation, I think I missed another thing ... But thanks for your kind reply :D Hope you have a nice day!
Hello again @birdman9391
Since your model detects objects, I would say that the problem arises when defining the JSON used to obtain the mAP.
I suppose you are using the COCO evaluator, so you have to be careful with how the bbox is defined. In this case it is as follows:
[xmin, ymin, xmax-xmin, ymax-ymin]
You can see an example of how I create these annotations in my repo:
https://github.com/IvanGarcia7/ALAF/blob/main/ALAF/DEMO.ipynb
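As a rough sketch of the bbox conversion described above (not the code from the linked notebook), the helper below turns corner coordinates into the COCO `[x, y, width, height]` layout. The second function assumes the detection boxes are normalized `[ymin, xmin, ymax, xmax]`, which is my understanding of what the Object Detection API returns. Both function names are hypothetical.

```python
def corners_to_coco(xmin, ymin, xmax, ymax):
    """Convert corner coordinates to COCO [x, y, width, height]."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def normalized_box_to_coco(box, image_width, image_height):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to COCO pixels.

    Assumption: `box` values are in the range 0..1, relative to the image size.
    """
    ymin, xmin, ymax, xmax = box
    return corners_to_coco(
        xmin * image_width,
        ymin * image_height,
        xmax * image_width,
        ymax * image_height,
    )


print(corners_to_coco(10, 20, 110, 70))
print(normalized_box_to_coco([0.25, 0.5, 0.75, 1.0], 200, 100))
```

Mixing up the coordinate order or passing corner coordinates where width and height are expected is enough to push the mAP to zero even when the detections themselves are reasonable.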
Another possible problem is related to the image ids. If you have used tools like CVAT, the ids are assigned non-sequentially, as I have seen in some of my projects, so you will need to define a dictionary that maps each id to its image. See the section "LOAD THE JSON WITH THE GT ANNOTATIONS" in the repo linked above.
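The id matching can be sketched as follows. This is a minimal illustration assuming a COCO-style ground-truth dict with an `images` list holding `id` and `file_name`. The helper names and the `(category_id, score, bbox)` tuple layout for detections are made up for the example.

```python
def build_image_id_lookup(coco_gt):
    """Map file_name -> image id from a COCO-style ground-truth dict."""
    return {img["file_name"]: img["id"] for img in coco_gt["images"]}


def to_coco_results(detections_by_file, lookup):
    """Build COCO result entries using the ids from the ground truth.

    `detections_by_file` maps file_name -> list of
    (category_id, score, bbox) tuples, with bbox already in
    [x, y, width, height] form (hypothetical layout for this sketch).
    """
    results = []
    for file_name, detections in detections_by_file.items():
        image_id = lookup[file_name]  # KeyError means the file is not in the GT
        for category_id, score, bbox in detections:
            results.append({
                "image_id": image_id,
                "category_id": category_id,
                "bbox": bbox,
                "score": score,
            })
    return results


# Ground truth with non-sequential ids, as an annotation tool might export.
ground_truth = {
    "images": [
        {"id": 17, "file_name": "a.jpg"},
        {"id": 4, "file_name": "b.jpg"},
    ]
}
lookup = build_image_id_lookup(ground_truth)
detections = {"b.jpg": [(1, 0.9, [10, 20, 100, 50])]}
print(to_coco_results(detections, lookup))
```

Looking the id up by file name avoids assuming that the n-th image in a folder has id n.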
I hope your problem is solved. Hope you have a nice day too!
Hello @IvanGarcia7
My task is a video classification task, so I don't have a label_map. I was just worried that there might be some issue in saving the checkpoint file and hoped you had already found the reason. But our problems seem to come from different causes.
Now I'm debugging the internal code in orbit.controller._maybe_save_checkpoint(), and I found that the model weights before saving the checkpoint and the model weights after restoring the checkpoint have different values.
I don't know why, but I hope I can fix the issue by handling this.
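One way to narrow this down is to snapshot the weights before saving and after restoring, then list which variables differ. The sketch below is framework-independent and only an illustration: `diff_weights` is a hypothetical helper, and each snapshot is assumed to be a dict mapping a variable name to a flat list of floats. In TensorFlow, a snapshot could presumably be built with something like `{v.name: v.numpy().ravel().tolist() for v in model.variables}`.

```python
import math


def diff_weights(before, after, rel_tol=1e-6, abs_tol=1e-8):
    """Report variables whose values differ between two snapshots.

    Each snapshot maps variable name -> flat list of floats.
    Returns a dict of name -> short description of the mismatch.
    """
    report = {}
    for name in sorted(set(before) | set(after)):
        if name not in before or name not in after:
            report[name] = "missing in one snapshot"
            continue
        a, b = before[name], after[name]
        if len(a) != len(b):
            report[name] = f"size mismatch: {len(a)} vs {len(b)}"
            continue
        same = all(
            math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
            for x, y in zip(a, b)
        )
        if not same:
            max_diff = max(abs(x - y) for x, y in zip(a, b))
            report[name] = f"max abs diff {max_diff:.3g}"
    return report


# Toy snapshots standing in for real model weights.
before = {"dense/kernel": [0.1, 0.2], "dense/bias": [0.0]}
after = {"dense/kernel": [0.1, 0.25], "dense/bias": [0.0]}
print(diff_weights(before, after))
```

If only some variables differ, or some names are missing from one snapshot, that points to variables that are not tracked by the checkpoint rather than to a broken save.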
Thanks for your kind reply again :D
I have a similar error. Have you solved it? My result may not be good, but I think the model converges. I once got good inference results with YOLOv5, so the dataset should not contain bad data. If you have solved it, please @ me. Thanks.
How can I get the model accuracy and recall? It is not even possible to get the TensorBoard plots.
Prerequisites
1. The entire URL of the file you are using
https://github.com/tensorflow/models/blob/master/research/object_detection/configs/tf2/ssd_efficientdet_d4_1024x1024_coco17_tpu-32.config
2. Describe the bug
I am trying to re-train EfficientDet D4 from the TensorFlow Model Zoo (http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d4_coco17_tpu-32.tar.gz) on my dataset.
The configuration file I am using is the following:
When I use model_main_tf2 to start training, no error appears. However, when I check the model accuracy, it does not detect anything.
I have tried modifying parameters such as the learning rate and the number of epochs, but it does not help.
3. Steps to reproduce
To fine-tune this model, I followed the steps in this guide (https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html).
Tensorboard:
When I infer over one image, this is the detections['detection_boxes'] output:
tf.Tensor( [[[0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.]]], shape=(1, 100, 4), dtype=float32)
4. Expected behavior
Since I am re-training on classes that already exist in the pre-trained model, the mAP should not drop to 0, because both the learning rate and the number of steps are low. This error also occurs with other models of the EfficientDet family. I have also tested with another dataset and the results are similar.
Following the same process but re-training other models such as CenterNet this problem does not appear.
5. System information
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 21.04
TensorFlow version (use command below): v2.8.0-rc1-32-g3f878cff5b6 2.8.0
Python version: 3.10.2
CUDA/cuDNN version: 11.6/8.3
GPU model and memory: Nvidia RTX 3080Ti 12GB