di-jabil opened this issue 5 years ago
@di-jabil
Thanks for the summary! Great effort on root-causing.
There are two possible reasons I can think of: 1) platform-dependent behavior, since you are exporting on macOS and compiling on Ubuntu. Could you try exporting and compiling both on Ubuntu? 2) the graph definition is wrong, possibly due to a recent change in the TensorFlow Object Detection API. The output names are called concat and concat_1. Do you still have your TensorBoard logs?
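If the TensorBoard logs are gone, the output names can also be checked directly from the frozen graph. Below is a minimal sketch, assuming a TF 1.x-style frozen graph saved as frozen_inference_graph.pb (the path is an assumption; adjust it to your own export):

    import tensorflow as tf

    # Sketch: load the frozen graph and print every node name so the detection
    # outputs (expected to be "concat" and "concat_1") can be verified
    # without TensorBoard.
    graph_def = tf.compat.v1.GraphDef()
    with open('frozen_inference_graph.pb', 'rb') as f:  # path is an assumption
        graph_def.ParseFromString(f.read())
    for node in graph_def.node:
        print(node.op, node.name)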
@weiranzhao
Thanks for the reply. Please see my answers below.
1. I was going to try exporting on Ubuntu, and it is still on my TODO list. I may give it a try after the immediate project is done. I will keep you updated.
2. My graph from TensorBoard is attached here. Do you think it helps?
I much appreciate your involvement. Looking forward to your further instructions.
I just ran into this same issue, and I am using only Ubuntu 18.04. I thought something was strange, because everyone said that line 84:
assert len(logit_scores) == 4 * _NUM_ANCHORS
had to have the constant updated to the number of classes to be trained (5 in my case), but the script was asserting for box_encodings in the line that followed, so I updated that instead. It worked then, but my detections were all messed up (hundreds of bogus ones). I reverted the box_encodings line, set the logit_scores assertion back to 5 * _NUM_ANCHORS, and swapped the arguments in the call to _decode_detection_result like you did, and it started working perfectly! Can't thank you enough for this catch!
Hi All,
As you may have seen in issue #563, I have been struggling to load a retrained object detection model onto the AIY kit. After two days of digging, I think I have found something really interesting:
(Please refer to #563 for more background information. Here I would like to give a cleaned-up summary.)
First off: I am using a MacBook with macOS 10.12.6 for most of the operations described below, except for the compiling. The compiler runs on an Ubuntu 18.04.1 64-bit virtual machine. The AIY image is AIY Kits Release 2018-11-16.
Initial Scenario: I am following the tutorial on the AIY homepage to create a custom object detection project.
Some key points:
The training went well. The exporting went well too. I was able to run local evaluations with the exported graph (the local evaluations were on the Mac too).
I then moved the frozen graph to the Ubuntu virtual machine. The compiling had no problem.
Then I used scp to load the compiled binaryproto onto the AIY hardware. I first tried any_model_camera.py and had no problems.
Then I moved the binaryproto file to /opt/aiy/models/ and modified ~/AIY-projects-python/src/aiy/vision/models/object_detection.py to load this binaryproto and to reflect the new number of labels (2, instead of 4).
Please refer to #563 for more details on the code changes.
I was basically following the instructions in this blog: https://cogint.ai/custom-vision-training-on-the-aiy-vision-kit/
Then I ran the object detection demo and received the AssertionError reported in #563:

    Traceback (most recent call last):
      File "/home/pi/AIY-projects-python/src/examples/vision/object_detection.py", line 73, in <module>
        main()
      File "/home/pi/AIY-projects-python/src/examples/vision/object_detection.py", line 59, in main
        objects = object_detection.get_objects(result, args.threshold, offset)
      File "/opt/aiy/projects-python/src/aiy/vision/models/object_detection.py", line 269, in get_objects
        objs = _decode_detection_result(logit_scores, box_encodings, threshold, size, offset)
      File "/opt/aiy/projects-python/src/aiy/vision/models/object_detection.py", line 88, in _decode_detection_result
        assert len(logit_scores) == _NUM_LABELS * _NUM_ANCHORS
    AssertionError
Error: Basically, the number of anchors is always 1278 (loaded from the txt file). According to the blog https://cogint.ai/custom-vision-training-on-the-aiy-vision-kit/, the number of logit_scores is supposed to be the number of labels times the number of anchors, and the number of box_encodings should remain 4 times the number of anchors.
So supposedly, the numbers should have been:
number of anchors = 1278
number of logit_scores = 1278 * 2 (2 labels: target and background) = 2556
number of box_encodings = 1278 * 4 = 5112
But my print statements show that in my case the numbers of logit_scores and box_encodings got reversed:
number of anchors = 1278
number of logit_scores = 5112
number of box_encodings = 2556
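The print statements were nothing fancy; roughly the following sketch, added just before the failing assert in _decode_detection_result (the exact spot in your copy of the file may differ):

    # Debug prints (a sketch) added right before the failing assertion:
    print('anchors:', _NUM_ANCHORS)                         # 1278
    print('logit_scores:', len(logit_scores))               # 5112 in my case
    print('box_encodings:', len(box_encodings))             # 2556 in my case
    print('expected scores:', _NUM_LABELS * _NUM_ANCHORS)   # 2 * 1278 = 2556
    print('expected boxes:', 4 * _NUM_ANCHORS)              # 4 * 1278 = 5112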
What I Did Then: Because only a little was changed on the AIY side, I suspected that something was wrong with my model. I saw the whole process as train -> export -> compile -> run. Therefore I planned to try other models at various stages of the process and see if I could figure out where the broken link was.
I tried several other binaryprotos, frozen graphs, and checkpoints:
From my experiments, all the models that I exported myself had the problem, while all the models that I did NOT export myself had no problem. So it seems like the exporting is the culprit. But why and how? I couldn't figure it out. I was following this tutorial to export: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/exporting_models.md
Solution: I couldn't figure out what went wrong with the exporting. I suspect that it is related to the operating system, but that is beyond my capability.
Before giving up, I wanted to try one more thing. So I reversed the order of logit_scores and box_encodings at line 266 of aiyprojects/src/aiy/vision/models/object_detection.py:
    #objs = _decode_detection_result(logit_scores, box_encodings, threshold, size, offset)
    objs = _decode_detection_result(box_encodings, logit_scores, threshold, size, offset)
Then it worked. The demo returned "Object #0: kind=TARGET(1), score=0.926968, bbox=(223, 186, 113, 115)" (see the attached result).
And I tested a few more, and they all worked.
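A slightly more defensive variant of the same workaround is sketched below. It is only an illustration, not code from the AIY repo: the helper name is made up, and it assumes the _NUM_LABELS and _NUM_ANCHORS values already used by the decoder. Instead of hard-coding the swap, it picks the two buffers by length:

    # Sketch only: decide which buffer is which by length, so the decode call
    # works whether or not the exporter reversed the two outputs.
    def _order_outputs(first, second, num_labels, num_anchors):
        if len(first) == num_labels * num_anchors and len(second) == 4 * num_anchors:
            return first, second   # already (logit_scores, box_encodings)
        if len(first) == 4 * num_anchors and len(second) == num_labels * num_anchors:
            return second, first   # reversed by the export, so swap back
        raise ValueError('unexpected output sizes: %d and %d' % (len(first), len(second)))

    # With 2 labels and 1278 anchors this resolves the 5112/2556 mix-up above.
    # (If num_labels happened to be 4, the two lengths would be identical and
    # this check could not tell the buffers apart.)
    logit_scores, box_encodings = _order_outputs(
        logit_scores, box_encodings, _NUM_LABELS, _NUM_ANCHORS)
    objs = _decode_detection_result(logit_scores, box_encodings, threshold, size, offset)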
Thanks: As I said, figuring out an explanation for what happened is beyond me at this point, so I just wanted to share this experience with you all. Hopefully it can help someone a bit and raise attention to this issue.
Thank you for reading this far. Let me know if you have any questions.