Hi @mystic123 :
I trained YOLOv3 using your repo with the cfg file below.
[net]
# Testing
batch=1
subdivisions=1
# Training
batch=64 subdivisions=16 width=416 height=416 channels=3 momentum=0.9 decay=0.0005 angle=0 saturation = 1.5 exposure = 1.5 hue=.1
learning_rate=0.001 burn_in=1000 max_batches = 500200 policy=steps steps=400000,450000 scales=.1,.1
[convolutional] batch_normalize=1 filters=32 size=3 stride=1 pad=1 activation=leaky
# Downsample
[convolutional] batch_normalize=1 filters=64 size=3 stride=2 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=32 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=64 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
# Downsample
[convolutional] batch_normalize=1 filters=128 size=3 stride=2 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
# Downsample
[convolutional] batch_normalize=1 filters=256 size=3 stride=2 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
# Downsample
[convolutional] batch_normalize=1 filters=512 size=3 stride=2 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
# Downsample
[convolutional] batch_normalize=1 filters=1024 size=3 stride=2 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
######################
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky
[convolutional] size=1 stride=1 pad=1 filters=18 activation=linear
[yolo] mask = 6,7,8 anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 classes=1 num=9 jitter=.3 ignore_thresh = .7 truth_thresh = 1 random=1
[route] layers = -4
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[upsample] stride=2
[route] layers = -1, 61
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky
[convolutional] size=1 stride=1 pad=1 filters=18 activation=linear
[yolo] mask = 3,4,5 anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 classes=1 num=9 jitter=.3 ignore_thresh = .7 truth_thresh = 1 random=1
[route] layers = -4
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[upsample] stride=2
[route] layers = -1, 36
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=leaky
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=leaky
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=leaky
[convolutional] size=1 stride=1 pad=1 filters=18 activation=linear
[yolo] mask = 0,1,2 anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 classes=1 num=9 jitter=.3 ignore_thresh = .7 truth_thresh = 1 random=1
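As a sanity check on a cfg like the one above, here is a minimal sketch (not part of darknet or the converter) that verifies each convolutional layer directly before a [yolo] layer has filters = (classes + 5) * number of masks, which is the usual YOLOv3 convention. It assumes the cfg is in the standard one-option-per-line layout; the function names `parse_darknet_cfg` and `check_yolo_heads` are hypothetical.

```python
def parse_darknet_cfg(text):
    """Parse darknet cfg text into a list of (section_name, options) pairs."""
    sections = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def check_yolo_heads(sections):
    """For each [yolo] layer, return (index, filters of previous layer, expected)."""
    report = []
    for i, (name, opts) in enumerate(sections):
        if name != "yolo":
            continue
        n_masks = len(opts["mask"].split(","))
        expected = (int(opts["classes"]) + 5) * n_masks
        _, prev_opts = sections[i - 1]
        report.append((i, int(prev_opts["filters"]), expected))
    return report


# Tiny excerpt in the same shape as the detection head above.
sample = """
[net]
width=416
[convolutional]
size=1
filters=18
activation=linear
[yolo]
mask = 6,7,8
classes=1
num=9
"""
report = check_yolo_heads(parse_darknet_cfg(sample))
for index, filters, expected in report:
    print(index, filters, expected, "OK" if filters == expected else "MISMATCH")
```

With classes=1 and three masks per head, the expected value is (1 + 5) * 3 = 18, which matches the filters=18 lines in my cfg.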
When I run darknet detector demo, it works fine and produces the expected output. However, after converting the weights to IR using Intel OpenVINO, the results are extremely weird. I guess I made a mistake in the conversion.
Step 1: weights to .pb conversion (using https://github.com/mystic123/tensorflow-yolo-v3)
python3 /home/paperspace/Desktop/tensorflow-yolo-v3/convert_weights_pb.py --class_names /home/paperspace/Downloads/Dataset/metadata/person.names --data_format NHWC --weights_file /home/paperspace/Downloads/Dataset/metadata/yolo_backup/yolov3_edited_14700.weights --output_graph /home/paperspace/Desktop/yolo_edited_test_14700.pb --size 416
Step 2: .pb to IR (.xml) conversion
python3 /opt/intel/computer_vision_sdk_2018.5.455/deployment_tools/model_optimizer/mo_tf.py --input_model /home/paperspace/Desktop/yolo_edited_test_14700.pb --tensorflow_object_detection_api_pipeline_config /home/paperspace/Downloads/Dataset/metadata/yolov3_edited.cfg --tensorflow_use_custom_operations_config /opt/intel/computer_vision_sdk_2018.5.455/deployment_tools/model_optimizer/extensions/front/tf/yolo_v3_edited.json --reverse_input_channels --input_shape [1,416,416,3] --data_type FP16 --output_dir /home/paperspace/Desktop/14700/
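Since the two commands are long, here is a rough sketch of how they could be assembled from named parts in Python so that each path and flag is easier to review. It only mirrors the flags and paths quoted above and is not a claim that these are the right flags for this conversion; the helper `build_command` and the constants are hypothetical.

```python
import shlex


def build_command(script, options, switches=()):
    """Return an argv list: python3 <script> --key value ... --switch ..."""
    argv = ["python3", script]
    for key, value in options.items():
        argv += [f"--{key}", str(value)]
    argv += [f"--{name}" for name in switches]
    return argv


META = "/home/paperspace/Downloads/Dataset/metadata"
DESKTOP = "/home/paperspace/Desktop"
MO_DIR = "/opt/intel/computer_vision_sdk_2018.5.455/deployment_tools/model_optimizer"

# Step 1: darknet weights -> TensorFlow .pb
step1 = build_command(
    f"{DESKTOP}/tensorflow-yolo-v3/convert_weights_pb.py",
    {
        "class_names": f"{META}/person.names",
        "data_format": "NHWC",
        "weights_file": f"{META}/yolo_backup/yolov3_edited_14700.weights",
        "output_graph": f"{DESKTOP}/yolo_edited_test_14700.pb",
        "size": 416,
    },
)

# Step 2: .pb -> OpenVINO IR
step2 = build_command(
    f"{MO_DIR}/mo_tf.py",
    {
        "input_model": f"{DESKTOP}/yolo_edited_test_14700.pb",
        "tensorflow_object_detection_api_pipeline_config": f"{META}/yolov3_edited.cfg",
        "tensorflow_use_custom_operations_config": f"{MO_DIR}/extensions/front/tf/yolo_v3_edited.json",
        "input_shape": "[1,416,416,3]",
        "data_type": "FP16",
        "output_dir": f"{DESKTOP}/14700/",
    },
    switches=("reverse_input_channels",),
)

# Print shell-quoted commands for inspection; nothing is executed here.
print(shlex.join(step1))
print(shlex.join(step2))
```

To actually run a step, the argv list could be passed to `subprocess.run(step1, check=True)`.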
Can you suggest a way forward? Here is the output.
The IR conversion generates three files: .xml, .bin, and .mapping.
Here is a comparison of the .mapping files. The file on the left was generated with the default configuration (https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov3.cfg) and the original weights (https://pjreddie.com/media/files/yolov3.weights).
And here are the .xml files.
The one with the default cfg and original weights: frozen_darknet_yolov3_model.xml
The one with modified weights and cfg as per your suggestion: yolo_edited_test_20000.xml
https://github.com/AlexeyAB/darknet/files/3119981/xml.zip
The conversion process also requires a .json configuration file for YOLO. Here is the yolov3.json that I used:
[
  {
    "id": "TFYOLOV3",
    "match_kind": "general",
    "custom_attributes": {
      "classes": 1,
      "coords": 4,
      "num": 9,
      "mask": [0, 1, 2],
      "entry_points": ["detector/yolo-v3/Reshape", "detector/yolo-v3/Reshape_4", "detector/yolo-v3/Reshape_8"]
    }
  }
]
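To cross-check this file against the cfg, here is a small sketch that loads the JSON above and computes the per-head channel count as (classes + coords + 1) * number of masks. This is the usual darknet relationship; whether Model Optimizer interprets these fields in exactly this way is an assumption on my side.

```python
import json

# The JSON quoted above, pasted verbatim.
config_text = """
[ { "id": "TFYOLOV3", "match_kind": "general", "custom_attributes": {
    "classes": 1, "coords": 4, "num": 9, "mask": [0, 1, 2],
    "entry_points": ["detector/yolo-v3/Reshape",
                     "detector/yolo-v3/Reshape_4",
                     "detector/yolo-v3/Reshape_8"] } } ]
"""

attrs = json.loads(config_text)[0]["custom_attributes"]

# One objectness score plus `coords` box values plus `classes` scores per anchor.
channels_per_head = (attrs["classes"] + attrs["coords"] + 1) * len(attrs["mask"])
num_heads = len(attrs["entry_points"])

print("channels per head:", channels_per_head)
print("number of heads:", num_heads)
```

The result, 18 channels per head and 3 heads, agrees with the filters=18 layers and the three [yolo] sections in the cfg. Note that the cfg uses a different mask for each [yolo] layer (6,7,8, then 3,4,5, then 0,1,2), while this JSON lists only 0, 1, 2.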