Model creation - GitHub Issues

wataru129 commented 5 years ago

Create a YOLO model from our own dataset

wataru129 commented 5 years ago

There aren't many examples of training YOLOv3 with PyTorch (there is one, but it throws errors: https://github.com/andy-yun/pytorch-0.4-yolov3), so for now I'll try training with darknet, which doesn't depend on a framework like PyTorch. https://qiita.com/harmegiddo/items/c3db5fd567fa4c6cc9fb#3-training--customize-using-my-dataset

wataru129 commented 5 years ago

Apparently it throws quite a few errors: https://qiita.com/miyamotok0105/items/6d2797e4a76ed642178b
・Rewrite the paths in the annotation files as absolute paths
・Save the files with UTF-8 encoding
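The two tips above can be applied with a small script. This is a minimal sketch, assuming the file in question is darknet's usual plain-text list with one image path per line (such as the file named by `train=` in obj.data) and that relative entries are relative to a base directory you pass in. The function name `absolutize_list_file` and the example paths are hypothetical. It also drops a UTF-8 BOM and writes LF line endings, which goes a little beyond the two tips (an assumption that those can trip up darknet's file reading too).

```python
from pathlib import Path
from typing import List


def absolutize_list_file(list_path: str, base_dir: str) -> List[str]:
    """Rewrite each entry of a darknet-style list file (one image path per line)
    as an absolute path, then save the file back as UTF-8 with LF line endings."""
    base = Path(base_dir).resolve()
    # "utf-8-sig" reads plain UTF-8 too, and silently drops a leading BOM if present.
    with open(list_path, encoding="utf-8-sig") as f:
        lines = f.read().splitlines()  # removes "\n" / "\r\n" line endings

    fixed = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        path = Path(entry)
        # Assumption: relative entries are relative to base_dir.
        fixed.append(str(path if path.is_absolute() else (base / path).resolve()))

    # newline="\n" writes LF endings on every OS; plain "utf-8" writes no BOM.
    with open(list_path, "w", encoding="utf-8", newline="\n") as f:
        f.write("\n".join(fixed) + "\n")
    return fixed


# Example (hypothetical paths):
# absolutize_list_file("test_output/data/train.txt", "/Users/wataru/SAP/darknet")
```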

wataru129 commented 5 years ago

(SAP) Wataru-2:darknet wataru$ ./darknet detector train ./test_output/data/obj.data ./test_output/yolo-obj.cfg yolov3.weights yolo-obj
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BFLOPs
1 max 2 x 2 / 2 416 x 416 x 32 -> 208 x 208 x 32
2 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64 1.595 BFLOPs
3 max 2 x 2 / 2 208 x 208 x 64 -> 104 x 104 x 64
4 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BFLOPs
5 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BFLOPs
6 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BFLOPs
7 max 2 x 2 / 2 104 x 104 x 128 -> 52 x 52 x 128
8 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
9 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
10 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
11 max 2 x 2 / 2 52 x 52 x 256 -> 26 x 26 x 256
12 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
13 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
14 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
15 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
16 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
17 max 2 x 2 / 2 26 x 26 x 512 -> 13 x 13 x 512
18 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
19 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
20 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
21 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
22 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
23 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024 3.190 BFLOPs
24 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024 3.190 BFLOPs
25 route 16
26 reorg / 2 26 x 26 x 512 -> 13 x 13 x2048
27 route 26 24
28 conv 1024 3 x 3 / 1 13 x 13 x3072 -> 13 x 13 x1024 9.569 BFLOPs
29 conv 35 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 35 0.012 BFLOPs
30 detection
mask_scale: Using default '1.000000'
Loading weights from yolov3.weights...Done!
Learning Rate: 0.0001, Momentum: 0.9, Decay: 0.0005
Saving weights to /Users/wataru/SAP/darknet/test_output/backup/yolo-obj_final.weights
(SAP) Wataru-2:darknet wataru$ ./darknet detector test ./test_output/data/obj.data ./test_output/yolo-obj.cfg ./test_output/backup/yolo-obj_final.weights ./test_output/data/obj/b1.jpg -thresh 0.01
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BFLOPs
1 max 2 x 2 / 2 416 x 416 x 32 -> 208 x 208 x 32
2 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64 1.595 BFLOPs
3 max 2 x 2 / 2 208 x 208 x 64 -> 104 x 104 x 64
4 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BFLOPs
5 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BFLOPs
6 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BFLOPs
7 max 2 x 2 / 2 104 x 104 x 128 -> 52 x 52 x 128
8 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
9 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
10 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
11 max 2 x 2 / 2 52 x 52 x 256 -> 26 x 26 x 256
12 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
13 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
14 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
15 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
16 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
17 max 2 x 2 / 2 26 x 26 x 512 -> 13 x 13 x 512
18 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
19 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
20 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
21 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
22 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
23 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024 3.190 BFLOPs
24 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024 3.190 BFLOPs
25 route 16
26 reorg / 2 26 x 26 x 512 -> 13 x 13 x2048
27 route 26 24
28 conv 1024 3 x 3 / 1 13 x 13 x3072 -> 13 x 13 x1024 9.569 BFLOPs
29 conv 35 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 35 0.012 BFLOPs
30 detection
mask_scale: Using default '1.000000'
Loading weights from ./test_output/backup/yolo-obj_final.weights...Done!
./test_output/data/obj/b1.jpg: Predicted in 3.842769 seconds.
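The network printout says a fair amount about the cfg. Layer 29 has 35 filters and feeds a `detection` (region) layer, and there is a `reorg` layer; that is the YOLOv2 layout rather than YOLOv3's (training also reports `Region Avg IOU` lines). For a YOLOv2 region head the last conv needs `filters = num × (classes + 5)`: 4 box coordinates and 1 objectness score per anchor, plus one score per class. A quick sanity check, assuming the usual `num=5` anchors (the cfg file itself is not posted in this thread); the helper names are hypothetical:

```python
def region_filters(num_classes: int, num_anchors: int = 5) -> int:
    """Filters for the 1x1 conv right before a YOLOv2 [region] layer:
    per anchor, 4 box coordinates + 1 objectness score + one score per class."""
    return num_anchors * (num_classes + 5)


def classes_from_filters(filters: int, num_anchors: int = 5) -> int:
    """Inverse of region_filters: the class count implied by a filter count."""
    if filters % num_anchors != 0:
        raise ValueError("filters is not a multiple of the anchor count")
    num_classes = filters // num_anchors - 5
    if num_classes < 1:
        raise ValueError("too few filters for even one class")
    return num_classes


# Layer 29 in the printout: conv 35 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 35
print(classes_from_filters(35))  # 2
print(region_filters(2))         # 35
```

If the cfg is indeed YOLOv2-style, note that `yolov3.weights` and `darknet53.conv.74` were released for YOLOv3's Darknet-53 network and may not line up with it layer for layer. Separately, the train run that loaded `yolov3.weights` saved `yolo-obj_final.weights` without printing a single iteration; a likely cause is that the images-seen counter stored in a fully trained weights file already exceeds `max_batches`, which darknet's `-clear` training flag resets.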

wataru129 commented 5 years ago

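Training was taking an extremely long time, so I stopped it once and ran a prediction, but nothing was detected. I'll leave training running overnight tonight. If it still can't detect anything after that, I'll consider adding more data to the dataset. Thanks.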

wataru129 commented 5 years ago

Here is the training log.
(SAP) Wataru-2:darknet wataru$ ./darknet detector train ./test_output/data/obj.data ./test_output/yolo-obj.cfg yolo-obj
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BFLOPs
1 max 2 x 2 / 2 416 x 416 x 32 -> 208 x 208 x 32
2 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64 1.595 BFLOPs
3 max 2 x 2 / 2 208 x 208 x 64 -> 104 x 104 x 64
4 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BFLOPs
5 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BFLOPs
6 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BFLOPs
7 max 2 x 2 / 2 104 x 104 x 128 -> 52 x 52 x 128
8 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
9 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
10 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
11 max 2 x 2 / 2 52 x 52 x 256 -> 26 x 26 x 256
12 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
13 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
14 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
15 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
16 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
17 max 2 x 2 / 2 26 x 26 x 512 -> 13 x 13 x 512
18 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
19 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
20 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
21 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
22 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
23 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024 3.190 BFLOPs
24 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024 3.190 BFLOPs
25 route 16
26 reorg / 2 26 x 26 x 512 -> 13 x 13 x2048
27 route 26 24
28 conv 1024 3 x 3 / 1 13 x 13 x3072 -> 13 x 13 x1024 9.569 BFLOPs
29 conv 35 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 35 0.012 BFLOPs
30 detection
mask_scale: Using default '1.000000'
Learning Rate: 0.0001, Momentum: 0.9, Decay: 0.0005
Loaded: 0.350093 seconds
Region Avg IOU: 0.242288, Class: 0.633528, Obj: 0.298991, No Obj: 0.437732, Avg Recall: 0.000000, count: 8
Region Avg IOU: 0.170173, Class: 0.623199, Obj: 0.319716, No Obj: 0.440035, Avg Recall: 0.125000, count: 8
Region Avg IOU: 0.108932, Class: 0.326759, Obj: 0.328702, No Obj: 0.435478, Avg Recall: 0.000000, count: 8
Region Avg IOU: 0.179150, Class: 0.327494, Obj: 0.188378, No Obj: 0.438899, Avg Recall: 0.000000, count: 8
Region Avg IOU: 0.095722, Class: 0.531254, Obj: 0.220434, No Obj: 0.438162, Avg Recall: 0.000000, count: 8
Region Avg IOU: 0.174627, Class: 0.461948, Obj: 0.229556, No Obj: 0.436914, Avg Recall: 0.000000, count: 8
Region Avg IOU: 0.194934, Class: 0.710932, Obj: 0.213355, No Obj: 0.438075, Avg Recall: 0.000000, count: 8
Region Avg IOU: 0.115228, Class: 0.628432, Obj: 0.239744, No Obj: 0.436616, Avg Recall: 0.000000, count: 8
1: 215.828751, 215.828751 avg, 0.000100 rate, 809.741505 seconds, 64 images
Loaded: 0.000031 seconds
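Reading these logs by eye gets tedious once training runs for many iterations. Below is a minimal parsing sketch, assuming the log format shown above: each `Region Avg IOU` line reports one forward pass of the region layer (eight per iteration here, presumably one per subdivision), and the `1: ...` line is the per-iteration summary (iteration number, loss, running average loss, learning rate, seconds spent on that iteration, total images seen). The function names are hypothetical, and the regexes are written to work on both line-broken and flattened single-line logs.

```python
import re
from statistics import mean

# One "Region Avg IOU" line per forward pass of the region layer.
REGION_RE = re.compile(
    r"Region Avg IOU: ([\d.]+), Class: ([\d.]+), Obj: ([\d.]+), "
    r"No Obj: ([\d.]+), Avg Recall: ([\d.]+), count: (\d+)"
)
# Per-iteration summary: iteration, loss, avg loss, rate, seconds, images seen.
SUMMARY_RE = re.compile(
    r"\b(\d+): ([\d.]+), ([\d.]+) avg, ([\d.]+) rate, ([\d.]+) seconds, (\d+) images"
)
REGION_KEYS = ("iou", "class", "obj", "no_obj", "recall")


def parse_darknet_log(text: str) -> dict:
    """Collect Region statistics and iteration summaries from darknet output."""
    regions = [
        dict(zip(REGION_KEYS, map(float, m.groups()[:5])))
        for m in REGION_RE.finditer(text)
    ]
    summaries = [
        {
            "iteration": int(m.group(1)),
            "loss": float(m.group(2)),
            "avg_loss": float(m.group(3)),
            "rate": float(m.group(4)),
            "seconds": float(m.group(5)),
            "images": int(m.group(6)),
        }
        for m in SUMMARY_RE.finditer(text)
    ]
    return {"regions": regions, "summaries": summaries}


def mean_region(regions: list, key: str) -> float:
    """Average one Region statistic (e.g. "iou" or "recall") over many lines."""
    return mean(r[key] for r in regions)
```

For a long run, saving darknet's output to a file and passing its contents to `parse_darknet_log` makes it easy to check whether `avg_loss` keeps falling and `recall` keeps rising.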

wataru129 commented 5 years ago

A successful darknet training example. Setting the environment aside, it's the closest to what we want to do this time. http://hakkentanoshii.seesaa.net/article/450649219.html

minamocake commented 5 years ago

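・About the model training environment

For the model training environment, I've asked our contact at Intel to check whether they could provide us with a Movidius.
It might be possible to train on SAP CP, so I'll check whether YOLO darknet is supported there!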

wataru129 commented 5 years ago

Here is the output when training starts from initial (pretrained) weights.
(SAP) Wataru-2:darknet wataru$ ./darknet detector train ./test_output/data/obj.data ./test_output/yolo-obj.cfg darknet53.conv.74 yolo-obj
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BFLOPs
1 max 2 x 2 / 2 416 x 416 x 32 -> 208 x 208 x 32
2 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64 1.595 BFLOPs
3 max 2 x 2 / 2 208 x 208 x 64 -> 104 x 104 x 64
4 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BFLOPs
5 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BFLOPs
6 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BFLOPs
7 max 2 x 2 / 2 104 x 104 x 128 -> 52 x 52 x 128
8 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
9 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
10 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
11 max 2 x 2 / 2 52 x 52 x 256 -> 26 x 26 x 256
12 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
13 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
14 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
15 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
16 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
17 max 2 x 2 / 2 26 x 26 x 512 -> 13 x 13 x 512
18 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
19 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
20 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
21 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
22 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
23 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024 3.190 BFLOPs
24 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024 3.190 BFLOPs
25 route 16
26 reorg / 2 26 x 26 x 512 -> 13 x 13 x2048
27 route 26 24
28 conv 1024 3 x 3 / 1 13 x 13 x3072 -> 13 x 13 x1024 9.569 BFLOPs
29 conv 35 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 35 0.012 BFLOPs
30 detection
mask_scale: Using default '1.000000'
Loading weights from darknet53.conv.74...Done!
Learning Rate: 0.0001, Momentum: 0.9, Decay: 0.0005
Loaded: 0.373610 seconds
Region Avg IOU: 0.422904, Class: 0.500224, Obj: 0.499002, No Obj: 0.500009, Avg Recall: 0.000000, count: 8
Region Avg IOU: 0.464623, Class: 0.500810, Obj: 0.499032, No Obj: 0.500008, Avg Recall: 0.250000, count: 8
Region Avg IOU: 0.483549, Class: 0.500136, Obj: 0.499292, No Obj: 0.500001, Avg Recall: 0.375000, count: 8
Region Avg IOU: 0.483937, Class: 0.500155, Obj: 0.499317, No Obj: 0.500002, Avg Recall: 0.625000, count: 8
Region Avg IOU: 0.409560, Class: 0.500784, Obj: 0.499074, No Obj: 0.500003, Avg Recall: 0.125000, count: 8
Region Avg IOU: 0.451427, Class: 0.500018, Obj: 0.499314, No Obj: 0.500000, Avg Recall: 0.250000, count: 8
Region Avg IOU: 0.473265, Class: 0.499644, Obj: 0.499489, No Obj: 0.500002, Avg Recall: 0.375000, count: 8
Region Avg IOU: 0.466625, Class: 0.499832, Obj: 0.499081, No Obj: 0.500007, Avg Recall: 0.250000, count: 8
1: 213.611389, 213.611389 avg, 0.000100 rate, 834.662103 seconds, 64 images
Loaded: 0.000028 seconds
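The first iteration of this run can be compared with the first iteration of the earlier run that had no initial weights. The snippet below just averages the eight `Region` lines of iteration 1 from each log (values copied from the two logs above). It is a one-iteration snapshot, so treat it as a rough first signal rather than a prediction of how training will converge.

```python
from statistics import mean

# Values copied from the eight "Region Avg IOU" lines of iteration 1 in each log.
runs = {
    "no initial weights": {
        "iou": [0.242288, 0.170173, 0.108932, 0.179150,
                0.095722, 0.174627, 0.194934, 0.115228],
        "recall": [0.0, 0.125, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        "loss": 215.828751,
    },
    "darknet53.conv.74": {
        "iou": [0.422904, 0.464623, 0.483549, 0.483937,
                0.409560, 0.451427, 0.473265, 0.466625],
        "recall": [0.0, 0.25, 0.375, 0.625, 0.125, 0.25, 0.375, 0.25],
        "loss": 213.611389,
    },
}

for name, r in runs.items():
    print(f"{name}: mean IOU {mean(r['iou']):.4f}, "
          f"mean recall {mean(r['recall']):.4f}, loss {r['loss']:.2f}")
```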

wataru129 commented 5 years ago

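According to the site above, detection started after roughly 8,000 epochs (in darknet terms, training iterations), but here one iteration takes about 15 minutes on average, so this doesn't look realistic without a GPU. For now I'll add more data to the dataset.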
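The estimate can be made concrete with the timings in the logs. Darknet counts progress in iterations (one batch of 64 images in these runs), and the `seconds` field of each summary line is the time spent on that one iteration. Assuming the blog's figure of about 8,000 refers to the same kind of count and that the per-iteration time on this CPU stays roughly constant (both are assumptions):

```python
# Per-iteration wall-clock times from the "1: ..." summary lines of the two CPU runs above.
seconds_per_iteration = [809.741505, 834.662103]
# Rough iteration count quoted from the linked blog post (assumed to be darknet iterations).
target_iterations = 8000

avg_seconds = sum(seconds_per_iteration) / len(seconds_per_iteration)
total_days = avg_seconds * target_iterations / 86400  # 86400 seconds per day

print(f"{avg_seconds / 60:.1f} min per iteration -> "
      f"about {total_days:.0f} days for {target_iterations} iterations")
# prints: 13.7 min per iteration -> about 76 days for 8000 iterations
```

At roughly 76 days of continuous CPU training, moving to a GPU (or a hosted training environment) matters far more than small tweaks to the cfg.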

minamocake / SmartProductInspection

Model creation #2