wjp0408 opened this issue 5 years ago
Q1: Yes.
2: Whether to use the 1000 most frequent categories is up to you. Maybe using all 3850 categories would perform better? 😄
3: Whether to use `thresh > 0.005` and whether to divide into 12 splits are up to you.
Q2: Yes. (If you are wondering about `center-x center-y`: it's because darknet YOLOv2 does it this way.)
4: You don't have ground truth for the test set. If you test on the test set, you cannot run `python3 detection_perf.py`, but you can upload the results to the evaluation server.
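As a side note on the darknet `center-x center-y` convention mentioned in Q2, here is a minimal sketch (the function and field names are mine, not from the repo) of how a normalized center-format label maps to a pixel-space top-left box:

```python
def darknet_to_pixel(cx, cy, w, h, img_w, img_h):
    """Convert a darknet-style normalized (center-x, center-y, w, h)
    label into a pixel-coordinate (topleft-x, topleft-y, w, h) box."""
    box_w = w * img_w
    box_h = h * img_h
    x = cx * img_w - box_w / 2  # center minus half width -> left edge
    y = cy * img_h - box_h / 2  # center minus half height -> top edge
    return x, y, box_w, box_h

# Hypothetical example on a 1216x1216 crop:
x, y, bw, bh = darknet_to_pixel(0.5, 0.25, 0.1, 0.2, 1216, 1216)
```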
Thanks for your reply! :)
I just use the val set as the test set, and the train+val set as the train set.
But when I run `cd ../judge && python3 detection_perf.py`, I get this ...
...... and finally an ERROR:
This really confuses me ......
Thanks again for your code. :) If that's okay with you, can you tell me how long (or for how many `max_batches`) and with how many GPUs you trained YOLOv2 on CTW in the original paper? Thanks.
NVIDIA GTX TITAN X (PASCAL) * 1, 3.0 sec/step, 38 hours in total.
@yuantailing Hi, I'm confused by these two passages in the Appendix of tutorial Part 3:
Q1: How to choose `c0`? Why is sometimes num(TPs) + num(FNs) > num(GTs)? (Why does num(GTs matched with a detected box) + num(GTs unmatched with any detected box) != num(GTs)?)
Q2: How to compute AP? Does it mean that when `c0` is given, all boxes with score < `c0` are filtered out, then many recall levels `recall 0, recall 1, ..., recall n` are obtained, and the AP is the mean of the max precisions under each recall? Like this:
Q1. Sorry, I made a mistake. It should be "we take a minimum confidence score $c_0$ which leads to $num(TPs) + num(FPs) \leq num(GTs)$". The paper is correct (in section 4.2):
> To compute the recall rates, for each image in the testing set, denoting the number of annotated character instances as n, we select n recognized character instances with the highest confidences as output of YOLOv2.
The mistake is fixed in https://github.com/yuantailing/ctw-baseline/commit/ff979544adaef7905ddf1810e412ede859f064f0.
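The selection rule quoted from section 4.2 can be sketched in a few lines (a sketch only, not the repo's actual code; the detection representation is my own):

```python
def select_top_n(detections, n):
    """For one image with n annotated character instances, keep the n
    recognized instances with the highest confidence.
    Each detection is a (confidence, box) pair."""
    return sorted(detections, key=lambda d: d[0], reverse=True)[:n]

# Hypothetical detections for an image with 2 ground-truth instances:
dets = [(0.9, 'a'), (0.3, 'b'), (0.7, 'c'), (0.5, 'd')]
kept = select_top_n(dets, 2)  # the two highest-confidence detections
```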
Q2. Yes, and I think it's equivalent to the AP in PASCAL VOC. For every real number `c0`, we can compute a recall (this `recall` is not the recall metric mentioned in the paper) and a precision. So, for M detections there are (M + 1) distinct `c0` levels, giving (M + 1) recalls and (M + 1) precisions.
We use the max precision where (r' > r) to compute AP; it's also the same.
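The sweep described above can be sketched as follows; this is a minimal VOC-style AP computation under my own assumptions (detections are already labeled TP/FP, and the interpolated precision at each recall level is the maximum precision at that recall or above):

```python
def average_precision(scored, num_gt):
    """Sweep the confidence threshold c0 over all detections, collect a
    (recall, precision) point at each level, and integrate the
    interpolated precision (max precision at recall >= r) over recall.
    `scored` is a list of (confidence, is_true_positive) pairs."""
    scored = sorted(scored, key=lambda s: s[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 1.0)]  # threshold above every score: no detections kept
    for _, is_tp in scored:
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((tp / num_gt, tp / (tp + fp)))
    ap, prev_recall = 0.0, 0.0
    for r, _ in points:
        max_p = max(p for rr, p in points if rr >= r)
        ap += (r - prev_recall) * max_p
        prev_recall = r
    return ap
```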
Thanks for your patience and quick reply. :)
Hi, sorry to bother you. I want to train some other detection architectures (for example, YOLOv3, Mask R-CNN ...) on the CTW dataset, and I just want to make sure my overall detection procedure is correct, because my experiment results are too bad...

1. Follow tutorial part 1 and part 3 until `cd ../detection && python3 prepare_train_data.py`. (Question 1: does `1 0.716797 0.395833 0.216406 0.147222` in the trainval txt files mean `class center-x center-y w h`?)
2. Use all jpgs and txts in trainval to train a net, with the `cates.json` generated by `python3 decide_cates.py` on train+val.
3. Use `python3 prepare_test_data.py` to generate the test set, use the trained net to output all boxes in all test jpgs with confidence `thresh > 0.005`, then generate the files `chinese.0.txt` ~ `chinese.11.txt` by myself, just like the output of `python3 eval.py`. (Question 2: does `products/test/3032626_0_3_5.jpg 12 288.8592 434.3807 14.8512 39.1104 0.072` in each line of chinese.x.txt mean `filename class topleft-x topleft-y w h score` for each bbox, with respect to the scale 1216?)
4. Finally, use `python3 merge_results.py` and `cd ../judge && python3 detection_perf.py` without any extra change to get the final result!

But I get really poor results... Did I MISS something important...? Thanks for your help. :)
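For what it's worth, here is a small parser for the sample line in Question 2, under the assumption that the fields are `filename class topleft-x topleft-y w h score` (that reading of the line is mine; please verify it against `eval.py`'s actual output before relying on it):

```python
def parse_detection_line(line):
    """Parse one line of the chinese.*.txt format quoted above into a dict.
    Assumed field order: filename class topleft-x topleft-y w h score."""
    fields = line.split()
    filename, cls = fields[0], int(fields[1])
    x, y, w, h, score = map(float, fields[2:7])
    return {'filename': filename, 'class': cls,
            'bbox': (x, y, w, h), 'score': score}

line = 'products/test/3032626_0_3_5.jpg 12 288.8592 434.3807 14.8512 39.1104 0.072'
det = parse_detection_line(line)
```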