yuantailing / ctw-baseline

Baseline methods for [CTW dataset](https://ctwdataset.github.io/)
MIT License
330 stars 88 forks source link

Question about the overall detection procedure #30

Open wjp0408 opened 5 years ago

wjp0408 commented 5 years ago

Hi, sorry to bother you. I want to train some other detection architectures (for example, YOLOv3, Mask R-CNN...) on the CTW dataset, and I want to make sure my overall detection procedure is correct, because my experimental results are quite poor...

1. Follow tutorial part 1 and part 3 until cd ../detection && python3 prepare_train_data.py. (Question 1: does 1 0.716797 0.395833 0.216406 0.147222 in the trainval txt files mean class center-x center-y w h?)

2. Use all the jpgs and txts in trainval to train a net, with the cates.json generated by python3 decide_cates.py on train+val.

3. Use python3 prepare_test_data.py to generate the test set, run the trained net to output all boxes in all test jpgs with confidence > 0.005, then generate the files chinese.0.txt ~ chinese.11.txt myself, matching the output of python3 eval.py. (Question 2: does products/test/3032626_0_3_5.jpg 12 288.8592 434.3807 14.8512 39.1104 0.072 on each line of chinese.x.txt mean filename class topleft-x topleft-y w h score, with respect to the 1216 scale?)

4. Finally, run python3 merge_results.py and cd ../judge && python3 detection_perf.py without any extra changes to get the final result!

But I get really poor results... Did I miss something important? Thanks for your help. :)
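For step 1, a quick way to sanity-check the label format is to decode a line back into pixel coordinates. This is a minimal sketch assuming the standard darknet convention (class center-x center-y w h, all normalized to [0, 1]); yolo_label_to_pixel_box and its arguments are hypothetical names:

```python
def yolo_label_to_pixel_box(line, img_w, img_h):
    """Decode a darknet-style label line: class cx cy w h (normalized)."""
    parts = line.split()
    cls = int(parts[0])
    cx, cy, w, h = (float(v) for v in parts[1:])
    # Convert normalized center/size to pixel top-left/size.
    bw, bh = w * img_w, h * img_h
    x = cx * img_w - bw / 2
    y = cy * img_h - bh / 2
    return cls, (x, y, bw, bh)

# A centered box covering a quarter of a 1024x1024 image:
cls, box = yolo_label_to_pixel_box("1 0.5 0.5 0.25 0.25", 1024, 1024)
# cls == 1, box == (384.0, 384.0, 256.0, 256.0)
```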

yuantailing commented 5 years ago

Q1: Yes.

2: Whether to use the 1000 most frequent categories is up to you. Maybe using all 3850 categories will perform better? 😄

3: Whether to use thresh > 0.005 and whether to divide into 12 splits are up to you.

Q2: Yes. (If you were expecting center-x center-y: that's because darknet YOLOv2 did it that way.)
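So a detector that emits darknet-style center coordinates needs its boxes shifted to top-left before writing the chinese.x.txt lines. A sketch of that conversion (to_eval_line is a hypothetical helper; the field widths mimic the example line above):

```python
def to_eval_line(filename, cls, cx, cy, w, h, score):
    """Convert a center-format detection to the top-left format used by
    chinese.x.txt: filename class x y w h score (pixel coordinates)."""
    x = cx - w / 2
    y = cy - h / 2
    return '%s %d %.4f %.4f %.4f %.4f %.3f' % (filename, cls, x, y, w, h, score)

line = to_eval_line('products/test/example.jpg', 12, 100.0, 50.0, 20.0, 10.0, 0.5)
# 'products/test/example.jpg 12 90.0000 45.0000 20.0000 10.0000 0.500'
```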

4: You don't have the ground truth of the test set. If you test on the test set, you cannot run python3 detection_perf.py, but you can upload the results to the evaluation server.

wjp0408 commented 5 years ago

Thanks for your reply! :) I used the val set as the test set, and train+val as the training set. But when I run cd ../judge && python3 detection_perf.py, I get this... [screenshot]

...... and finally an ERROR:

[screenshot of the error]

This really confuses me...

yuantailing commented 5 years ago

Fixed in https://github.com/yuantailing/ctw-baseline/commit/f9c70fc83f0a07cf911231d10e7e7662fcc083e1.

wjp0408 commented 5 years ago

Thanks again for your code. :) Can you tell me how long (or for how many max_batches) you trained YOLOv2 on CTW for the original paper, and on how many GPUs? If that's okay with you... Thanks.

yuantailing commented 5 years ago

NVIDIA GTX TITAN X (PASCAL) * 1, 3.0 sec/step, 38 hours in total.

wjp0408 commented 5 years ago

@yuantailing Hi, I'm confused by these two passages in the Appendix of tutorial Part 3:

[screenshot] Q1: How do you choose c0? Why is num(TPs) + num(FNs) sometimes > num(GTs)? (Why does num(GTs matched with a detected box) + num(GTs unmatched with any detected box) != num(GTs)?)

[screenshot] Q2: How is AP computed? Does it mean that, given c0, all boxes with score < c0 are filtered out, yielding recall levels recall 0, recall 1, ..., recall n, and AP is the mean of the max precision at each recall level? Like this: [screenshot]

yuantailing commented 5 years ago

Q1. Sorry, I made a mistake. It should be "we take a minimum confidence score $c_0$ which leads to $num(TPs) + num(FPs) \leq num(GTs)$". The paper is correct (section 4.2):

To compute the recall rates, for each image in the testing set, denoting the number of annotated character instances as n, we select n recognized character instances with the highest confidences as output of YOLOv2.

The mistake is fixed in https://github.com/yuantailing/ctw-baseline/commit/ff979544adaef7905ddf1810e412ede859f064f0.
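The per-image selection described in the quoted passage can be sketched as follows (select_topn is a hypothetical name; dets is assumed to be a list of (score, box) pairs and n the number of annotated instances in the image):

```python
def select_topn(dets, n):
    """Keep the n detections with the highest confidence scores,
    as done per image when computing the recall rates."""
    return sorted(dets, key=lambda d: d[0], reverse=True)[:n]

dets = [(0.9, 'a'), (0.2, 'b'), (0.7, 'c'), (0.5, 'd')]
# With n = 3 annotated instances, the three most confident boxes remain:
# [(0.9, 'a'), (0.7, 'c'), (0.5, 'd')]
```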

Q2. Yes, and I think it's equivalent to the AP in PASCAL VOC. For every real number c0, we can compute a recall (this 'recall' is not the recall metric mentioned in the paper) and a precision. So there are (M + 1) c0 levels, giving (M + 1) recalls and (M + 1) precisions.

Using the max precision where (r' > r) to compute AP is equivalent.

https://github.com/yuantailing/ctw-baseline/blob/ff979544adaef7905ddf1810e412ede859f064f0/cppapi/eval_tools.hpp#L145-L146
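In Python, that interpolated-precision scheme can be sketched like this (a simplified reimplementation, not the repo's code; matches is assumed to be a list of booleans for detections sorted by descending score, True for a true positive, and n_gt the total number of ground truths):

```python
def average_precision(matches, n_gt):
    """PASCAL-VOC-style AP: walk detections sorted by descending score,
    record (recall, precision) points, and at each recall level use the
    max precision over all points with recall >= that level (r' > r)."""
    tp = 0
    points = []
    for i, is_tp in enumerate(matches):
        if is_tp:
            tp += 1
        points.append((tp / n_gt, tp / (i + 1)))
    ap = 0.0
    prev_recall = 0.0
    for k, (recall, _) in enumerate(points):
        # Interpolated precision: best precision at this recall or beyond.
        max_prec = max(p for _, p in points[k:])
        ap += (recall - prev_recall) * max_prec  # FPs contribute 0 (recall unchanged)
        prev_recall = recall
    return ap

# Three TPs and one FP against 4 ground truths:
ap = average_precision([True, True, False, True], 4)  # → 0.6875
```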

wjp0408 commented 5 years ago

Thanks for your patience and quick reply. :)