suhaisheng opened this issue 7 years ago
If I remember correctly, these 3 videos have incorrect annotations which are sitting beyond the videos’ time span.
In terms of testing results, do you have specific numbers and settings for me to look at?
OK! Firstly, I only use the 213 videos that are actually used in the evaluation process, and run the command:
`python gen_proposal_list.py thumos14 ./thumos14/Test/`
Because the directory structure of my dataset is a little different from yours (something like `Test/[img,flow_x,flow_y]/video_name/%frame_id.jpg`), I changed your function `_load_image(self, directory, idx)` so that it adapts to my dataset structure and loads the corresponding images.
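Roughly, my modified loader looks like this (a simplified sketch rather than my exact code; the frame-name format, the zero-padding, and the `self.root_path` attribute here are assumptions that depend on how the frames were extracted):

```python
import os
from PIL import Image

def _load_image(self, directory, idx):
    # In my layout the frames of one video live under
    #   Test/img/<video_name>/<frame_id>.jpg        (RGB)
    #   Test/flow_x/<video_name>/<frame_id>.jpg     (horizontal flow)
    #   Test/flow_y/<video_name>/<frame_id>.jpg     (vertical flow)
    # so the modality selects the sub-directory instead of a filename template.
    video_name = os.path.basename(directory.rstrip('/'))
    frame_name = '{:05d}.jpg'.format(idx)  # padding depends on the extraction script

    if self.modality == 'RGB':
        path = os.path.join(self.root_path, 'img', video_name, frame_name)
        return [Image.open(path).convert('RGB')]
    else:  # Flow
        x_path = os.path.join(self.root_path, 'flow_x', video_name, frame_name)
        y_path = os.path.join(self.root_path, 'flow_y', video_name, frame_name)
        return [Image.open(x_path).convert('L'), Image.open(y_path).convert('L')]
```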
Then I continue my reproducing work by running the testing command:
`python ssn_test.py thumos14 RGB none result_inceptionv3_thumos14_imagenet_rgb.npz --arch InceptionV3 --use_reference`
My evaluation command is as follows:
`python eval_detection_results.py thumos14 result_inceptionv3_thumos14_imagenet_rgb.npz`
The RGB modality result is:
| IoU thresh | 0.10 | 0.20 | 0.30 | 0.40 | 0.50 | 0.60 | 0.70 | 0.80 | 0.90 | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mean AP | 0.3946 | 0.3476 | 0.2862 | 0.2157 | 0.1470 | 0.0896 | 0.0488 | 0.0222 | 0.0039 | 0.1729 |
The Flow modality testing and evaluation process is almost the same as above.
The Flow modality result is:
| IoU thresh | 0.10 | 0.20 | 0.30 | 0.40 | 0.50 | 0.60 | 0.70 | 0.80 | 0.90 | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mean AP | 0.4541 | 0.4085 | 0.3530 | 0.2883 | 0.2184 | 0.1456 | 0.0851 | 0.0365 | 0.0076 | 0.2219 |
In order to get the final RGB+Flow modality result, I run the command:
`python eval_detection_results.py thumos14 result_inceptionv3_thumos14_imagenet_rgb.npz result_inceptionv3_thumos14_imagenet_flow.npz`
And the result is listed as follows:
| IoU thresh | 0.10 | 0.20 | 0.30 | 0.40 | 0.50 | 0.60 | 0.70 | 0.80 | 0.90 | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mean AP | 0.5599 | 0.5033 | 0.4351 | 0.3479 | 0.2622 | 0.1680 | 0.0946 | 0.0437 | 0.0086 | 0.2692 |
As shown above, the final reproduced RGB+Flow result at 0.5 IoU is 0.2622, which is somewhat different from the 28.00 (29.8*) reported in the paper. Can you explain why this happens, or how to avoid it, and give your valuable advice? I would appreciate it!
The RGB and Flow results you got are both lower than the reference results on our machine. My guess is that it may be due to your modified data loading routine. If you would like to upload the generated proposal lists, I may be able to help you.
Hi Yuanjun, if I flip optical flow images the same way as RGB images when training the optical flow network, will there be a big difference in the result? I noticed that in the Caffe version you seem to directly flip optical flow like an RGB image.
@bityangke If you mean the Caffe version of TSN, we do invert the pixel values after flipping the optical flow images. Although the performance difference is not significant, it makes the system technically sound and may help in cases where this flipping matters.
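For illustration, the idea is roughly the following (a minimal sketch, not the exact Caffe code; it assumes flow is discretized into 8-bit images centered at 128, so only the x-component needs its values inverted on a horizontal flip):

```python
import numpy as np
from PIL import Image, ImageOps

def hflip_flow_pair(flow_x, flow_y):
    # Mirror both flow components horizontally.
    fx = ImageOps.mirror(flow_x)
    fy = ImageOps.mirror(flow_y)
    # A horizontal flip reverses the sign of the horizontal displacement,
    # so the flow-x values are inverted around the 8-bit midpoint.
    # The vertical displacement (flow-y) is unchanged by a horizontal flip.
    fx = Image.fromarray(255 - np.asarray(fx))
    return fx, fy
```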
@suhaisheng Hi, I got a worse result on the Flow modality, but I don't know how to fix it. Could you please share your thumos14_flow_score.npz? I just want to verify where the problem is. Thank you very much! Also, this is my issue: https://github.com/yjxiong/action-detection/issues/12. Could you help me?
This is my reproduced Flow modality result (unzip it and you will get the .npz file). Note that it is still about 1.7% different from the paper.
@suhaisheng Thank you very much! But it's so weird. When I use your .npz file and proposal file to evaluate the Flow result, the result is still different from yours. My Flow result is:
Detection performance on thumos14:

| IoU thresh | 0.10 | 0.20 | 0.30 | 0.40 | 0.50 | 0.60 | 0.70 | 0.80 | 0.90 | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mean AP | 0.4344 | 0.3977 | 0.3406 | 0.2765 | 0.2083 | 0.1470 | 0.0853 | 0.0340 | 0.0048 | 0.2143 |
eval_detection_results.py only needs two files; I think if we use the same files, we should get the same result. Maybe I should check my code again.
That result is the same as mine (you may have mistaken my other experiment result, with --arch InceptionV3, for it). What I am curious about is why the RGB modality result you got is higher than mine, and even higher than the authors' result listed in the paper.
@suhaisheng Have you figured out why? I found that in thumos14_tag_val_normalized_proposal_list.txt there are also many videos with no groundtruth. That just makes no sense.
Hi, @suhaisheng. Could you explain the meaning of each line in thumos14_tag_val_normalized_proposal_list.txt?
@jiaozizhao Please see the wiki page I linked. https://github.com/yjxiong/action-detection/wiki/A-Description-of-the-Proposal-Files
Hi @yjxiong. Thanks. Could you also explain the result after running ssn_test.py? I noticed there are four arrays for each video. Could you explain them? If I want to visualize the results, what information should I use? Thanks.
@jiaozizhao You can find the definitions at https://github.com/yjxiong/action-detection/blob/master/ssn_test.py#L95
For how the results are evaluated, you can refer to https://github.com/yjxiong/action-detection/blob/master/eval_detection_results.py
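If you just want to poke at the file before reading the code, something like this works (a rough sketch; the exact key names and array layout are defined in ssn_test.py, so treat this only as a way to inspect whatever was saved):

```python
import numpy as np

# Load the scores file written by ssn_test.py. allow_pickle is only needed
# if the per-video entries were stored as Python objects rather than plain arrays.
data = np.load('result_inceptionv3_thumos14_imagenet_rgb.npz', allow_pickle=True)

# Print the first few keys and the shape/dtype of whatever is stored under them.
for key in list(data.files)[:3]:
    entry = data[key]
    print(key, type(entry), getattr(entry, 'shape', None), getattr(entry, 'dtype', None))
```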
Hi @yjxiong. Thank you very much, and sorry for not reading the code carefully; I was in a hurry. I will read it.
Hello, do you know how to generate this normalized_proposal_list.txt for other videos, @suhaisheng?
> Hello, do you know how to generate this normalized_proposal_list.txt on other videos @suhaisheng

Have you resolved it?
Hello, I'm trying to reproduce your amazing work. For convenience (computational cost), I only use the 213 videos that are actually used for testing by the THUMOS14 evaluation toolkit. However, I found that there might be something wrong with the groundtruth annotations in your thumos14_tag_test_normalized_proposal_list.txt file. For example, please check the groundtruth annotations of the following three videos: video_test_0001292, video_test_0000270, video_test_0001496. In your .txt file these three videos are negative, with 0 gt instances, whereas in the THUMOS14 test annotations all of them contain several groundtruth action instances. So when I run the SSN test script, my video count decreases from 213 to 210, and the final reproduced results end up lower than those listed in the paper (about 1.5% difference). Waiting for your reply, thanks so much!