Another question: I have seen your answer about multi-label training. You said that you repeat the input videos with different labels, but I found something that does not match. In your anet_1.2_untrimmed_train_rgb_list.txt file:

/media/data1/lmwang/data/anet_1.2_train_rgb_img_256_340/FWPJWq-uhUw/ 2944 38
/media/data1/lmwang/data/anet_1.2_train_rgb_img_256_340/FWPJWq-uhUw/ 2944 38
/media/data1/lmwang/data/anet_1.2_train_rgb_img_256_340/FWPJWq-uhUw/ 2944 38

The format is (video frame path, video frame number, video groundtruth class). Why do these rows have the same groundtruth class for training?
Each video may have multiple action instances. Thus, in our train_list.txt, each row corresponds to an action instance. If there are multiple instances, there will be multiple entries (rows) in the file list, just as shown in the above example.
I understand that you train a video with multiple action instances using one label at a time, each time with a different label (groundtruth). But the list shows that it only repeats the same label three times.
This means this video has multiple action instances of the same label. Please see the ActivityNet dataset for more details.
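To make the list format concrete, here is a minimal sketch (not part of the UntrimmedNet code) of how rows in the format "<frame_dir> <num_frames> <class_id>" could be parsed and grouped per video; the file name and the function name are hypothetical, and repeated rows with the same class simply collapse to one distinct label per video.

```python
from collections import defaultdict

def load_train_list(list_path):
    """Return {frame_dir: (num_frames, [class_id, ...])} from a file list."""
    videos = defaultdict(lambda: [0, []])
    with open(list_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 3:
                continue  # skip malformed rows
            frame_dir, num_frames, class_id = parts
            videos[frame_dir][0] = int(num_frames)
            videos[frame_dir][1].append(int(class_id))
    return {k: (v[0], v[1]) for k, v in videos.items()}

if __name__ == "__main__":
    # Hypothetical path; the real list file ships with the repository.
    for frame_dir, (n, labels) in load_train_list("anet_1.2_untrimmed_train_rgb_list.txt").items():
        # A video with three instances of class 38 yields labels == [38, 38, 38];
        # set(labels) shows it is still a single distinct action class.
        print(frame_dir, n, sorted(set(labels)))
```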
OK, thank you. One last question: is there a detailed guide or are there scripts for training?
Please see the scripts folder.
Thank you very much for your patient reply.
Hi, I'm confused: is the video-level recognition result just one label for an untrimmed video, or can it be more? I find that there are videos with more than one label in the THUMOS14 test data. How many labels will UntrimmedNet predict for such videos, and how is this implemented?