happyharrycn / actionformer_release

Code release for ActionFormer (ECCV 2022)
MIT License

compute cls loss #79

Closed OpenSorceYCW closed 1 year ago

OpenSorceYCW commented 1 year ago

I find that gt cls_id ranges from 0 to 19 in thumos14.json, but computing cls_loss appears to ignore id=0: https://github.com/happyharrycn/actionformer_release/blob/main/libs/modeling/meta_archs.py#L547 Is this a problem?

tzzcl commented 1 year ago

For this line, gt_cls is a one-hot tensor with shape (num_points, num_classes). The line you mentioned therefore only checks whether a point contains a positive example; it does not ignore the id=0 class.
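
To illustrate (a minimal sketch with made-up shapes, not code from the repo): summing a one-hot gt_cls over the class dimension flags every labeled point, including points labeled with class id 0.

```python
import torch

# Toy example: 4 points, 3 classes; gt_cls is one-hot per point.
num_points, num_classes = 4, 3
gt_cls = torch.zeros(num_points, num_classes)
gt_cls[0, 0] = 1.0  # a point labeled with class id 0
gt_cls[2, 1] = 1.0  # a point labeled with class id 1

# Summing over the class dimension marks any labeled point as positive,
# so class id 0 is counted like every other class.
pos_mask = gt_cls.sum(-1) > 0
print(pos_mask)  # tensor([ True, False,  True, False])
```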

OpenSorceYCW commented 1 year ago

thank you very much! For my project, the input is a video with one segment and one class label. The cls_loss is 5.8678e-05 and reg_loss is 0 during training. Can you give me some ideas about what is causing this problem?

tzzcl commented 1 year ago

For your project, I don't fully understand your question. If you only have one video in the training set, it is infeasible to use deep learning here; otherwise, I think it means the model overfits the training data. What is the performance on the testing data?

OpenSorceYCW commented 1 year ago

sorry, I should express myself more clearly. I use the ActionFormer model to train on my own dataset, where each sample has only one segment and one class label (not many segments and labels), but cls_loss is very low and reg_loss is 0 at the first step of training. How can I solve this problem? thank you!

tzzcl commented 1 year ago

For your problem, there are a few possibilities. The most likely reason is that your converted dataset does not contain positive examples in the training data. You can try disabling center sampling, or verify whether there are any positive points in your training data.
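
To verify, you can count positive points the same way the loss does (a debugging sketch with hypothetical tensor names mirroring the logic in meta_archs.py, not code copied from the repo):

```python
import torch

def count_positive_points(gt_cls: torch.Tensor, valid_mask: torch.Tensor) -> torch.Tensor:
    # gt_cls:     (batch, num_points, num_classes) one-hot targets
    # valid_mask: (batch, num_points) boolean mask for the unpadded region
    pos_mask = torch.logical_and(gt_cls.sum(-1) > 0, valid_mask)
    return pos_mask.sum(dim=-1)

# Toy reproduction of the symptom: the only labeled point falls outside
# the valid mask, so num_pos comes out as 0.
gt_cls = torch.zeros(1, 6, 2)
gt_cls[0, 5, 1] = 1.0
valid_mask = torch.tensor([[True, True, True, True, False, False]])
print(count_positive_points(gt_cls, valid_mask))  # tensor([0])
```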

OpenSorceYCW commented 1 year ago

thank you! I have already checked my dataset, but num_pos is 0 when I debug the training code. The indices printed for valid_mask[3,:] cover 0-92, 2304-2350, 3456-3479, 4032-4043, 4320-4325, and 4464-4466 (on cuda:0), while gt_cls prints the pairs (0, 4471-4477), (1, 4336-4339), (2, 4471-4477), and (3, 4340-4342). So torch.logical_and is empty and num_pos is 0. https://github.com/happyharrycn/actionformer_release/blob/main/libs/modeling/meta_archs.py#L554 My WeChat ID is hubeiycw; it would be my honor to communicate with you through WeChat. I look forward to your reply, thank you!

happyharrycn commented 1 year ago

This looks like the data were not properly loaded, most likely due to issues with your custom dataloader (or the json file).
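
One quick way to rule out annotation problems is to sanity-check the json before it reaches the dataloader. A rough sketch, assuming your file follows the THUMOS-style schema this repo consumes ({"database": {vid: {"duration", "annotations": [{"segment": [start, end], "label_id"}]}}}); the file name here is hypothetical:

```python
import json

# Check that every annotated segment is well-formed and lies within the video.
with open("my_dataset.json") as f:  # hypothetical file name
    db = json.load(f)["database"]

for vid, info in db.items():
    duration = info["duration"]
    anns = info.get("annotations", [])
    if not anns:
        print(f"{vid}: no annotations")
    for ann in anns:
        start, end = ann["segment"]
        if not (0.0 <= start < end <= duration):
            print(f"{vid}: suspicious segment {ann['segment']} (duration {duration})")
```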

OpenSorceYCW commented 1 year ago

thank you for your reply, I have solved it! @happyharrycn @tzzcl Another question: my dataset consists of video features and targets where each training video has exactly one action category and one segment, and background video features are not in the training set. If, at inference, I only want to classify an untrimmed video as action vs. background, how do I configure the parameters in thumos_i3d.yaml, and how should I modify the training and inference code of the ActionFormer model? Or does this scenario even fit the ActionFormer model? I look forward to your reply, thank you!

tzzcl commented 1 year ago

For your problem, I think you only need an action classification model for untrimmed videos; an example project can be found in UntrimmedNet.
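
For intuition, video-level classification over pre-extracted clip features can be as simple as pooling over time and applying a linear head. A minimal sketch of that setting (all names here are hypothetical; this is not code from ActionFormer or UntrimmedNet):

```python
import torch
import torch.nn as nn

class VideoLevelClassifier(nn.Module):
    """Classify a whole untrimmed video from its clip features."""

    def __init__(self, feat_dim: int = 2048, num_classes: int = 2):
        super().__init__()
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_clips, feat_dim); mean-pool over time, then classify
        return self.head(feats.mean(dim=1))

model = VideoLevelClassifier()
logits = model(torch.randn(2, 100, 2048))  # two videos, 100 clips each
print(logits.shape)  # torch.Size([2, 2])
```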

OpenSorceYCW commented 1 year ago

thank you very much for your kind reply! Your suggestion will help me a lot. I would be deeply grateful if you could recommend some of the latest references implemented in PyTorch.

OpenSorceYCW commented 1 year ago

@happyharrycn @tzzcl By the way, can the ActionFormer model recognize untrimmed videos?

OpenSorceYCW commented 1 year ago

@happyharrycn
Isn't strong supervision more effective than weak supervision in temporal action localization?

happyharrycn commented 1 year ago

This is not about strong vs. weak supervision, but rather different problems. Consider the following two problems (both assuming untrimmed videos): (a) recognizing the occurrence of events; and (b) recognizing the events and localizing them in time. The former is a classification problem (e.g., think about image classification), and the latter is a detection problem (e.g., think about object detection). Their problem formulations, key challenges, and solutions are quite different.

I am going to close the issue, as the questions are now off-topic. Such questions are better resolved by going through vision course materials rather than through GitHub issues here.