open-mmlab / mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
https://mmaction2.readthedocs.io
Apache License 2.0

Bad results when running inference on a pretrained model with a custom dataset for spatio-temporal action detection #2192

Open HAMA-DL-dev opened 1 year ago

HAMA-DL-dev commented 1 year ago

Check List

I have read related issues, such as 'Worse results after train on custom classes?', but cannot get the expected help.

My custom dataset and configuration

I uploaded these a day earlier in a previous issue.

Result

Yesterday, I succeeded in running inference on a walking-pedestrian video with the SlowFast and SlowOnly configs based on the AVA dataset. But inference on a tail-light video fails with the model pre-trained on my custom dataset.

To summarize, the location of the predicted bounding box was wrong, and another inference output video has no bounding boxes at all, which means no object was detected. The CSV file seems to be normal, but I would like your advice on the result below.

Part of the CSV:

[screenshot of the CSV output]
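For reference, this is how I understand one row of an AVA-style annotation CSV should look (format assumed from the AVA dataset spec; the values are made up):

```python
# One illustrative AVA-format annotation row (assumed format, invented values):
#   video_id, timestamp, x1, y1, x2, y2, action_id, entity_id
# Note: AVA stores box corners normalized to [0, 1], not in pixels.
row = "clip_0001,0902,0.227,0.401,0.541,0.853,1,0"
video_id, timestamp, x1, y1, x2, y2, action_id, entity_id = row.split(",")
```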

Trial

Using the command: `python demo/demo_spatiotemporal_det.py --video {inference_video.mp4} --configs configs/{my_config} --checkpoint {my_checkpoint.pth} --det-score-thr 0.8 --action-score-thr 0.5 --label-map ${my_data_annotations}/label_map.txt --predict-stepsize 8 --output-stepsize 4 --output-fps 6`

Questions and my suspicions

  1. I did not use the `--validate` option when training the model because of an error: `AttributeError` at `File "${mmaction_envs}/lib/python3.7/site-packages/torch/utils/data/dataset.py", line 83, in __getattr__`. Can this result in a bad training output?

  2. Is the bounding-box format [x1, y1, x2, y2] or [x_center, y_center, w, h]? My proposal file contains not only the bbox coordinates but also a confidence value, which I obtained from YOLOv5 with the '--save-txt' and '--save-conf' options. YOLOv5 writes a text file of bounding boxes in the latter format, [x_center, y_center, w, h].

mareksubocz commented 1 year ago

Hi, as far as I've seen, mmaction uses the [x1, y1, x2, y2] format in most of its implementations.
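So you would need to convert your YOLOv5 labels before building the proposal file. A minimal sketch (not mmaction2 API; assuming the usual YOLOv5 txt format with coordinates normalized to [0, 1]):

```python
# Convert one YOLOv5 label ("class x_center y_center w h [conf]",
# coordinates normalized to [0, 1]) into absolute [x1, y1, x2, y2] corners.
def yolo_to_xyxy(x_center, y_center, w, h, img_w, img_h):
    x1 = (x_center - w / 2) * img_w
    y1 = (y_center - h / 2) * img_h
    x2 = (x_center + w / 2) * img_w
    y2 = (y_center + h / 2) * img_h
    return [x1, y1, x2, y2]

# e.g. a box centered in a 1920x1080 frame:
print(yolo_to_xyxy(0.5, 0.5, 0.2, 0.1, 1920, 1080))
# -> [768.0, 486.0, 1152.0, 594.0]
```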

HAMA-DL-dev commented 1 year ago

Sorry for not reading the description here. I read it just after uploading this issue and modified my annotation data. But nothing changed after training on the modified annotations.

The task I use tail-light detection for is spatio-temporal action detection, so I ran inference with a custom-trained model by following this documentation:

[screenshot of the documentation]

The `--det-config` and `--det-checkpoint` options need to be configured, and I left them at their defaults. But my model focuses on detecting cars, especially their tail lights. So I suspect the bad results are caused by incorrect `--det-config` and `--det-checkpoint` settings. Do I need to custom-train another model for these two configurations via mmdetection?

eliethesaiyan commented 1 year ago

@HAMA-DL-dev, I think that in order to do spatio-temporal inference on your custom dataset (car tail light), you will need two models (two config files): one for detection from mmdetection and another from mmaction2. `--det-config` should point to your mmdetection config file and `--det-checkpoint` to the mmdetection checkpoint file. From what I can deduce from the demo_spatiotemporal_det.py code, action detection only happens where a detection has occurred.
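Something like this untested sketch is what I mean by a custom detector config (assuming mmdetection 2.x conventions; the `_base_` file and class name are placeholders, not real files):

```python
# Hypothetical single-class detector config for mmdetection 2.x.
# '_base_' is a placeholder; point it at a real detector config you train from.
_base_ = 'faster_rcnn_r50_fpn_1x_coco.py'

classes = ('tail_light',)

model = dict(
    roi_head=dict(
        bbox_head=dict(num_classes=1)))  # one class: tail light

data = dict(
    train=dict(classes=classes),
    val=dict(classes=classes),
    test=dict(classes=classes))
```

You would then pass this config and its trained checkpoint to the demo via `--det-config` and `--det-checkpoint`.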

HAMA-DL-dev commented 1 year ago

@eliethesaiyan Thanks for your advice. I trained mmdetection on my custom dataset, and the result looks better than before. Though there are still some problems (e.g., overfitting), I can fix them by modifying the dataset. Once the custom training fully succeeds, I will share the custom training process and the purpose of the proposal values (since I saw your comment asking about the usage of the proposal file).