ttgeng233 / UnAV

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
https://unav100.github.io
MIT License

Videos of duration greater than 1 minute #10

Open 1980x opened 5 months ago

1980x commented 5 months ago

Hi. I am training the model on untrimmed videos that are 3-10 minutes long. The annotation file gives each segment's start and end in seconds, e.g.:

    clip_id,segment_start,segment_end,label,label_id,duration,subset
    ASDAB005.mp4,460.0,465.0,Hitting others,7.0,100000000.0,train
    ASDHY041.mp4,356.0,363.0,Crying,0.0,100000000.0,train
    ASDHY064_54544.mp4,349.0,414.0,Walking away,5.0,100000000.0,train

The duration column is set to a very large placeholder value.

But during evaluation, the predicted segments always fall within 0-60 seconds. Does anything in the code need to be modified for it to work with longer videos? Thank you.
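A quick sanity check on the annotation file can rule out some causes before touching the model code. The sketch below is not part of the UnAV repository: `check_annotations` and `placeholder_threshold` are hypothetical names, and the column names are taken from the CSV header above. It flags rows whose segment is empty or reversed, whose segment runs past the stated duration, or whose duration looks like a placeholder rather than a real video length.

```python
import csv
import io


def check_annotations(csv_text, placeholder_threshold=24 * 3600.0):
    """Return a list of (clip_id, problem) pairs for suspicious rows.

    placeholder_threshold is an assumed cut-off in seconds: any duration
    above it is treated as a dummy value rather than a real video length.
    """
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        start = float(row["segment_start"])
        end = float(row["segment_end"])
        duration = float(row["duration"])
        if start < 0 or end <= start:
            problems.append((row["clip_id"], "empty or reversed segment"))
        if end > duration:
            problems.append((row["clip_id"], "segment ends after duration"))
        if duration > placeholder_threshold:
            problems.append((row["clip_id"], "duration looks like a placeholder"))
    return problems


# Two rows copied from the annotation excerpt above.
sample = """clip_id,segment_start,segment_end,label,label_id,duration,subset
ASDAB005.mp4,460.0,465.0,Hitting others,7.0,100000000.0,train
ASDHY041.mp4,356.0,363.0,Crying,0.0,100000000.0,train
"""

for clip_id, problem in check_annotations(sample):
    print(clip_id, problem)
```

Whether the data loader actually uses the `duration` column is not stated in this thread, so a placeholder warning is a prompt to look at the loader, not a diagnosis.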

ttgeng233 commented 5 months ago

Hi, thanks for your feedback.

I made some modifications in libs/datasets/unav100.py, loc_generators.py and libs/modeling/multimodal_meta_archs.py so that evaluation is compatible with longer videos.

Besides, you can change the hyperparameter "max_buffer_len_factor": 1.0 in libs/core/config.py to 2.0, 3.0, 4.0, etc. for longer videos.

1980x commented 5 months ago

Hi. Thanks for making changes to the code. But I am unable to run it; it throws the following error:

    multimodal_meta_archs.py", line 426, in losses
        gt_offsets = torch.stack(gt_offsets)[pos_mask]
    RuntimeError: stack expects each tensor to be equal size, but got [441, 11, 2] at entry 0 and [441, 2] at entry 3
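The error means that at least one tensor in the list has a different shape from the others, so it cannot be stacked. One way to find the offending entries is to compare shapes before the stack call. The helper below is a hypothetical debugging aid, not code from the repository; it works on plain shape tuples so it runs without PyTorch. The example shapes are modelled on the error message, with entries 1 and 2 assumed to match entry 0.

```python
from collections import Counter


def find_shape_outliers(shapes):
    """Return (index, shape) for entries that differ from the most common shape."""
    if not shapes:
        return []
    expected, _ = Counter(shapes).most_common(1)[0]
    return [(i, s) for i, s in enumerate(shapes) if s != expected]


# In practice the list could be built just before the failing line with
# something like: shapes = [tuple(t.shape) for t in gt_offsets]
shapes = [(441, 11, 2), (441, 11, 2), (441, 11, 2), (441, 2)]
print(find_shape_outliers(shapes))
```

Here the outlier is missing the middle dimension, which suggests that one sample in the batch takes a different code path when its targets are built; that reading is a guess from the shapes alone.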

I also ran the original code after cutting all video clips to a duration of 1 minute, like yours, with each action lasting a few seconds. It gives an mAP of <1% even after training. I have attached the log file: log.txt

Could you please take a look at what the issue could be?

1980x commented 5 months ago

I was able to debug the first issue (the error after your changes), but on the validation set it now gives:

    libs/datasets/loc_generators.py", line 101, in forward
        assert feat_len <= buffer_pts.shape[0], "Reached max buffer length for point generator"
    AssertionError: Reached max buffer length for point generator

ttgeng233 commented 5 months ago

What value is "max_buffer_len_factor" set to in your code? You can try increasing it to suit your video lengths.
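The assertion compares a video's feature length with the length of a pre-allocated point buffer. Assuming the buffer holds `max_seq_len * max_buffer_len_factor` positions (an assumption about the code, not something confirmed in this thread), the smallest workable factor follows from the longest feature sequence in the dataset. `min_buffer_factor` is a hypothetical helper and the lengths are made-up values for illustration.

```python
import math


def min_buffer_factor(feature_lengths, max_seq_len):
    """Smallest whole-number factor so that max_seq_len * factor covers
    the longest feature sequence (never below 1.0)."""
    longest = max(feature_lengths)
    return max(1.0, float(math.ceil(longest / max_seq_len)))


# Made-up feature lengths (number of time steps per video).
lengths = [900, 1152, 4000]
print(min_buffer_factor(lengths, max_seq_len=1152))
```

In practice the lengths would come from the time dimension of each feature file, which makes it possible to pick a factor from the data instead of trying 5 and then 100.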

1980x commented 5 months ago

I changed it to 5. Will make it 100 and try.

1980x commented 5 months ago

Can you please see what could be wrong?

ttgeng233 commented 5 months ago

Sorry, I'm not sure. It seems that the model is not converging. Please check your dataset and hyperparameter settings to ensure that they can adapt to the characteristics of your dataset.

1980x commented 5 months ago

Would it be possible to look at my annotation file and suggest some possible hyperparameters?

1980x commented 5 months ago

[Two W&B chart screenshots, 17_2_2024]

But still mAP is below 1%.

1980x commented 5 months ago

behaviour11_annotation_full.csv

1980x commented 5 months ago

dataset_name: behaviour-11
dataset: {
  json_file: data/maladaptive/annotations/behaviour11_annotation_full.json,
  # json_file: data/maladaptive/annotations/behaviour11annotations.json,
  feat_folder: data/maladaptive/Maladaptive_audio_visual_features,
  # feat_folder: data/maladaptive/av_features/visual_features_i3d_flow,
  file_prefix: ~,
  file_ext: .npy,
  max_seq_len: 1152,
}
model: {
  input_dim_V: 2048,
  input_dim_A: 128,
  use_abs_pe: True,
  class_aware: True,
  use_dependency: True,
}
opt: {
  learning_rate: 0.0001,
  epochs: 500,
  weight_decay: 0.0001,
  warmup_epochs: 5,
}
loader: {
  batch_size: 2,
}
train_cfg: {
  loss_weight: 1,
  evaluate: True,
  eval_freq: 1,
}
test_cfg: {
  pre_nms_topk: 2000,
  max_seg_num: 1000,
  min_score: 0.001,
  multiclass_nms: True,
  nms_sigma: 0.4,
  iou_threshold: 0.75,
}
output_folder: ./ckpt/