Hi, thanks for your feedback.
I made some modifications in libs/datasets/unav100.py, loc_generators.py and libs/modeling/multimodal_meta_archs.py to make them compatible with longer videos during evaluation.
In addition, you can change the hyperparameter "max_buffer_len_factor": 1.0, in libs/core/config.py to 2.0/3.0/4.0, etc., for longer videos.
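For reference, the change amounts to editing (or overriding) that entry. This is a sketch only; the dict name DEFAULTS and the "model" sub-key are assumptions, and the exact nesting in libs/core/config.py may differ.

# Sketch only -- the name DEFAULTS and the "model" nesting are assumed here.
# The point generator pre-computes a buffer of candidate points whose length
# scales with this factor, so raise it until the longest evaluation video fits.
DEFAULTS = {
    "model": {
        "max_buffer_len_factor": 4.0,  # default 1.0; try 2.0 / 3.0 / 4.0 for longer videos
    },
}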
Hi. Thanks for making the changes in the code. But I am unable to run it as it throws some errors.
multimodal_meta_archs.py", line 426, in losses gt_offsets = torch.stack(gt_offsets)[pos_mask] RuntimeError: stack expects each tensor to be equal size, but got [441, 11, 2] at entry 0 and [441, 2] at entry 3
I also ran the original code after trimming all video clips to 1 minute (like yours), with actions lasting only a few seconds. It gives an mAP of <1% even after training. I have attached the log file for the same. log.txt
Could you please see what the issue might be?
I was able to debug the first issue (running after the changes), but on the validation set it gives:
libs/datasets/loc_generators.py", line 101, in forward assert feat_len <= buffer_pts.shape[0], "Reached max buffer length for point generator" AssertionError: Reached max buffer length for point generator
I am wondering what the parameter "max_buffer_len_factor" in your code is. You can try changing it to a larger number to suit your video lengths.
I changed it to 5. Will make it 100 and try.
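Rather than guessing, the required value can be estimated from the longest feature sequence in the data. A sketch, assuming the point-generator buffer scales roughly as max_seq_len * max_buffer_len_factor and that each .npy file in feat_folder stores a [T, D] sequence (both assumptions worth checking against libs/datasets/loc_generators.py and the dataset loader):

import glob, os
import numpy as np

feat_folder = "data/maladaptive/Maladaptive_audio_visual_features"  # from the config below
max_seq_len = 1152

# Find the longest feature sequence across all videos.
max_T = 0
for path in glob.glob(os.path.join(feat_folder, "*.npy")):
    T = np.load(path, mmap_mode="r").shape[0]
    max_T = max(max_T, T)

# If the buffer is about max_seq_len * max_buffer_len_factor points,
# the factor must be at least max_T / max_seq_len.
needed_factor = max_T / max_seq_len
print(f"longest feature sequence: {max_T} -> max_buffer_len_factor >= {needed_factor:.1f}")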
I also ran the original code after trimming all video clips to 1 minute (like yours), with actions lasting only a few seconds. It gives an mAP of <1% even after training. I have attached the log file for the same. log.txt
Can you please see what could be wrong?
Sorry, I'm not sure. It seems that the model is not converging. Please check your dataset and hyperparameter settings to make sure they suit the characteristics of your data.
Would it be possible for you to look at my annotation file and suggest some possible hyperparameters?
But the mAP is still below 1%. Here is my config:
dataset_name: behaviour-11
dataset: {
  json_file: data/maladaptive/annotations/behaviour11_annotation_full.json,
  feat_folder: data/maladaptive/Maladaptive_audio_visual_features,
  file_prefix: ~,
  file_ext: .npy,
  max_seq_len: 1152,
}
model: {
  input_dim_V: 2048,
  input_dim_A: 128,
  use_abs_pe: True,
  class_aware: True,
  use_dependency: True,
}
opt: {
  learning_rate: 0.0001,
  epochs: 500,
  weight_decay: 0.0001,
  warmup_epochs: 5,
}
loader: {
  batch_size: 2,
}
train_cfg: {
  loss_weight: 1,
  evaluate: True,
  eval_freq: 1,
}
test_cfg: {
  pre_nms_topk: 2000,
  max_seg_num: 1000,
  min_score: 0.001,
  multiclass_nms: True,
  nms_sigma: 0.4,
  iou_threshold: 0.75,
}
output_folder: ./ckpt/
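As a quick dataset sanity check for the low mAP, it may be worth confirming that the stored features actually match the configured dimensions. A sketch with hypothetical file names; the real layout of feat_folder, and whether audio/visual features are stored separately or concatenated, may differ:

import numpy as np

# Hypothetical file names -- substitute one real visual and one real audio
# feature file from feat_folder.
vis = np.load("data/maladaptive/Maladaptive_audio_visual_features/ASDAB005_visual.npy")
aud = np.load("data/maladaptive/Maladaptive_audio_visual_features/ASDAB005_audio.npy")

# Dimensions should match input_dim_V / input_dim_A in the config, and the
# two streams should cover roughly the same temporal extent.
assert vis.shape[1] == 2048, vis.shape
assert aud.shape[1] == 128, aud.shape
print("visual:", vis.shape, "audio:", aud.shape)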
Hi. I am training the model on untrimmed videos of 3-10 minutes. I am giving the segment start and segment end in seconds in the annotation file, e.g.:
clip_id,segment_start,segment_end,label,label_id,duration,subset
ASDAB005.mp4,460.0,465.0,Hitting others,7.0,100000000.0,train
ASDHY041.mp4,356.0,363.0,Crying,0.0,100000000.0,train
ASDHY064_54544.mp4,349.0,414.0,Walking away,5.0,100000000.0,train
And the duration value is set to a very large placeholder value.
But during the evaluation stage, the predicted segments are always within 0-60 seconds only. Does anything need to be modified for the code to work with longer videos? Thank you
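One thing worth ruling out is the placeholder duration: loaders and evaluation code in this family of models often use the annotated duration to clip segments or to map between the feature grid and seconds, so a value of 100000000.0 can distort the predicted segments. A sketch for recovering the true per-video duration (OpenCV is an assumption here; any reliable way of reading the video length works):

import cv2

def video_duration_sec(path: str) -> float:
    # Read frame count and fps from the container and derive the length.
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    n_frames = cap.get(cv2.CAP_PROP_FRAME_COUNT)
    cap.release()
    return n_frames / fps if fps > 0 else 0.0

print(video_duration_sec("ASDAB005.mp4"))  # write this value into the annotation instead of the placeholder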