Closed nqx12348 closed 1 year ago
@nqx12348 , thanks for your interest and for asking! Both are valuable questions.
For ActivityNet, one issue is that most baselines use existing video features, e.g., C3D, while in our unified co-training we need to ensure all benchmarks use the same features (e.g., SlowFast+CLIP), so we had to extract the ActivityNet features ourselves. While downloading ActivityNet, we found that most RGB video links are invalid and inaccessible, so we were unable to match the previous benchmark settings (i.e., #training samples / #testing samples). Similar issues occur with the DiDeMo and MAD benchmarks (videos are inaccessible). We therefore selected Charades / NLQ / TACoS, since we can fully access all of their videos.
Regarding the second question, thank you for the reminder! I just discovered this problem and am trying to find the cause; I will update later.
@QinghongLin Regarding the second problem:
It seems that the forward() function in main_gradio.py should contain
model.eval()
just before
with torch.no_grad():
(perhaps around line 82 of main_gradio.py)
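For context, `model.eval()` matters here because layers like dropout behave stochastically in training mode, and `torch.no_grad()` only disables gradient tracking, not that stochasticity. A minimal stdlib-only sketch of the behavior (the `ToyDropout` class is hypothetical, mimicking how `nn.Dropout` switches between training and eval mode):

```python
import random

class ToyDropout:
    """Hypothetical stand-in for nn.Dropout, illustrating train/eval behavior."""
    def __init__(self, p=0.5):
        self.p = p
        self.training = True  # torch modules start in training mode by default

    def eval(self):
        # Mirrors model.eval(): switch the module to inference behavior
        self.training = False

    def __call__(self, xs):
        if self.training:
            # Training mode: randomly zero elements and rescale the rest,
            # so repeated calls on the same input give different outputs.
            return [0.0 if random.random() < self.p else x / (1 - self.p)
                    for x in xs]
        # Eval mode: dropout is the identity, so outputs are deterministic.
        return list(xs)

layer = ToyDropout(p=0.5)
x = [1.0, 2.0, 3.0, 4.0]
train_out_1, train_out_2 = layer(x), layer(x)  # will usually differ

layer.eval()
eval_out_1, eval_out_2 = layer(x), layer(x)
assert eval_out_1 == eval_out_2 == x  # identical after eval()
```

This is why predictions differed across runs without `model.eval()`: wrapping inference in `torch.no_grad()` alone leaves dropout (and batch-norm statistics) in training mode.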
Hi, @jjihwann Sorry for this careless mistake. I have updated the corresponding code in the repo, thank you again!
Based on @jjihwann's instruction, the issue of differing prediction results has now been solved. Thanks.
Closing since the problem is solved; please reopen if you have a new issue.
Hi, congratulations on your great success! I have two questions about UniVTG:
Thanks!