mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversations about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
https://mbzuai-oryx.github.io/Video-ChatGPT
Creative Commons Attribution 4.0 International

How many frames are sampled per video for the training and testing process? #99

Closed: Nastu-Ho closed this issue 4 days ago

mmaaz60 commented 2 months ago

Hi @Nastu-Ho,

Thank you for your interest in our work. We sample 100 frames uniformly during both training and testing.
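
For context, uniform sampling of a fixed frame budget can be sketched roughly as below with decord (an illustration only, not necessarily the exact training code; video_path is a placeholder):

import numpy as np
from decord import VideoReader, cpu

num_frames = 100  # fixed frame budget used at both train and test time
vr = VideoReader(video_path, ctx=cpu(0))
# spread the indices evenly over the whole video
indices = np.round(np.linspace(0, len(vr) - 1, num_frames)).astype(int)
frames = vr.get_batch(indices.tolist()).asnumpy()  # (100, H, W, 3) uint8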

onlyonewater commented 2 months ago

Hi @mmaaz60, can I sample more than 100 frames when running inference on my own dataset?

onlyonewater commented 2 months ago

From https://github.com/mbzuai-oryx/Video-ChatGPT/issues/14 I understand how to change the number of sampled frames, but how should the code be changed to extract 1 or 2 frames per second? I think that sampling strategy makes more sense for long videos.

mmaaz60 commented 2 months ago

Hi @onlyonewater,

It would be something similar to the snippet below:

from decord import VideoReader, cpu

sample_fps = 1  # target sampling rate in frames per second

vr = VideoReader(video_path, ctx=cpu(0))
fps = vr.get_avg_fps()  # native frame rate of the video
f_start = 0
f_end = len(vr) - 1
t_stride = max(1, int(round(float(fps) / sample_fps)))  # frame step for the target rate
all_pos = list(range(f_start, f_end + 1, t_stride))
img_array = vr.get_batch(all_pos).asnumpy()  # (num_sampled, H, W, 3) uint8
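
Setting sample_fps = 2 halves t_stride and gives roughly two frames per second. For long videos you may still want to cap the total number of sampled frames so it fits in memory and matches the input length the model expects.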

I hope this will be helpful. Thank you!

onlyonewater commented 1 month ago

OK, got it. Thanks @mmaaz60!