Closed: ee2110 closed this issue 1 year ago
It should be able to handle more than 6 frames. Can you post more call stacks when you hit this error?
Please re-open it if you still hit the issue.
Hi @amsword,
Thank you for following up. I still hit the issue; here are the details.
I used the code as given here:
import av
import numpy as np
from PIL import Image
from huggingface_hub import hf_hub_download
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("microsoft/git-large-vatex")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-large-vatex")

# set seed for reproducibility
np.random.seed(45)

def read_video_pyav(container, indices):
    '''
    Decode the video with PyAV decoder.
    Args:
        container (`av.container.input.InputContainer`): PyAV container.
        indices (`List[int]`): List of frame indices to decode.
    Returns:
        result (np.ndarray): np array of decoded frames of shape (num_frames, height, width, 3).
    '''
    frames = []
    container.seek(0)
    start_index = indices[0]
    end_index = indices[-1]
    for i, frame in enumerate(container.decode(video=0)):
        if i > end_index:
            break
        if i >= start_index and i in indices:
            frames.append(frame)
    return np.stack([x.to_ndarray(format="rgb24") for x in frames])

def sample_frame_indices(clip_len, frame_sample_rate, seg_len):
    '''
    Sample a given number of frame indices from the video.
    Args:
        clip_len (`int`): Total number of frames to sample.
        frame_sample_rate (`int`): Sample every n-th frame.
        seg_len (`int`): Maximum allowed index of sample's last frame.
    Returns:
        indices (`List[int]`): List of sampled frame indices
    '''
    converted_len = int(clip_len * frame_sample_rate)
    end_idx = np.random.randint(converted_len, seg_len)
    start_idx = end_idx - converted_len
    indices = np.linspace(start_idx, end_idx, num=clip_len)
    indices = np.clip(indices, start_idx, end_idx - 1).astype(np.int64)
    return indices

# load video
file_path = '/path/to/video.mp4'
container = av.open(file_path)

# sample frames
# HERE is the variable I changed so that num_frames is more than 6
num_frames = 16  # model.config.num_image_with_embedding
indices = sample_frame_indices(
    clip_len=num_frames, frame_sample_rate=4, seg_len=container.streams.video[0].frames
)
frames = read_video_pyav(container, indices)

pixel_values = processor(images=list(frames), return_tensors="pt").pixel_values

generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
print("Generated caption:", processor.batch_decode(generated_ids, skip_special_tokens=True))
and it returns this error:
    587 │   """Get the absolute index for the list of modules"""
    588 │   idx = operator.index(idx)
    589 │   if not (-len(self) <= idx < len(self)):
  ❱ 590 │       raise IndexError('index {} is out of range'.format(idx))
    591 │   if idx < 0:
    592 │       idx += len(self)
    593 │   return str(idx)

IndexError: index 6 is out of range
I need your advice on how to sample more than 6 frames for one video. Thank you.
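For context on where the IndexError seems to come from (my own reading, so please verify against your transformers version): the VATEX checkpoint appears to ship a fixed set of learned temporal embeddings, one per frame, and their count is exposed as `model.config.num_image_with_embedding` (the same config field the script above comments out). A quick, hedged sanity check before sampling could look like this:

```python
# Hedged sketch: inspect how many frames the checkpoint was trained with.
# If this reports 6, then frame index 6 (the 7th frame) falls outside the
# learned temporal embeddings, which matches "IndexError: index 6 is out of range".
max_frames = getattr(model.config, "num_image_with_embedding", None)
print("Frames supported by the checkpoint:", max_frames)

# Staying within the limit avoids the error (at the cost of sampling fewer frames).
num_frames = max_frames if max_frames is not None else 6
```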
@ee2110 Were you able to increase the frame limit?
Hi @abisekrk, unfortunately no, I was unable to sample more than 6 frames per video, so I tried other video QA and captioning models instead. I am still keen to know whether the frame limit can be increased.
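One possible, untested workaround, in case it helps anyone landing here: if the limit really comes from the number of temporal embedding slots created from `config.num_image_with_embedding`, you could try loading the checkpoint with a larger value so the model allocates more slots. The extra slots are not in the pretrained weights, so they would be freshly initialized and caption quality with more than 6 frames is not guaranteed; treat this as a sketch, not a verified fix.

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoProcessor

# Untested sketch: enlarge the temporal-embedding budget before loading.
# Slots 6..15 do not exist in the checkpoint, so transformers will initialize
# them from scratch (expect a "newly initialized" warning), and they would
# likely need fine-tuning to be meaningful.
config = AutoConfig.from_pretrained("microsoft/git-large-vatex")
config.num_image_with_embedding = 16

processor = AutoProcessor.from_pretrained("microsoft/git-large-vatex")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-large-vatex", config=config)

# With 16 temporal slots, sampling 16 frames should no longer raise the IndexError.
```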
Hi, thank you so much for the great work! I have a question about the number of sampled frames mentioned in the paper.
I am keen to know whether it is possible to sample more than 6 frames during inference. I get this error when I try to use more than 6 frames per video clip:
IndexError: index 6 is out of range
Looking forward to your response. Thank you!