microsoft / GenerativeImage2Text

GIT: A Generative Image-to-text Transformer for Vision and Language
MIT License

How to increase sample frames number to more than 6? #54

Closed ee2110 closed 12 months ago

ee2110 commented 1 year ago

Hi, thank you so much for the great work! I have a question about the number of sampled frames. The paper states:

During inference, we uniformly sample 6 frames with center crop.

Is it possible to sample more than 6 frames during inference? I get this error when I try to use more than 6 frames per video clip:

IndexError: index 6 is out of range

Look forward to your response. Thank you!

amsword commented 1 year ago

It should be able to handle more than 6 frames. Can you post the full call stack from when you hit this error?
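For anyone unsure how to collect the full call stack as text, here is a minimal sketch using only the Python standard library. `capture_call_stack` and `run_inference` are hypothetical names introduced for illustration; `run_inference` is a stand-in that fails in a similar way and is not the model's actual code.

```python
import traceback


def capture_call_stack(fn, *args, **kwargs):
    """Run fn; return (result, None) on success or (None, traceback text) on failure."""
    try:
        return fn(*args, **kwargs), None
    except Exception:
        # format_exc() returns the complete traceback of the active exception as a string.
        return None, traceback.format_exc()


def run_inference(num_frames, limit=6):
    # Hypothetical stand-in for the failing call: indexes a fixed-length table
    # once per frame, so it raises IndexError when num_frames > limit.
    table = list(range(limit))
    return [table[i] for i in range(num_frames)]


result, stack = capture_call_stack(run_inference, 16)
print(stack)
```

In the real script, the wrapped call would be the `model.generate(...)` line, and the printed text is what to paste into the issue.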

amsword commented 12 months ago

Please reopen it if you still hit the issue.

ee2110 commented 11 months ago

Hi @amsword ,

Thank you for following up. I still hit the issue; here are the details.

I use the code given here:

import av
import numpy as np
from PIL import Image
from huggingface_hub import hf_hub_download
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("microsoft/git-large-vatex")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-large-vatex")

# set seed for reproducibility
np.random.seed(45)

def read_video_pyav(container, indices):
    '''
    Decode the video with PyAV decoder.
    Args:
        container (`av.container.input.InputContainer`): PyAV container.
        indices (`List[int]`): List of frame indices to decode.
    Returns:
        result (np.ndarray): np array of decoded frames of shape (num_frames, height, width, 3).
    '''
    frames = []
    container.seek(0)
    start_index = indices[0]
    end_index = indices[-1]
    for i, frame in enumerate(container.decode(video=0)):
        if i > end_index:
            break
        if i >= start_index and i in indices:
            frames.append(frame)
    return np.stack([x.to_ndarray(format="rgb24") for x in frames])

def sample_frame_indices(clip_len, frame_sample_rate, seg_len):
    '''
    Sample a given number of frame indices from the video.
    Args:
        clip_len (`int`): Total number of frames to sample.
        frame_sample_rate (`int`): Sample every n-th frame.
        seg_len (`int`): Maximum allowed index of sample's last frame.
    Returns:
        indices (`List[int]`): List of sampled frame indices
    '''
    converted_len = int(clip_len * frame_sample_rate)
    end_idx = np.random.randint(converted_len, seg_len)
    start_idx = end_idx - converted_len
    indices = np.linspace(start_idx, end_idx, num=clip_len)
    indices = np.clip(indices, start_idx, end_idx - 1).astype(np.int64)
    return indices

# load video
file_path = '/path/to/video.mp4'
container = av.open(file_path)

# sample frames
# Changed here: num_frames is set above 6 (originally model.config.num_image_with_embedding)
num_frames = 16

indices = sample_frame_indices(
    clip_len=num_frames, frame_sample_rate=4, seg_len=container.streams.video[0].frames
)
frames = read_video_pyav(container, indices)

pixel_values = processor(images=list(frames), return_tensors="pt").pixel_values

generated_ids = model.generate(pixel_values=pixel_values, max_length=50)

print("Generated caption:", processor.batch_decode(generated_ids, skip_special_tokens=True))

and it returns this error:

  587     """Get the absolute index for the list of modules"""
  588     idx = operator.index(idx)
  589     if not (-len(self) <= idx < len(self)):
❱ 590         raise IndexError('index {} is out of range'.format(idx))
  591     if idx < 0:
  592         idx += len(self)
  593     return str(idx)

IndexError: index 6 is out of range

Please advise on how to sample more than 6 frames for one video. Thank you.
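A possible reading of the traceback, not confirmed anywhere in this thread: the error is raised while indexing a fixed-length "list of modules", which would fit a checkpoint that holds one learned per-frame entry for each of `model.config.num_image_with_embedding` positions (the value the original snippet used for `num_frames`). Under that assumption, one workaround is to sample as many frames as desired and then uniformly subsample them down to the supported count before calling the processor. The helper name `subsample_uniform` is hypothetical, and the limit of 6 below is taken from the error message rather than read from the model config.

```python
def subsample_uniform(items, max_items):
    """Return at most max_items elements of items, evenly spaced, in original order."""
    if max_items <= 0:
        raise ValueError("max_items must be positive")
    n = len(items)
    if n <= max_items:
        # Already within the limit: keep everything.
        return list(items)
    if max_items == 1:
        # A single slot: take the middle element.
        return [items[n // 2]]
    # Evenly spaced positions from the first element to the last, inclusive.
    step = (n - 1) / (max_items - 1)
    positions = [round(i * step) for i in range(max_items)]
    return [items[p] for p in positions]


# 16 sampled frame indices reduced to the assumed limit of 6.
kept = subsample_uniform(list(range(16)), 6)
print(kept)
```

In the script above, this would be applied as `frames = subsample_uniform(list(frames), model.config.num_image_with_embedding)` before the `processor(...)` call. It sidesteps the error but does not actually let the model see more than the supported number of frames.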

abisekrk commented 8 months ago

@ee2110 Were you able to increase the frame limit?

ee2110 commented 8 months ago


Hi @abisekrk, regrettably no. I was unable to sample more than 6 frames per video, so I tried other video QA and captioning models instead. I am still keen to know whether the frame limit can be increased.