yellow-binary-tree / HawkEye

Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos

Demo not working #6

Closed: ooza closed this issue 6 months ago

ooza commented 6 months ago

Hello, could you please provide more details on how to run your trained model on a custom video? The demo.py is not working: there is no folder called "model" as mentioned in the config file. I tried to create it manually and download the needed checkpoints. [Screenshot 2024-04-19 at 15:33:24] I was able to download all the models except the Vicuna one. Can you please tell me if this is the correct version of the weights to download? https://huggingface.co/lmsys/vicuna-7b-v1.1/tree/main

Btw thanks for this great work!

yellow-binary-tree commented 6 months ago

Following previous works like VideoChat and VideoChat2, our model is based on Vicuna-7B v0, so you should download the Vicuna checkpoint from https://huggingface.co/lmsys/vicuna-7b-delta-v0.
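
For example, a minimal download sketch using huggingface_hub (the local_dir below is just a suggestion):

    # Download the Vicuna-7B v0 delta weights; local_dir is a suggested path.
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id="lmsys/vicuna-7b-delta-v0",
        local_dir="model/vicuna-7b-delta-v0",
    )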

Note that the downloaded files are only the delta weights; you need to follow the instructions in the README of https://huggingface.co/lmsys/vicuna-7b-delta-v0 to apply them on top of the original LLaMA weights to get the actual Vicuna weights.
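
At the time of writing, the conversion step in the FastChat instructions looks like this (all paths are placeholders you should adapt):

    # Merge the delta onto the original LLaMA-7B weights to produce Vicuna-7B v0.
    python3 -m fastchat.model.apply_delta \
        --base-model-path /path/to/llama-7b \
        --target-model-path model/vicuna-7b-v0 \
        --delta-path lmsys/vicuna-7b-delta-v0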

I apologize for this confusion, and have changed the directory from model/vicuna-7b to model/vicuna-7b-v0 in the config and README files to provide clearer instructions.

ooza commented 6 months ago

Thanks a lot for your reply! I got a CUDA OutOfMemoryError when trying to load hawkeye.pth; here's my output. Note that I'm working with a K80 GPU with 12 GB of memory (I have 4 GPUs like this, but I used only one). Could you tell me how to resolve this bug, please? How much memory do I need to run the demo? Is your code compatible with multi-GPU use? [Screenshot 2024-04-24 at 14:24:39]

The complete error output:

    OutOfMemoryError                          Traceback (most recent call last)
    Cell In[3], line 25
         22 cfg.model.vision_encoder.num_frames = 4
         24 model = HawkEye_it(config=cfg.model)
    ---> 25 model.set_device_ids([cfg.device])

    File ~/VLM/HawkEye/models/hawkeye_it.py:210, in HawkEye_it.set_device_ids(self, device_ids)
        208 self.extra_query_tokens = nn.Parameter(self.extra_query_tokens.to(self.devices[0]))
        209 self.llama_proj.to(self.devices[0])
    --> 210 self.llama_model.to(self.devices[0])

    File ~/envs/video-text/lib/python3.9/site-packages/transformers/modeling_utils.py:1900, in PreTrainedModel.to(self, *args, **kwargs)
       1895     raise ValueError(
       1896         "`.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the"
       1897         " model has already been set to the correct devices and casted to the correct `dtype`."
       1898     )
       1899 else:
    -> 1900     return super().to(*args, **kwargs)

    File ~/envs/video-text/lib/python3.9/site-packages/torch/nn/modules/module.py:1145, in Module.to(self, *args, **kwargs)
       1141         return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
       1142                     non_blocking, memory_format=convert_to_format)
       1143     return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
    -> 1145 return self._apply(convert)

    File ~/envs/video-text/lib/python3.9/site-packages/torch/nn/modules/module.py:797, in Module._apply(self, fn)
        795 def _apply(self, fn):
        796     for module in self.children():
    --> 797         module._apply(fn)
        799     def compute_should_use_set_data(tensor, tensor_applied):
        800         if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
        801             # If the new tensor has compatible tensor type as the existing tensor,
        802             # the current behavior is to change the tensor in-place using `.data =`,
        (...)
        807             # global flag to let the user control whether they want the future
        808             # behavior of overwriting the existing tensor or not.

    File ~/envs/video-text/lib/python3.9/site-packages/torch/nn/modules/module.py:797, in Module._apply(self, fn)
        795 def _apply(self, fn):
        796     for module in self.children():
    --> 797         module._apply(fn)
        799     def compute_should_use_set_data(tensor, tensor_applied):
        800         if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
        801             # If the new tensor has compatible tensor type as the existing tensor,
        802             # the current behavior is to change the tensor in-place using `.data =`,
        (...)
        807             # global flag to let the user control whether they want the future
        808             # behavior of overwriting the existing tensor or not.

    [... skipping similar frames: Module._apply at line 797 (2 times)]

    File ~/envs/video-text/lib/python3.9/site-packages/torch/nn/modules/module.py:797, in Module._apply(self, fn)
        795 def _apply(self, fn):
        796     for module in self.children():
    --> 797         module._apply(fn)
        799     def compute_should_use_set_data(tensor, tensor_applied):
        800         if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
        801             # If the new tensor has compatible tensor type as the existing tensor,
        802             # the current behavior is to change the tensor in-place using `.data =`,
        (...)
        807             # global flag to let the user control whether they want the future
        808             # behavior of overwriting the existing tensor or not.

    File ~/envs/video-text/lib/python3.9/site-packages/torch/nn/modules/module.py:820, in Module._apply(self, fn)
        816 # Tensors stored in modules are graph leaves, and we don't want to
        817 # track autograd history of `param_applied`, so we have to use
        818 # `with torch.no_grad():`
        819 with torch.no_grad():
    --> 820     param_applied = fn(param)
        821 should_use_set_data = compute_should_use_set_data(param, param_applied)
        822 if should_use_set_data:

    File ~/envs/video-text/lib/python3.9/site-packages/torch/nn/modules/module.py:1143, in Module.to.<locals>.convert(t)
       1140     if convert_to_format is not None and t.dim() in (4, 5):
       1141         return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
       1142                     non_blocking, memory_format=convert_to_format)
    -> 1143     return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)

    OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB (GPU 2; 11.17 GiB total capacity; 3.25 GiB already allocated; 7.55 GiB free; 3.35 GiB allowed; 3.32 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
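
(Aside: the max_split_size_mb workaround that the error message suggests is set through an environment variable before CUDA is initialized; a minimal example follows, with the value chosen arbitrarily. It only mitigates fragmentation and will not help when the model simply does not fit in 12 GB.)

    # Set the allocator hint before torch initializes CUDA.
    import os
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # example value

    import torch  # the CUDA caching allocator reads the variable on initialization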

ooza commented 6 months ago

I've resolved the problem after a deep analysis of the code. Actually, your code works with multiple GPUs, but I had to add a check on low_resource when loading the LLaMA model (in the hawkeye.py file):

    # Only move the LLaMA weights to a single GPU when not in low-resource mode;
    # otherwise device_map has already placed them.
    if not self.low_resource:
        self.llama_model.to(self.devices[0])

Then I set low_resource to True in the config file. I've also added device_map in the else block of the same file:

    else:
        self.llama_model = LlamaForCausalLM.from_pretrained(
            llama_model_path, config=llama_config,
            torch_dtype=torch.float16,
            device_map="auto",  # shard the weights across all visible GPUs
        )

PS: you need to pip install bitsandbytes for k-bit quantization.
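
For context, the low_resource path in MiniGPT-4/VideoChat-style codebases typically loads the LLM in 8-bit via bitsandbytes, which is why the extra dependency is needed. A minimal, self-contained sketch of that pattern (the path is the converted Vicuna directory from earlier in this thread; everything else uses the standard transformers API):

    # Minimal sketch of an 8-bit "low resource" load.
    # `load_in_8bit=True` is what pulls in the bitsandbytes dependency.
    import torch
    from transformers import LlamaForCausalLM

    llama_model_path = "model/vicuna-7b-v0"  # converted Vicuna weights from above

    llama_model = LlamaForCausalLM.from_pretrained(
        llama_model_path,
        torch_dtype=torch.float16,
        load_in_8bit=True,   # quantize weights to 8-bit at load time (bitsandbytes)
        device_map="auto",   # place layers across the visible GPUs
    )
    # Do not call .to() on the result: transformers raises a ValueError for
    # 4-bit/8-bit models (see the PreTrainedModel.to frame in the traceback above),
    # which is exactly why the `if not self.low_resource` guard is needed.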