Hi @Luo-Z13, thank you for your interest. You need to change the image size from 336 to 504:

image = processor.preprocess(image, do_resize=True, crop_size={'height': 504, 'width': 504}, size={'shortest_edge': 504}, return_tensors='pt')['pixel_values'][0]

Can you please change this line in train.py (lines 690-691)? I have made the changes in the codebase as well. Let me know if it works now.
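For context on why 504: the CLIP vision tower's position-embedding length is tied to the input resolution, and the checkpoint apparently expects 1297 positions ("tensor b" in the size-mismatch error reported below), which corresponds to 504-pixel inputs. A minimal sketch of the arithmetic (patch size 14 assumed, as in CLIP ViT-L/14, plus one CLS token):

```python
# Sketch: expected position-embedding length of a CLIP ViT for a given input
# resolution (patch size 14 assumed, as in CLIP ViT-L/14, plus one CLS token).
def num_positions(image_size: int, patch_size: int = 14) -> int:
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + 1

print(num_positions(336))  # 577  -> "tensor a" in the size-mismatch error
print(num_positions(504))  # 1297 -> "tensor b" in the size-mismatch error
```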
Thank you for the response; the previous issue has now been resolved. However, I am encountering an OOM error when using 4×A100 (40 GB). Details are as follows:
File "/miniconda-3/envs/geochat/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 216, in forward
File "/miniconda-3/envs/geochat/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 216, in forward
down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
File "/miniconda-3/envs/geochat/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
File "/miniconda-3/envs/geochat/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/miniconda-3/envs/geochat/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 216, in forward
down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
File "/miniconda-3/envs/geochat/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/miniconda-3/envs/geochat/lib/python3.10/site-packages/peft/tuners/lora.py", line 822, in forward
down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
File "/miniconda-3/envs/geochat/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/miniconda-3/envs/geochat/lib/python3.10/site-packages/peft/tuners/lora.py", line 822, in forward
return forward_call(*args, **kwargs)
File "/miniconda-3/envs/geochat/lib/python3.10/site-packages/peft/tuners/lora.py", line 822, in forward
return forward_call(*args, **kwargs)
File "/miniconda-3/envs/geochat/lib/python3.10/site-packages/peft/tuners/lora.py", line 822, in forward
self.lora_B[self.active_adapter](self.lora_B[self.active_adapter](self.lora_B[self.active_adapter](
torch.cudatorch.cudatorch.cuda...OutOfMemoryErrorOutOfMemoryErrorOutOfMemoryError: : : CUDA out of memory. Tried to allocate 1.04 GiB (GPU 3; 39.39 GiB total capacity; 29.67 GiB already allocated; 1.02 GiB free; 36.57 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONFCUDA out of memory. Tried to allocate 1.07 GiB (GPU 1; 39.39 GiB total capacity; 30.12 GiB already allocated; 397.12 MiB free; 37.16 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONFCUDA out of memory. Tried to allocate 1.04 GiB (GPU 2; 39.39 GiB total capacity; 29.76 GiB already allocated; 911.12 MiB free; 36.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
self.lora_B[self.active_adapter](
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.06 GiB (GPU 0; 39.39 GiB total capacity; 29.99 GiB already allocated; 719.12 MiB free; 36.89 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
0%| | 0/2413 [00:44<?, ?it/s]
[2024-03-04 22:06:58,942] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 176970
[2024-03-04 22:06:59,647] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 176971
[2024-03-04 22:06:59,665] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 176972
[2024-03-04 22:06:59,681] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 176973
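The allocator message itself suggests tuning PYTORCH_CUDA_ALLOC_CONF, so here is a rough sketch of the generic mitigations I can try (the helper name and values are placeholders, not GeoChat defaults):

```python
# Rough sketch of generic OOM mitigations; values are placeholders, not GeoChat defaults.
import os

# Allocator hint from the error message: cap split size to reduce fragmentation.
# Must be set before the first CUDA allocation in the process.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

def enable_memory_savers(model, training_args):
    """Hypothetical helper: trade compute for memory during LoRA fine-tuning."""
    # Activation checkpointing, available on Hugging Face PreTrainedModel instances.
    model.gradient_checkpointing_enable()
    # Shrink the per-device batch and accumulate gradients to keep the
    # effective batch size roughly constant.
    training_args.per_device_train_batch_size = 1
    training_args.gradient_accumulation_steps *= 4
    return model, training_args
```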
The script merge_lora_weights.py seems to have an issue with its imports at the beginning (from llava...?). After I changed them from
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path
to
from geochat.model.builder import load_pretrained_model
from geochat.mm_utils import get_model_name_from_path
an error occurred:
Traceback (most recent call last):
File "GeoChat/scripts/merge_lora_weights.py", line 24, in
@Luo-Z13, can you please check the value of "model_type" in the config.json of both your base model and the saved checkpoint? If it is "llava", please change it to "geochat". Let me know if that works.
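In case it helps others, a minimal sketch of that config.json edit (the checkpoint path is a placeholder):

```python
# Sketch: switch "model_type" from "llava" to "geochat" in a checkpoint's config.json.
import json
from pathlib import Path

config_path = Path("path/to/checkpoint/config.json")  # placeholder path
config = json.loads(config_path.read_text())

if config.get("model_type") == "llava":
    config["model_type"] = "geochat"
    config_path.write_text(json.dumps(config, indent=2))
```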
Thank you very much, it works now.
Closing this issue for now, please reopen if you find any difficulties.
I used finetune_lora.sh to train and got the following error:

  ...
    image_features = self.encode_images(images)
  File "/project/GeoChat/geochat/model/geochat_arch.py", line 96, in encode_images
    image_features = self.get_model().get_vision_tower()(images)
  File "/miniconda-3/envs/geochat/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  ...
  File "/miniconda-3/envs/geochat/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 866, in forward
    hidden_states = self.embeddings(pixel_values)
  File "/miniconda-3/envs/geochat/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/miniconda-3/envs/geochat/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 200, in forward
    embeddings = embeddings + self.position_embedding(self.position_ids)
RuntimeError: The size of tensor a (577) must match the size of tensor b (1297) at non-singleton dimension 1