lxtGH / OMG-Seg

OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]

Training Error: 'Naive_Proj' object has no attribute 'visual_prompt_zero' #19

Closed dddraxxx closed 4 months ago

dddraxxx commented 4 months ago

Hi, thanks for your excellent work! When I try to train with the command

PYTHONPATH=. NPROC_PER_NODE=8 xtuner train \
    omg_llava/configs/finetune/specific_tasks_finetune/finetune_refseg.py \
    --deepspeed deepspeed_zero2

I get the following error:

Exception has occurred: AttributeError
'Naive_Proj' object has no attribute 'visual_prompt_zero'
  File "/home/qdong/.conda/envs/omg/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
  File "/mnt/localssd/omg-llava/omg_llava/omg_llava/model/omg_llava.py", line 532, in get_visual_prompts_projector_zero
    return self.projector.model.visual_prompt_zero
  File "/mnt/localssd/omg-llava/omg_llava/omg_llava/model/omg_llava.py", line 463, in compute_loss
    loss = loss + self.get_visual_prompts_projector_zero()
  File "/mnt/localssd/omg-llava/omg_llava/omg_llava/model/omg_llava.py", line 415, in forward
    return self.compute_loss(data, data_samples, masks=masks,
  File "/home/qdong/.conda/envs/omg/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/qdong/.conda/envs/omg/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/qdong/.conda/envs/omg/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1846, in forward
    loss = self.module(*inputs, **kwargs)
  File "/home/qdong/.conda/envs/omg/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/qdong/.conda/envs/omg/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/qdong/.conda/envs/omg/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/qdong/.conda/envs/omg/lib/python3.10/site-packages/mmengine/_strategy/deepspeed.py", line 176, in _run_forward
    results = self.model(**data, mode=mode)
  File "/home/qdong/.conda/envs/omg/lib/python3.10/site-packages/mmengine/_strategy/deepspeed.py", line 133, in train_step
    losses = self._run_forward(data, mode='loss')
  File "/home/qdong/.conda/envs/omg/lib/python3.10/site-packages/mmengine/runner/loops.py", line 311, in run_iter
    outputs = self.runner.model.train_step(
  File "/home/qdong/.conda/envs/omg/lib/python3.10/site-packages/mmengine/runner/loops.py", line 287, in run
    self.run_iter(data_batch)
  File "/home/qdong/.conda/envs/omg/lib/python3.10/site-packages/mmengine/runner/_flexible_runner.py", line 1200, in train
    model = self.train_loop.run()  # type: ignore
  File "/mnt/localssd/omg-llava/omg_llava/xtuner/tools/train.py", line 356, in main
    runner.train()
  File "/mnt/localssd/omg-llava/omg_llava/xtuner/tools/train.py", line 360, in <module>
    main()
AttributeError: 'Naive_Proj' object has no attribute 'visual_prompt_zero'

self.projector.model is shown here:

    Naive_Proj(
      (query_proj): Linear(in_features=512, out_features=6144, bias=True)
      (model): Sequential(
        (0): Linear(in_features=6144, out_features=4096, bias=True)
        (1): GELUActivation()
        (2): Linear(in_features=4096, out_features=4096, bias=True)
      )
      (model_feat): Sequential(
        (0): Linear(in_features=6656, out_features=4096, bias=True)
        (1): GELUActivation()
        (2): Linear(in_features=4096, out_features=4096, bias=True)
      )
      (seperate_embed): Embedding(1, 4096)
    )

Is there any fix for this bug?
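For anyone hitting this before updating: a minimal workaround sketch (not the repo's actual patch) is to guard the attribute access with `getattr`, so a projector that never registered `visual_prompt_zero` simply contributes nothing to the loss. `NaiveProjStub` and `visual_prompt_zero_or_default` below are hypothetical names for illustration; in the real training code the default would be a zero tensor on the correct device rather than a Python float.

```python
# Hypothetical sketch of a defensive fix for the AttributeError,
# assuming the missing regularizer should default to a zero term.

class NaiveProjStub:
    """Stand-in for Naive_Proj, which lacks `visual_prompt_zero`."""
    pass

def visual_prompt_zero_or_default(projector, default=0.0):
    # Return the regularizer if the projector defines it, otherwise a
    # zero term so `loss = loss + term` leaves the loss unchanged.
    return getattr(projector, "visual_prompt_zero", default)

print(visual_prompt_zero_or_default(NaiveProjStub()))  # 0.0
```

With this guard, `compute_loss` no longer crashes when the config builds a projector without visual-prompt support, though the proper fix (as the maintainers applied) is to register the attribute in the projector itself.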

zhang-tao-whu commented 4 months ago

Thanks for your attention. We have now fixed this bug; please try again. To avoid the long wait for tokenizing the data, you can first test with a small amount of data to confirm it works. We also plan to support lazy-mode data processing so that lengthy tokenization is not needed before training.

dddraxxx commented 4 months ago

Thanks a lot! The data loading time is indeed substantial. Thanks for your continued optimization of the training code!