haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0
18.91k stars 2.07k forks

[Question] Is the LLaVA-1.6 training/fine-tuning code ready? #1270

Open homiec opened 5 months ago

homiec commented 5 months ago

Question

When I load “llava-v1.5-7b”, training runs fine, but when I load “llava-v1.6-vicuna-7b”, it fails with the following error:

Traceback (most recent call last):
  File "/home/ma-user/work/chidafeng/Embodied_AI_Agent/llava/train/train_xformers.py", line 13, in <module>
    train()
  File "/home/ma-user/work/chidafeng/EmbodiedAgent/llava/train/train.py", line 970, in train
    trainer.train()
  File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/trainer.py", line 1687, in _inner_training_loop
    model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
  File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/accelerate/accelerator.py", line 1198, in prepare
    result = self._prepare_deepspeed(*args)
  File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/accelerate/accelerator.py", line 1537, in _prepare_deepspeed
    engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/deepspeed/__init__.py", line 171, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 304, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1234, in _configure_optimizer
    self.optimizer = self._configure_zero_optimizer(basic_optimizer)
  File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1563, in _configure_zero_optimizer
    optimizer = DeepSpeedZeroOptimizer_Stage3(
  File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 314, in __init__
    self._create_fp16_partitions_with_defragmentation(self.trainable_param_groups)
  File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 687, in _create_fp16_partitions_with_defragmentation
    device_buffer = __class__.defragment(parameter_partitions)
  File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 522, in defragment
    assert len(set(t.dtype for t in tensors)) == 1
AssertionError
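The failing assertion requires every tensor passed to `defragment` to share a single dtype. The traceback shows it being reached through `self.trainable_param_groups`, which suggests (but does not prove) that the trainable parameters in this run had mixed dtypes. A minimal diagnostic sketch for checking that before calling `trainer.train()` is shown here. The helper names `group_trainable_params_by_dtype` and `has_mixed_dtypes` are hypothetical and not part of LLaVA or DeepSpeed; the code only assumes an object with a `named_parameters()` method like the one on `torch.nn.Module`, and uses plain stand-in objects so it runs without PyTorch.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_trainable_params_by_dtype(model):
    """Map dtype name -> names of trainable parameters with that dtype.

    `model` only needs a `named_parameters()` method yielding
    (name, param) pairs where param has `.dtype` and `.requires_grad`,
    as torch.nn.Module parameters do.
    """
    groups = defaultdict(list)
    for name, param in model.named_parameters():
        if param.requires_grad:
            groups[str(param.dtype)].append(name)
    return dict(groups)


def has_mixed_dtypes(model):
    """Return True if the trainable parameters use more than one dtype."""
    return len(group_trainable_params_by_dtype(model)) > 1


# Stand-in model (hypothetical parameter names and dtypes) so the
# sketch runs without PyTorch installed.
demo_params = {
    "language_model.weight": SimpleNamespace(dtype="torch.bfloat16", requires_grad=True),
    "projector.weight": SimpleNamespace(dtype="torch.float32", requires_grad=True),
    "vision_tower.weight": SimpleNamespace(dtype="torch.float16", requires_grad=False),
}
demo_model = SimpleNamespace(named_parameters=lambda: demo_params.items())

for dtype, names in sorted(group_trainable_params_by_dtype(demo_model).items()):
    print(f"{dtype}: {len(names)} trainable parameter(s), e.g. {names[0]}")
```

With a real model, passing it to `group_trainable_params_by_dtype` just before training starts would show which parameter names fall under which dtype. Frozen parameters are skipped because the traceback only involves the trainable groups.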

pengwangucla commented 5 months ago

+1

joaomsimoes commented 5 months ago

+1

yinincanada commented 5 months ago

+1

linkboyx commented 5 months ago

+1

LoFiApostasy commented 5 months ago

∞ +1

markmywords-tech commented 5 months ago

+1

drogozhang commented 5 months ago

+1

jsm69 commented 4 months ago

Did anyone find a solution?

PzWHU commented 3 months ago

+1

NicoZenith commented 2 months ago

Any update?