xing0047 / cca-llava

[NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention
Apache License 2.0

SFT issue #4

Open ErikZ719 opened 6 days ago

ErikZ719 commented 6 days ago

Hi, I followed the default settings (pyproject.toml) to run the fine-tuning experiments on 3× RTX 3090 and it reports out of memory. Is this normal? Pre-training works fine.

```
Package  Version  Editable project location
absl-py  2.1.0
accelerate  0.26.1
aiofiles  23.2.1
altair  5.4.1
annotated-types  0.7.0
anyio  4.6.2.post1
attrs  24.2.0
bitsandbytes  0.44.1
certifi  2022.12.7
charset-normalizer  2.1.1
click  8.1.7
contourpy  1.3.0
cycler  0.12.1
deepspeed  0.13.1
docker-pycreds  0.4.0
einops  0.6.1
einops-exts  0.0.4
exceptiongroup  1.2.2
fastapi  0.115.4
ffmpy  0.4.0
filelock  3.13.1
flash-attn  2.5.8
fonttools  4.54.1
fsspec  2024.2.0
gitdb  4.0.11
GitPython  3.1.43
gradio  4.16.0
gradio_client  0.8.1
grpcio  1.67.1
h11  0.14.0
hjson  3.1.0
httpcore  0.17.3
httpx  0.24.0
huggingface-hub  0.26.2
idna  3.4
importlib_resources  6.4.5
Jinja2  3.1.3
joblib  1.4.2
jsonschema  4.23.0
jsonschema-specifications  2024.10.1
kiwisolver  1.4.7
latex2mathml  3.77.0
llava  1.2.2.post1  /root/zqy/cca-llava
Markdown  3.7
markdown-it-py  3.0.0
markdown2  2.5.1
MarkupSafe  2.1.5
matplotlib  3.9.2
mdurl  0.1.2
mpmath  1.3.0
narwhals  1.12.1
networkx  3.2.1
ninja  1.11.1.1
numpy  1.26.3
orjson  3.10.11
packaging  24.1
pandas  2.2.3
peft  0.13.2
pillow  10.2.0
pip  24.3.1
platformdirs  4.3.6
protobuf  5.28.3
psutil  6.1.0
py-cpuinfo  9.0.0
pydantic  2.9.2
pydantic_core  2.23.4
pydub  0.25.1
Pygments  2.18.0
pynvml  11.5.0
pyparsing  3.2.0
python-dateutil  2.9.0.post0
python-multipart  0.0.17
pytz  2024.2
PyYAML  6.0.2
referencing  0.35.1
regex  2024.9.11
requests  2.28.1
rich  13.9.4
rpds-py  0.20.1
ruff  0.7.2
safetensors  0.4.5
scikit-learn  1.2.2
scipy  1.14.1
semantic-version  2.10.0
sentencepiece  0.1.99
sentry-sdk  2.17.0
setproctitle  1.3.3
setuptools  75.1.0
shellingham  1.5.4
shortuuid  1.0.13
six  1.16.0
smmap  5.0.1
sniffio  1.3.1
starlette  0.41.2
svgwrite  1.4.3
sympy  1.13.1
tensorboard  2.18.0
tensorboard-data-server  0.7.2
threadpoolctl  3.5.0
timm  0.6.13
tokenizers  0.15.2
tomlkit  0.12.0
torch  2.1.1+cu121
torchaudio  2.1.1+cu121
torchvision  0.16.1+cu121
tqdm  4.66.6
transformers  4.37.2
triton  2.1.0
typer  0.12.5
typing_extensions  4.12.2
tzdata  2024.2
urllib3  1.26.13
uvicorn  0.32.0
wandb  0.18.5
wavedrom  2.0.3.post3
websockets  11.0.3
Werkzeug  3.1.1
wheel  0.44.0
xformers  0.0.23
```

xing0047 commented 6 days ago

Hi @ErikZ719

Thanks for your feedback.

As the official LLaVA repo states, if you do not have enough GPU memory for LLaVA training, please consider:

1) Use LoRA: `finetune_lora.sh`. As LLaVA indicates, 7B training fits on 8× RTX 3090 (I ran LoRA training on a 4090 and it seems to work well). Make sure `per_device_train_batch_size * gradient_accumulation_steps` matches the provided script for best reproducibility.

2) Replace `zero3.json` with `zero3_offload.json`, which offloads some parameters to CPU RAM. This slows down training. A rough launch sketch combining both adjustments is shown below.
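For reference, a minimal sketch of what such a launch could look like, assuming the script layout follows upstream LLaVA (the flag names come from LLaVA's `finetune_lora.sh`; the batch-size numbers are illustrative, not this repo's defaults, and the other required arguments from the original script are omitted for brevity):

```bash
# Sketch only: keep every other argument from finetune_lora.sh as-is
# (--model_name_or_path, --data_path, --output_dir, and so on).
# Memory-related changes shown here:
#   * zero3_offload.json moves optimizer/parameter state to CPU RAM (slower, but less VRAM)
#   * a smaller per-device batch plus more accumulation keeps the effective batch size
#     (per_device_train_batch_size * gradient_accumulation_steps) unchanged
deepspeed llava/train/train_mem.py \
    --deepspeed ./scripts/zero3_offload.json \
    --lora_enable True \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4
```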

ErikZ719 commented 6 days ago

Thank you very much for your reply. Would it be possible to share your CUDA version and `pip list` output on the 4090? I'm getting a version conflict error with my LoRA training.
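(For comparing environments, generic commands like these are enough; nothing below is specific to cca-llava:)

```bash
# Generic environment checks for comparing CUDA / PyTorch setups.
nvidia-smi                 # driver version and GPUs
nvcc --version             # CUDA toolkit version, if the toolkit is installed
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
pip list | grep -Ei "torch|transformers|deepspeed|flash|xformers|accelerate|peft"
```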

xing0047 commented 6 days ago

Hi @ErikZ719

Please check below for your reference.

ErikZ719 commented 6 days ago

What can I say! Man, thank you very much. :)