puyuanliu opened this issue 1 year ago
Hello my friend, reading this issue is like finding treasure. I have a QQ chat group; would you be willing to join and help the Chinese-speaking users? The QQ group number is: 397447632
A note on item 3 below (CUDA OOM during model saving): with Python 3.10, the corresponding change goes in python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py.
1. Module transformers has no attribute LLaMATokenizer, or missing key 'llama'.
First install SentencePiece, then install transformers from the Hugging Face git repo, i.e., pip install sentencepiece followed by pip install git+https://github.com/huggingface/transformers.git. The installation order matters.
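To confirm the install worked, a loading check like the one below should run without the attribute error. This is a minimal sketch: the checkpoint path is a placeholder for wherever your converted LLaMA weights live, and on recent transformers the class is spelled LlamaTokenizer.

```python
# Sanity check after installing sentencepiece + transformers from git.
# "/path/to/converted/llama-7b" is a placeholder for your converted checkpoint.
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("/path/to/converted/llama-7b")
print(tokenizer.tokenize("The installation works."))
```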
2. CUDA OOM at the beginning of training.
Use --fp16 True instead of --bf16 True. Lower the per-device batch size and adjust the gradient accumulation steps.
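For reference, these flags correspond to the standard transformers.TrainingArguments fields; a minimal sketch, with placeholder values you will want to tune for your GPUs:

```python
# Memory-related training knobs (values are illustrative placeholders).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./output",
    fp16=True,                       # instead of bf16=True
    per_device_train_batch_size=2,   # lower this first when you hit OOM
    gradient_accumulation_steps=16,  # raise this to keep the effective batch size
)
```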
3. CUDA OOM during model saving.
Assuming you are using torch==1.13.0, change python/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2224 from state_dict[fqn] = state_dict[fqn].clone().detach() to state_dict[fqn] = state_dict[fqn].cpu().clone().detach().
This usually happens on GPUs with relatively little memory (e.g., 40 GB or 24 GB).
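The patch above moves each gathered tensor to CPU before cloning, so the full state dict never materializes on the GPU. An alternative that avoids patching torch is to gather the state dict onto CPU through FSDP's own API; the sketch below uses torch's public FSDP interface (not the method from the original post) and assumes torch >= 1.12:

```python
# Gather the full (unsharded) state dict offloaded to CPU, then save on rank 0.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    FullStateDictConfig,
    StateDictType,
)

def save_fsdp_model(model: FSDP, path: str) -> None:
    cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
    with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, cfg):
        state_dict = model.state_dict()  # gathered on CPU, populated on rank 0 only
    if dist.get_rank() == 0:
        torch.save(state_dict, path)
```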
4. How to perform inference?
Refer to https://github.com/tatsu-lab/stanford_alpaca/issues/35#issuecomment-1470985081
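In case the link goes stale, here is a minimal inference sketch (not the exact code from the linked comment): the model path and generation settings are placeholders, and the prompt follows the Alpaca instruction template used in train.py.

```python
# Minimal generation example for a fine-tuned checkpoint (paths are placeholders).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/finetuned/alpaca"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nGive three tips for staying healthy.\n\n### Response:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```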
5. Generated tokens are not human-readable at inference time.
Assuming training went well (e.g., training loss < 0.5), the most likely cause is that the model weights were corrupted during saving. Make sure there are no error messages during the saving step.
6. Finetuning is slow.
Refer to https://github.com/tatsu-lab/stanford_alpaca/issues/32#issuecomment-1474203699