ymcui / Chinese-LLaMA-Alpaca-2

Chinese LLaMA-2 & Alpaca-2 LLMs (phase-2 project) with 64K long-context models
Apache License 2.0

Forgetting English During Chinese LLM Training #442

Closed Abolfazl-kr closed 7 months ago

Abolfazl-kr commented 9 months ago

Check before submitting issues

Type of Issue

Performance issue

Base Model

Chinese-LLaMA-2 (7B/13B)

Operating System

Linux

Describe your issue in detail

First of all, I would like to express my gratitude for the amazing work you and your team have done in developing LLMs. I have been using Llama2 to train models in a different language, and I have noticed that the model no longer works well in English after training: it cannot produce English answers. I also checked the Chinese model, and it didn't answer in English either, even when I asked my question in English.

I was wondering whether you have checked the forgetting of English and published the results, and whether this forgetting was intentional. I would also like to know whether there is anything we can do to avoid the forgetting. I would appreciate any insights or suggestions you may have on this matter.
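
For reference, here is a rough sketch (with placeholder model paths) of how I could quantify the English forgetting myself, by comparing causal-LM perplexity on a small English sample between the base model and my continued-pretrained model:

```python
# Rough sketch only: compare perplexity on a small English sample before and
# after continued pre-training. The merged-model path is a placeholder.
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

SAMPLE = (
    "The sky appears blue because shorter wavelengths of sunlight "
    "are scattered more strongly by the atmosphere."
)

def english_perplexity(model_path: str, text: str = SAMPLE) -> float:
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # Using the input ids as labels yields the average next-token loss.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return math.exp(loss.item())

# Placeholder paths: original base model vs. merged continued-pretrained model.
for path in ["meta-llama/Llama-2-7b-hf", "/path/to/merged_model"]:
    print(path, english_perplexity(path))
```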

Thank you again for your hard work and dedication to advancing the field of language modeling. Best regards,

Dependencies (must be provided for code-related issues)

# Please copy-and-paste your dependencies here.

Execution logs or screenshots



```bash
# Continued pre-training of Llama-2-7b with LoRA adapters (DeepSpeed ZeRO-2, 4-bit quantized base weights) on 4 GPUs
torchrun --nnodes 1 --nproc_per_node 4 run_clm_pt_with_peft.py \
    --deepspeed ds_zero2_no_offload.json \
    --model_name_or_path /home/hadoop/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-hf/snapshots/6fdf2e60f86ff2481f2241aaee459f85b5b0bbb9/ \
    --tokenizer_name_or_path /home/hadoop/abolfazl/Chinese-LLaMA-Alpaca-2/scripts/tokenizer/merged_tokenizer_hf \
    --dataset_dir /home/hadoop/abolfazl/parvin2 \
    --data_cache_dir /home/hadoop/abolfazl/Chinese-LLaMA-Alpaca-2/scripts/training/cache \
    --validation_split_percentage 0.001 \
    --per_device_train_batch_size 8 \
    --do_train \
    --seed $RANDOM \
    --fp16 \
    --num_train_epochs 1 \
    --lr_scheduler_type cosine \
    --learning_rate 2e-4 \
    --warmup_ratio 0.001 \
    --weight_decay 0.001 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_total_limit 3 \
    --save_steps 1000 \
    --gradient_accumulation_steps 1 \
    --preprocessing_num_workers 8 \
    --block_size 128 \
    --output_dir /home/hadoop/abolfazl/Chinese-LLaMA-Alpaca-2/out_pt_secondtry \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank 64 \
    --lora_alpha 16 \
    --trainable "q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj" \
    --lora_dropout 0.05 \
    --modules_to_save "embed_tokens,lm_head" \
    --torch_dtype float16 \
    --load_in_kbits 4 \
    --gradient_checkpointing \
    --ddp_find_unused_parameters False
```
ymcui commented 9 months ago
  1. We did not force our model to forget English knowledge. The forgetting issue is well known in the pre-training/fine-tuning and language-adaptation setting, whenever the training data differs in type or language from what the original model was trained on.
  2. Regarding the issue that the model only responds in Chinese, I recommend: 1) clearly specifying your request in the system prompt, such as "always respond in English", though this is not always effective; 2) starting your first instruction/query in English.

However, keep in mind that these will NOT guarantee that the response will always be in English, as the instruction-following ability of the model also matters.
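
A minimal sketch of suggestion 1), assuming a merged Chinese-Alpaca-2 checkpoint at a placeholder path and the Llama-2 chat-style prompt template used by the Alpaca-2 models; as noted, this does not guarantee English output:

```python
# Sketch only: put an explicit language request in the system prompt and ask
# the first question in English. MODEL_PATH is a placeholder for a merged
# Chinese-Alpaca-2 checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "/path/to/chinese-alpaca-2-7b"  # placeholder

TEMPLATE = "[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{instruction} [/INST]"
prompt = TEMPLATE.format(
    system="You are a helpful assistant. Always respond in English.",
    instruction="What are the typical ingredients of a pizza?",
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, torch_dtype=torch.float16, device_map="auto"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
response = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(response)
```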

Abolfazl-kr commented 9 months ago

Thanks for your instant response.

Do you know of any method to reduce the forgetting? For example, some researchers have continued pre-training models (on Portuguese) and claim that the forgetting is negligible.

ymcui commented 9 months ago

> Thanks for your instant response.
>
> Do you know of any method to reduce the forgetting? For example, some researchers have continued pre-training models (on Portuguese) and claim that the forgetting is negligible.

There is no consensus on this topic. A common practice is to mix in a proportional amount of training data in the original language, which may help slow down catastrophic forgetting. But the right balance between original- and target-language data has to be analyzed case by case.
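
As an illustration only (not this repo's actual data pipeline), Hugging Face `datasets` can interleave an English corpus with the target-language corpus at a chosen ratio before building the pre-training files; the file names and the 20/80 split below are placeholders:

```python
# Illustrative sketch: mix original-language (English) text back into the
# continued pre-training corpus to slow down catastrophic forgetting.
# File names and the 0.2/0.8 ratio are placeholders, not recommendations.
from datasets import interleave_datasets, load_dataset

english = load_dataset("text", data_files="english_corpus.txt", split="train")
target = load_dataset("text", data_files="target_language_corpus.txt", split="train")

mixed = interleave_datasets(
    [english, target],
    probabilities=[0.2, 0.8],  # share of original vs. target language
    seed=42,
    stopping_strategy="all_exhausted",
)

# Write the mixed corpus back out as plain text for the pre-training script.
with open("mixed_corpus.txt", "w", encoding="utf-8") as f:
    for row in mixed:
        f.write(row["text"] + "\n")
```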

noobmaster29 commented 9 months ago

This research paper might be of interest:

https://arxiv.org/pdf/2308.04014.pdf

Abolfazl-kr commented 9 months ago

@noobmaster29

I will check it. Thank you !

Abolfazl-kr commented 9 months ago

Actually, I have another question. @ymcui

Is there any way to transfer knowledge from English to another language? Or did Chinese Llama2 do this?

Because when I continued pre-training the model, it seems to answer only based on the training text. It would be great if we could use the knowledge Llama2 has in English in other languages.

noobmaster29 commented 9 months ago

I think some examples would be good. It is not clear what you mean by "it seems to answer only based on the training text".

I think LLMs have some ability in multiple languages, but the majority of base Llama2's training data was English, so that is where it will be strongest. Chinese Llama2 appears to have been pre-trained on additional Chinese text, but that still pales in comparison to the original English training set. If you just want translation, I think there are models better suited for that purpose.

Abolfazl-kr commented 9 months ago

@noobmaster29

What I mentioned earlier is an example where a model trained on medical text can only answer questions about medical topics and forgets everything else. It cannot even answer simple questions like the color of the sky.

What I want is transfer learning. For example, I train a model like Llama2 on some Persian text that mentions pizza but gives no details about its ingredients. In English, Llama2 knows the ingredients of pizza, and I want to ask about those ingredients in Persian. The model should transfer that knowledge from English to Persian.

Abolfazl-kr commented 9 months ago

Actually, my model crashed during pre-training (because the loss scale decreased to the minimum loss scale). It trained on only about 70 MB of data for about 4000 steps. After that I used merge_llama2_with_chinese_lora_low_mem.py to build the merged model, but the model can only produce about 5 or 6 vague sentences.

The training run was very short, but the model has changed a lot and can no longer produce anything in English.

Abolfazl-kr commented 8 months ago

Can no one help? :(

github-actions[bot] commented 8 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.