songmzhang / DSKD

Repo for Paper "Dual-Space Knowledge Distillation for Large Language Models".

Getting an error when trying to perform SFT on Tiny Llama #8

Closed: survivebycoding closed this issue 1 month ago

survivebycoding commented 1 month ago

We are getting this error when trying to execute SFT on TinyLlama:

[rank0]:     if f.read(7) == "version":
[rank0]:   File "/usr/lib/python3.10/codecs.py", line 322, in decode
[rank0]:     (result, consumed) = self._buffer_decode(data, self.errors, final)
[rank0]: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 70: invalid start byte

However, we had no issue running LLaMA and Mistral. If you have any idea regarding this issue, please let us know.

songmzhang commented 1 month ago

It is due to the version of the transformers library. We also found this issue in our early experiments. The version suggested by TinyLlama is 4.31, while we loaded it with transformers == 4.38 and hit the same error.

We conjecture that it may be because some weights of LlamaForCausalLM were not initialized from the model checkpoint.

Here is our solution:

  1. Create another environment with transformers == 4.31.
  2. Load the model checkpoint and re-save it into another directory (noted as tinyllama_new), i.e., model = AutoModelForCausalLM.from_pretrained(original_tinyllama_path) and model.save_pretrained(tinyllama_new, safe_serialization=False); see the sketch after this list.
  3. Switch back to the original environment for this project and load the model checkpoint from tinyllama_new.
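
A minimal sketch of step 2, assuming original_tinyllama_path and tinyllama_new are placeholders for your own local paths (run it inside the transformers == 4.31 environment):

```python
# Run inside the environment with transformers == 4.31.
from transformers import AutoModelForCausalLM

original_tinyllama_path = "/path/to/tinyllama-1.1b-3T"    # where the original checkpoint was downloaded (placeholder)
tinyllama_new = "/path/to/tinyllama-1.1b-3T-resaved"      # new directory for the re-saved checkpoint (placeholder)

# Load the checkpoint with the transformers version suggested by TinyLlama ...
model = AutoModelForCausalLM.from_pretrained(original_tinyllama_path)

# ... and re-save it as regular PyTorch .bin weights (safe_serialization=False),
# which the project's own environment can then load without the UnicodeDecodeError.
model.save_pretrained(tinyllama_new, safe_serialization=False)
```
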
songmzhang commented 1 month ago

Thanks for reminding us of this issue and we will add this to README.md.

survivebycoding commented 1 month ago

original_tinyllama_path - is this the path on our system where we have downloaded TinyLlama?

songmzhang commented 1 month ago

> original_tinyllama_path - is this the path on our system where we have downloaded TinyLlama?

Yes.

survivebycoding commented 1 month ago

[screenshot] The new TinyLlama folder created does not have the same files.

Getting the error" [OSError: Can't load tokenizer for '/tinyllama/tinyllama-1.1b-3T'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '/tinyllama/tinyllama-1.1b-3T' is the correct path to a directory containing all relevant files for a LlamaTokenizerFast tokenizer.]

songmzhang commented 1 month ago

> [screenshot] The new TinyLlama folder created does not have the same files.
>
> Getting the error: OSError: Can't load tokenizer for '/tinyllama/tinyllama-1.1b-3T'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '/tinyllama/tinyllama-1.1b-3T' is the correct path to a directory containing all relevant files for a LlamaTokenizerFast tokenizer.

You need to copy the tokenizer files (i.e., special_tokens_map.json, tokenizer_config.json, tokenizer.json, tokenizer.model) to the new directory to load the tokenizer.
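
A minimal sketch of that copy step, reusing the placeholder paths from above (they are not variable names from the repo's scripts):

```python
import shutil
from pathlib import Path

original_tinyllama_path = Path("/path/to/tinyllama-1.1b-3T")    # original download (placeholder)
tinyllama_new = Path("/path/to/tinyllama-1.1b-3T-resaved")      # re-saved model directory (placeholder)

# Copy the tokenizer files next to the re-saved model weights so that
# AutoTokenizer.from_pretrained(tinyllama_new) can find them.
for name in ["special_tokens_map.json", "tokenizer_config.json",
             "tokenizer.json", "tokenizer.model"]:
    src = original_tinyllama_path / name
    if src.exists():   # some checkpoints ship only a subset of these files
        shutil.copy(src, tinyllama_new / name)
```
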

survivebycoding commented 1 month ago

[screenshot] Full fine-tuning eval scripts are under the gpt2 folder of scripts, and for LoRA it's under the TinyLlama scripts? Or is LoRA only applicable to TinyLlama?

This part is a bit confusing; maybe you can clarify a bit what ckpt_path and lora_adapter_path are. Also, what is usually given in eval_batch_size?

survivebycoding commented 1 month ago

In vanilla_KD_tinyllama, TEACHER_PEFT_PATH="path_to_teacher_sft_lora_ckpt" is the path to the folder called epoch9, right? [screenshot]

songmzhang commented 1 month ago

> [screenshot] Full fine-tuning eval scripts are under the gpt2 folder of scripts, and for LoRA it's under the TinyLlama scripts? Or is LoRA only applicable to TinyLlama?
>
> This part is a bit confusing; maybe you can clarify a bit what ckpt_path and lora_adapter_path are. Also, what is usually given in eval_batch_size?

No, run_eval_lora.sh is not only applicable to TinyLLaMA. The scripts and README.md are just for re-implementing the experiments in our paper. Actually, you can also evaluate other fully fine-tuned checkpoints besides GPT2 with run_eval.sh.

Here, CKPT_PATH means the path of the fully fine-tuned checkpoint. For example, in your case, it is the path of "epoch9_step...".

Similarly, LORA_ADAPTER_PATH is the path of the LoRA adapter, whose directory name has the same format as the fully fine-tuned checkpoint, like "epoch9_step...".

For EVAL_BATCH_SIZE, it depends on the available GPU memory and the number of model parameters. For example, we use 32 for GPT2-base and 8 for TinyLlama.
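
For illustration, a minimal sketch of the two loading modes using transformers and peft directly rather than the repo's scripts (all paths below are placeholders, not values from the repo):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# run_eval.sh case: CKPT_PATH points to a fully fine-tuned checkpoint
# (e.g., an "epoch9_step..." directory) that is loaded directly.
full_ckpt_path = "/path/to/outputs/epoch9_step..."        # placeholder
model = AutoModelForCausalLM.from_pretrained(full_ckpt_path)

# run_eval_lora.sh case: the base model is loaded first, and the LoRA adapter
# (LORA_ADAPTER_PATH, also an "epoch9_step..." directory) is applied on top of it.
base_model_path = "/path/to/tinyllama-1.1b-3T"            # placeholder
lora_adapter_path = "/path/to/outputs/epoch9_step..."     # placeholder
base = AutoModelForCausalLM.from_pretrained(base_model_path)
model = PeftModel.from_pretrained(base, lora_adapter_path)
```
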

songmzhang commented 1 month ago

> In vanilla_KD_tinyllama, TEACHER_PEFT_PATH="path_to_teacher_sft_lora_ckpt" is the path to the folder called epoch9, right? [screenshot]

Yes.