Closed Naozumi520 closed 1 year ago
I've tried using both model name and the path of the local folder, still no luck.
Can you list the files in Naozumi/llama2-qlora-finetunined-cantonese?
It looks like the problem is related to the tokenizer.
Cannot find lora model on the disk. Downloading lora model from hub...
This tells you that the lora model Naozumi/llama2-qlora-finetunined-cantonese cannot be found. You should specify either a local folder or a Hugging Face identifier for --lora_model. In this context, you need to check what files are in the Naozumi/llama2-qlora-finetunined-cantonese folder.
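(For reference, a rough sketch of how a merge script might resolve a --lora_model argument, which is why the "Cannot find lora model on the disk" message appears; this is illustrative only and may not match the exact logic in this repo:)

```python
import os
from huggingface_hub import snapshot_download

def resolve_lora_path(lora_model: str) -> str:
    """Return a local directory containing the LoRA weights.

    If --lora_model points to an existing folder, use it directly;
    otherwise treat it as a Hugging Face Hub repo id and download it.
    (Illustrative sketch; the actual merge script may differ.)
    """
    if os.path.isdir(lora_model):
        return lora_model
    print("Cannot find lora model on the disk. Downloading lora model from hub...")
    return snapshot_download(repo_id=lora_model)
```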
README.md adapter_config.json adapter_model.bin gitattributes.txt special_tokens_map.json tokenizer.json tokenizer_config.json
These are the files in my lora folder.
Maybe the problem was the Colab notebook. I fine-tuned using the Llama_2_Fine_Tuning_using_QLora-2.ipynb notebook I found on the web, but it didn't seem to save the tokenizer, so I added a line tokenizer.push_to_hub("my-awesome-model"). If possible, could you recommend a Colab notebook that also creates the tokenizer? Fine-tuning on my PC is not possible at the moment.
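(For anyone hitting the same problem: a minimal sketch of what could be added at the end of such a notebook so the LoRA folder also contains the SentencePiece tokenizer files, not just the adapter. The base model name and output folder are placeholders, and the push to the Hub is optional:)

```python
from transformers import LlamaTokenizer

base_model = "meta-llama/Llama-2-7b-hf"              # placeholder: the base model used in the notebook
adapter_dir = "llama2-qlora-finetunined-cantonese"   # placeholder: the trainer's output folder

# Re-export the slow (SentencePiece-based) tokenizer of the base model into
# the adapter folder, so tokenizer.model ends up next to adapter_model.bin.
tokenizer = LlamaTokenizer.from_pretrained(base_model)
tokenizer.save_pretrained(adapter_dir)

# Optionally mirror the tokenizer files to the Hub repo as well.
tokenizer.push_to_hub("Naozumi/llama2-qlora-finetunined-cantonese")
```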
The merging script was intended for the models in this project, so we have not tested compatibility with other models. You might need to save your tokenizer into tokenizer.model first, and then try to merge your model again. Also check that adapter_model.bin has a proper file size (at least several MB). Refer to our Chinese-LLaMA-LoRA-7B: https://huggingface.co/ziqingyang/chinese-llama-lora-7b/tree/main and check what file is missing.
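(A quick sanity check along these lines can catch both issues before merging. Note that the original "TypeError: not a string" from sentencepiece is typically what you see when tokenizer.model is missing, because the tokenizer loader then passes None instead of a file path. Sketch only, with the folder name taken from this thread:)

```python
import os

lora_dir = "llama2-qlora-finetunined-cantonese"  # local copy of the LoRA folder

adapter = os.path.join(lora_dir, "adapter_model.bin")
spm_file = os.path.join(lora_dir, "tokenizer.model")

# adapter_model.bin should be at least several MB; a tiny file usually means
# the adapter weights were never actually saved.
size_mb = os.path.getsize(adapter) / 1e6 if os.path.exists(adapter) else 0
print(f"adapter_model.bin: {size_mb:.1f} MB")

# The merge script needs a SentencePiece tokenizer.model, not just tokenizer.json.
print("tokenizer.model present:", os.path.exists(spm_file))
```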
I've checked Llama_2_Fine_Tuning_using_QLora-2.ipynb once again, and it seems the notebook takes the tokenizer from the base model. This explains why the generated tokenizer.json didn't contain any Chinese characters. What I'm trying to do is fine-tune the model on my Cantonese dataset in JSON format (instruction, input and output), but I couldn't find any online notebook capable of generating a tokenizer. What am I supposed to do?
If you are going to do further instruction fine-tuning on Llama-2 series, refer to https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/sft_scripts_en
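(For context on the tokenizer question: in the Chinese-LLaMA projects the extended vocabulary was not produced by the fine-tuning notebook itself; a separate SentencePiece model was trained on target-language text and then merged into the original LLaMA tokenizer. A minimal sketch of that first step, with a hypothetical corpus file and a made-up vocabulary size:)

```python
import sentencepiece as spm

# Train a SentencePiece model on a plain-text Cantonese corpus.
# cantonese_corpus.txt and vocab_size are placeholders.
spm.SentencePieceTrainer.train(
    input="cantonese_corpus.txt",
    model_prefix="cantonese_sp",   # writes cantonese_sp.model / cantonese_sp.vocab
    vocab_size=20000,
    character_coverage=0.9995,
    model_type="bpe",
)
# The resulting cantonese_sp.model would then be merged into the original
# LLaMA tokenizer.model (see the tokenizer-merging script in the Chinese-LLaMA
# repos) before any further pre-training or fine-tuning.
```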
I'm at the step of setting the value of chinese_tokenizer_path in run_sft.sh. Do I have to use the tokenizer in this repo? It's not a problem, though; I'm just worried about the language, since I'm fine-tuning in Cantonese.
You must use the corresponding tokenizer released together with the model weights in order to fine-tune the model correctly.
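(One quick way to confirm that a tokenizer and a set of model weights belong together is to compare the tokenizer's vocabulary size with the vocab size in the model config; if they disagree, token ids will be mis-mapped during fine-tuning. Sketch with a placeholder path:)

```python
from transformers import AutoTokenizer, AutoConfig

model_path = "path/to/chinese-alpaca-2-7b"   # placeholder: model + tokenizer released together
tokenizer = AutoTokenizer.from_pretrained(model_path)
config = AutoConfig.from_pretrained(model_path)

# These two numbers should match for a correctly paired tokenizer and model.
print("tokenizer vocab size:", len(tokenizer))
print("model config vocab_size:", config.vocab_size)
```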
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.
Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.
Check before submitting issues
Type of Issue
Model conversion and merging
Base Model
LLaMA-7B
Operating System
macOS
Describe your issue in detail
Following the instructions here, I got the error TypeError: not a string from sentencepiece.
Dependencies (must be provided for code-related issues)
No response
Execution logs or screenshots