ymcui / Chinese-LLaMA-Alpaca

Chinese LLaMA & Alpaca large language models + local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki
Apache License 2.0

[Question] about converting fairseq to sentencepiece #844

Closed · phamkhactu closed this issue 10 months ago

phamkhactu commented 10 months ago

Check before submitting issues

Type of Issue

Model conversion and merging

Base Model

LLaMA-7B

Operating System

Linux

Describe your issue in detail

Following your detailed guide on tokenizer merging, I am working on expanding the LLaMA-2 tokenizer.

First of all, I have a fairseq model that contains:

- bpe.codes
- dict.txt
- model.pt

I've searched the internet, but I could not find anything on loading a SentencePiece model from a pretrained fairseq model.

I would be very grateful if you could share some example code for merging the tokenizer from fairseq pretrained weights.

Thank you very much!
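
[Editor's note: the following sketch is not part of the original issue. It shows how entries from a fairseq dict.txt could be appended to the LLaMA SentencePiece model, following the same protobuf-based approach as this repository's merge_tokenizers.py. The paths and the dict.txt layout ("token count" per line) are assumptions; note also that fairseq/subword-nmt BPE marks word-internal pieces with a trailing "@@", while SentencePiece marks word starts with a leading "▁", so naively copied pieces will not segment text identically.]

```python
from sentencepiece import sentencepiece_model_pb2 as sp_pb2_model
from transformers import LlamaTokenizer

# Load the LLaMA-2 tokenizer (HF format) and its underlying SentencePiece proto.
llama_tokenizer = LlamaTokenizer.from_pretrained("path/to/llama2-hf")  # hypothetical path
llama_spm = sp_pb2_model.ModelProto()
llama_spm.ParseFromString(llama_tokenizer.sp_model.serialized_model_proto())

existing = {p.piece for p in llama_spm.pieces}

# Assumption: fairseq's dict.txt has one "token count" pair per line.
with open("dict.txt", encoding="utf-8") as f:
    for line in f:
        token = line.split()[0]
        # Caveat: fairseq BPE tokens may carry "@@" continuation markers,
        # which SentencePiece does not interpret.
        if token not in existing:
            piece = sp_pb2_model.ModelProto.SentencePiece()
            piece.piece = token
            piece.score = 0.0
            llama_spm.pieces.append(piece)
            existing.add(token)

# Serialize the merged model; it can then be wrapped with
# LlamaTokenizer(vocab_file="merged_tokenizer.model").
with open("merged_tokenizer.model", "wb") as f:
    f.write(llama_spm.SerializeToString())
```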

Dependencies (must be provided for code-related issues)

No response

Execution logs or screenshots

No response

airaria commented 10 months ago

We work with Hugging Face format models and tokenizers. If you have fairseq pretrained weights, you have to convert them to the HF format first, or you can find the Hugging Face format models and tokenizers here: https://huggingface.co/meta-llama/Llama-2-7b-hf
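
[Editor's note: for illustration only, not part of the original reply. Loading the HF-format tokenizer linked above and accessing its underlying SentencePiece model looks roughly like this; access to the gated meta-llama repository on the Hugging Face Hub is required.]

```python
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
print(tokenizer.vocab_size)  # 32000 for Llama-2

# The slow tokenizer exposes the underlying SentencePiece processor,
# which is what vocabulary merging operates on.
print(tokenizer.sp_model.GetPieceSize())
```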

phamkhactu commented 10 months ago

Hi @airaria

Yeah, thanks for your reply, and I am sorry for the confusion.

I have the llama2-hf model and a fairseq BART model (another tokenizer to merge). As in your example at lines 17 to 18, I understand that I must convert the fairseq tokenizer to SentencePiece format to expand the vocabulary.

Could you suggest a solution?
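
[Editor's note: the thread ends here; the following is not an answer from the maintainers. Since SentencePiece has no loader for fairseq's bpe.codes/dict.txt format, one commonly suggested workaround, sketched under the assumption that the original training corpus is still available, is to train a fresh SentencePiece model on that corpus and merge it instead.]

```python
import sentencepiece as spm

# Assumption: corpus.txt is (a sample of) the text the fairseq BPE was trained on.
spm.SentencePieceTrainer.train(
    input="corpus.txt",
    model_prefix="fairseq_replacement",
    vocab_size=32000,          # hypothetical; match the fairseq dict size
    model_type="bpe",
    character_coverage=0.9995,
)
# The resulting fairseq_replacement.model can then be merged into the
# LLaMA tokenizer with the merging script from this repository's wiki.
```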