ymcui / Chinese-LLaMA-Alpaca

Chinese LLaMA & Alpaca large language models + local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki
Apache License 2.0

[Question] about converting fairseq to sentencepiece #844

Closed · phamkhactu closed this issue 10 months ago

phamkhactu commented 10 months ago

Check before submitting issues

Type of Issue

Model conversion and merging

Base Model

LLaMA-7B

Operating System

Linux

Describe your issue in detail

Following your detailed guide on tokenizer merging, I am working on expanding the LLaMA-2 tokenizer.

First of all, I have a fairseq model that contains:

- bpe.codes
- dict.txt
- model.pt

I've searched the internet, but I could not find anything on loading a SentencePiece model from a pretrained fairseq model.

I would be very grateful if you could share some example code for merging the tokenizer from fairseq pretrained weights.

Thank you very much!
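
[Editor's note: the following sketch is not part of the original issue. It shows how entries from a fairseq dict.txt could be appended to the LLaMA SentencePiece model, following the same protobuf-based approach as this repository's merge_tokenizers.py. The paths and the dict.txt layout ("token count" per line) are assumptions; note also that fairseq/subword-nmt BPE marks word-internal pieces with a trailing "@@", while SentencePiece marks word starts with a leading "▁", so naively copied pieces will not segment text identically.]

```python
from sentencepiece import sentencepiece_model_pb2 as sp_pb2_model
from transformers import LlamaTokenizer

# Load the LLaMA-2 tokenizer (HF format) and its underlying SentencePiece proto.
llama_tokenizer = LlamaTokenizer.from_pretrained("path/to/llama2-hf")  # hypothetical path
llama_spm = sp_pb2_model.ModelProto()
llama_spm.ParseFromString(llama_tokenizer.sp_model.serialized_model_proto())

existing = {p.piece for p in llama_spm.pieces}

# Assumption: fairseq's dict.txt has one "token count" pair per line.
with open("dict.txt", encoding="utf-8") as f:
    for line in f:
        token = line.split()[0]
        # Caveat: fairseq BPE tokens may carry "@@" continuation markers,
        # which SentencePiece does not interpret.
        if token not in existing:
            piece = sp_pb2_model.ModelProto.SentencePiece()
            piece.piece = token
            piece.score = 0.0
            llama_spm.pieces.append(piece)
            existing.add(token)

# Serialize the merged model; it can then be wrapped with
# LlamaTokenizer(vocab_file="merged_tokenizer.model").
with open("merged_tokenizer.model", "wb") as f:
    f.write(llama_spm.SerializeToString())
```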

Dependencies (must be provided for code-related issues)

No response

Execution logs or screenshots

No response

airaria commented 10 months ago

We work with Hugging Face format models and tokenizers. If you have fairseq pretrained weights, you have to convert them to the HF format first, or you can find the Hugging Face format models and tokenizers here: https://huggingface.co/meta-llama/Llama-2-7b-hf
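
[Editor's note: for illustration only, not part of the original reply. Loading the HF-format tokenizer linked above and accessing its underlying SentencePiece model looks roughly like this; access to the gated meta-llama repository on the Hugging Face Hub is required.]

```python
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
print(tokenizer.vocab_size)  # 32000 for Llama-2

# The slow tokenizer exposes the underlying SentencePiece processor,
# which is what vocabulary merging operates on.
print(tokenizer.sp_model.GetPieceSize())
```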

phamkhactu commented 10 months ago

Hi @airaria

Yeah, thanks for your reply, and I am sorry for the confusion.

I have the llama2-hf model and a fairseq BART model (another tokenizer to merge). As in your example at lines 17 to 18, I understand that I must convert the fairseq tokenizer to SentencePiece format to expand the vocabulary.

Could you suggest a solution?
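
[Editor's note: the thread ends here; the following is not an answer from the maintainers. Since SentencePiece has no loader for fairseq's bpe.codes/dict.txt format, one commonly suggested workaround, sketched under the assumption that the original training corpus is still available, is to train a fresh SentencePiece model on that corpus and merge it instead.]

```python
import sentencepiece as spm

# Assumption: corpus.txt is (a sample of) the text the fairseq BPE was trained on.
spm.SentencePieceTrainer.train(
    input="corpus.txt",
    model_prefix="fairseq_replacement",
    vocab_size=32000,          # hypothetical; match the fairseq dict size
    model_type="bpe",
    character_coverage=0.9995,
)
# The resulting fairseq_replacement.model can then be merged into the
# LLaMA tokenizer with the merging script from this repository's wiki.
```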