ymcui / Chinese-LLaMA-Alpaca

Chinese LLaMA & Alpaca large language models + local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki
Apache License 2.0
How to recreate the SentencePiece model? #846

Closed: abhinand5 closed this issue 9 months ago

abhinand5 commented 10 months ago

Check before submitting issues

Type of Issue

Other issues

Base Model

None

Operating System

Linux

Describe your issue in detail

Excellent work by the community in open-sourcing this project. It serves as a guide for many people like me who want to fine-tune LLaMA 2 on their own languages.

I've looked through all the code but couldn't find the script or the spm_train command that was used to train chinese_sp.model.

Details such as the exact training command and its parameters would help greatly.

For reference, here is the command I am using:

spm_train --input=tamil_sentence_corpus_1.6m.txt \
    --model_prefix=tamil_sp \
    --vocab_size=16000 \
    --character_coverage=1.0 \
    --model_type=unigram

github-actions[bot] commented 9 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

github-actions[bot] commented 9 months ago

Closing the issue since no updates were observed. Feel free to re-open it if you need any further assistance.