About vocabulary extension

alielfilali01 commented 5 months ago

Check before submitting issues

[X] Make sure to pull the latest code, as some issues and bugs have been fixed.
[X] Due to frequent dependency updates, please ensure you have followed the steps in our Wiki
[X] I have read the FAQ section AND searched for similar issues and did not find a similar problem or solution
[X] Third-party plugin issues - e.g., llama.cpp, text-generation-webui, LlamaChat, we recommend checking the corresponding project for solutions
[X] Model validity check - Be sure to check the model's SHA256.md. If the model is incorrect, we cannot guarantee its performance

Type of Issue

Other issues

Base Model

None

Operating System

None

Describe your issue in detail

I'm sorry, but this is more of a question for the team behind this impressive paper than an issue.

First things first, thank you so much for these efforts 🙏🏻. We are working on much the same thing for Arabic, and would love to know how you managed to extend the vocabulary of the original tokenizer without training the tokenizer from scratch again.
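
To make the question concrete, the kind of extension being asked about can be sketched as follows. This is only a minimal sketch under stated assumptions, not the method confirmed by the project's authors in this thread: it assumes new tokens are appended to the end of the existing vocabulary (so original token ids are unchanged) and that the embedding table is then grown to match, with new rows initialized to the mean of the existing rows. The helper names `extend_vocab` and `resize_embeddings`, the toy vocabulary, and the mean initialization are all hypothetical choices made for illustration.

```python
def extend_vocab(base_vocab, new_tokens):
    """Append tokens that are not already present.

    Existing tokens keep their original ids (their list index),
    so a model trained on the base vocabulary stays compatible.
    """
    merged = list(base_vocab)
    seen = set(base_vocab)
    for tok in new_tokens:
        if tok not in seen:
            merged.append(tok)
            seen.add(tok)
    return merged


def resize_embeddings(embeddings, new_size):
    """Grow an embedding table (list of rows) to new_size rows.

    Assumption: new rows start as the mean of the existing rows.
    Other initializations (e.g. random) are equally plausible.
    """
    dim = len(embeddings[0])
    mean = [sum(row[j] for row in embeddings) / len(embeddings) for j in range(dim)]
    resized = [list(row) for row in embeddings]
    while len(resized) < new_size:
        resized.append(list(mean))
    return resized


# Toy base vocabulary and a toy list of tokens learned on the new language.
base_vocab = ["<unk>", "<s>", "</s>", "▁the", "▁a"]
new_tokens = ["▁the", "مرحبا", "كتاب"]  # "▁the" is a duplicate and is skipped

merged_vocab = extend_vocab(base_vocab, new_tokens)

# Toy 2-dimensional embeddings, one row per base token.
base_embeddings = [[float(i), float(i) * 2] for i in range(len(base_vocab))]
merged_embeddings = resize_embeddings(base_embeddings, len(merged_vocab))
```

In a sketch like this the new rows carry no language-specific information yet, so the extended model would presumably still need further training on text in the new language.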

Dependencies (must be provided for code-related issues)

No response

Execution logs or screenshots

No response

github-actions[bot] commented 5 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

github-actions[bot] commented 4 months ago

Closing the issue, since no updates were observed. Feel free to re-open if you need any further assistance.

ymcui / Chinese-LLaMA-Alpaca