undertheseanlp / underthesea

Underthesea - Vietnamese NLP Toolkit
http://undertheseanlp.com
GNU General Public License v3.0

Open Vietnamese Dictionary version 2023 #622

Open rain1024 opened 1 year ago

rain1024 commented 1 year ago

πŸ‘©β€πŸ’Ό As an NLP engineer, you know that accurate and reliable language resources are essential for building effective natural language processing systems. πŸ“š The Vietnamese Dictionary project is a comprehensive and reliable resource for learners and users of the Vietnamese language that is specifically designed to meet the needs of NLP engineers. With over 10,000 words and phrases, this dictionary provides clear and accurate definitions and example sentences to help you build high-quality language models and applications. In addition to its core function as a reference tool, the Vietnamese Dictionary also includes helpful features such as pronunciation guides, synonyms, antonyms, and etymology.

🌍 As an open data project, the Vietnamese Dictionary is freely available to the public to use and reuse, typically with minimal restrictions. By making the dictionary available as open data, we hope to enable NLP engineers like you to access and use the data in a variety of ways, such as by incorporating it into your own projects or tools, or by building upon it to create new resources.

Planning

  1. Determine the language(s) and audience for the dictionary:
    • [ ] Research the language(s) and the needs and interests of the intended audience
    • [ ] Decide on the focus and scope of the dictionary (e.g. general usage, technical terms, slang, etc.)
  2. Gather a list of words to include in the dictionary:
    • [ ] Consult other dictionaries and language resources
    • [ ] Identify commonly used words and specialized terms
    • [ ] Consider the needs and interests of the intended audience
  3. Research and verify the definitions and translations for the words:
    • [ ] Consult with linguists or language experts
    • [ ] Use online resources such as language forums and online dictionaries
    • [ ] Verify definitions and translations with multiple sources
  4. Determine the format and structure for the dictionary entries:
    • [ ] Consult other dictionaries and language resources to see how they structure their entries
    • [ ] Decide which elements to include in the dictionary (e.g. pronunciation, part of speech, definition, example sentence, synonyms, antonyms, etymology)
    • [ ] Consider the needs and interests of the intended audience
  5. Input the words and their definitions or translations into the dictionary:
    • [ ] Use a text editor, spreadsheet program, or specialized dictionary software
    • [ ] Create columns or fields for different elements of the dictionary entry (e.g. word, pronunciation, part of speech, definition, example sentence)
  6. Review and edit the entries:
    • [ ] Check for accuracy and consistency with the chosen format and structure
    • [ ] Have multiple people review the entries to catch any mistakes or inconsistencies
  7. Test the dictionary:
    • [ ] Use the dictionary to look up words and check their definitions or translations
    • [ ] Have other people test the dictionary to check that it is easy to use and that the definitions and translations are clear and accurate
  8. Publish the dictionary:
    • [ ] Choose a suitable format (e.g. printed book, online database, mobile app)
    • [ ] Consider layout, design, formatting, and any legal or copyright issues
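
Steps 4 to 6 above (choosing entry fields, entering the data, and checking consistency) could be sketched roughly as follows. This is only an illustration under assumptions: the `DictionaryEntry` class, the `validate` and `to_csv` helpers, and the choice of required fields are hypothetical and not part of the project; the field names simply mirror the elements listed in step 4.

```python
import csv
import io
from dataclasses import asdict, dataclass, field, fields
from typing import List


@dataclass
class DictionaryEntry:
    """Hypothetical entry layout; fields mirror the elements named in step 4."""
    word: str
    part_of_speech: str
    definition: str
    pronunciation: str = ""
    example: str = ""
    synonyms: List[str] = field(default_factory=list)
    antonyms: List[str] = field(default_factory=list)
    etymology: str = ""


# Assumption: these three fields are treated as mandatory for every entry.
REQUIRED_FIELDS = ("word", "part_of_speech", "definition")


def validate(entry: DictionaryEntry) -> List[str]:
    """Return a list of problems found in one entry (empty list means OK)."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not getattr(entry, name).strip():
            problems.append(f"missing {name}")
    return problems


def to_csv(entries: List[DictionaryEntry]) -> str:
    """Serialize entries to CSV text, one column per entry field."""
    buffer = io.StringIO()
    column_names = [f.name for f in fields(DictionaryEntry)]
    writer = csv.DictWriter(buffer, fieldnames=column_names)
    writer.writeheader()
    for entry in entries:
        row = asdict(entry)
        # Flatten list-valued fields so each fits in a single CSV cell.
        row["synonyms"] = "; ".join(entry.synonyms)
        row["antonyms"] = "; ".join(entry.antonyms)
        writer.writerow(row)
    return buffer.getvalue()


# Illustrative entry only; it is not taken from the project's data.
sample = DictionaryEntry(word="sách", part_of_speech="noun", definition="book")
```

Calling `validate(sample)` returns an empty list, and `to_csv([sample])` produces a header row followed by one data row. A spreadsheet with the same columns would serve equally well; the point is only that a fixed set of fields makes the review in step 6 mechanical.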
rain1024 commented 1 year ago

Dictionary Features

rain1024 commented 1 year ago

[Update 2023/02/23] I just published a list of Vietnamese words on Hugging Face: https://huggingface.co/datasets/undertheseanlp/UTS_Dictionary
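
For anyone who wants to try the published list, a minimal sketch using the Hugging Face `datasets` library might look like this. The repository id comes from the URL above; the helper name `load_uts_dictionary` is hypothetical, and the split and column names are not stated in this thread, so the sketch makes no assumptions about them. If the dataset requires a configuration name, the call would need adjusting.

```python
# Repository id taken from the dataset URL in the comment above.
DATASET_ID = "undertheseanlp/UTS_Dictionary"


def load_uts_dictionary(dataset_id: str = DATASET_ID):
    """Download the dataset from the Hugging Face Hub (needs network access).

    Hypothetical helper: the import is done lazily so this module can be
    imported even when the `datasets` package is not installed.
    """
    from datasets import load_dataset

    return load_dataset(dataset_id)


# Example usage (commented out because it downloads data):
# ds = load_uts_dictionary()
# print(ds)  # shows the available splits and columns
```

Printing the returned object is a reasonable first step, since it lists whatever splits and columns the dataset actually exposes.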