xicri / genshin-langdata

English-Chinese-Japanese translation dataset of the terms in Genshin Impact
https://genshin-dictionary.com

Update dictionary JSON format (v1) #40

Open xicri opened 2 years ago

xicri commented 2 years ago

Convert the dictionary JSON to the following spec:

  {
    slug: "inazuman", // auto generated, does not exist in source JSON
    en: {
      word: "Inazuman",
      kana: "イナズマン", // English pronunciation expressed in Hiragana/Katakana. Optional.
      variants: [ "...", "...", ... ], // Optional. `variants` will be shown in the UI. For words you don't want to show, move to `hints`.
      note: "Originally, Inazuman was an unofficial...", // notes (plural) → note (singular), since it is not an array 
    },
    ja: {
      word: [ "稲妻人", "稲妻の" ], // use array instead of splitting with slash (/)
      kana: [ "いなずまじん", "いなずまの" ], // Required if the word includes any Kanji. If kana is unknown, set `null`.
      kanaConfirmed: false, // Default: true. Set false if I have not confirmed the official pronunciation in the official resources.
      variants: [ "...", "...", ... ],
      note: "元々 Inazuman は非公式に…(以下略)",
    },
    "zh-CN": { // add hyphen to avoid incompatibility with Next's locale ID (Currently I have to convert `zh-CN` to `zhCN`.)
      word: [ "稻妻人", "稻妻的" ],
      pinyins: [{ char: "稻", pron: "dao4" }], // Chinese pronunciation expressed in Pinyin. Optional.
      variants: [ "...", "...", ... ],
      note: "...",
    },

    hints: [ "..." ], // Search hints such as typos (e.g. 鐘離 for 鍾離) and short names (e.g. Mona for Astrologist Mona Megistus; used for detecting exact matches to optimize the order of results)
    examples: [{
      en: "Inazumans are definitely more particular about etiquette than Mondstadters!",
      ja: "モンド人よりも、稲妻人の方が礼儀に対して気を配っている。",
      "zh-CN": "...", // Optional
      refs: [
        {
          title: "English title for the video",
          url: "https://youtube.com/watch?...",
        },
        {
          title: "トーマ, キャラクター実戦紹介 トーマ「烈炎の守護」",
          url: "https://www.youtube.com/watch?v=jvmz4TrPgUE&t=16s",
        },
      ],
    }],

    tags: [ "inazuma" ], // Optional
    related: [ "kamisato-ayaka", ... ], // Add slug here to show the links to related words. Optional
    createdOn: "2022-01-01", // auto generated, does not exist in source JSON; At → On
    updatedOn: "2022-01-01", // auto generated, does not exist in source JSON; At → On
  },
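
For reference, the spec above could be expressed as TypeScript types roughly as follows. This is only a sketch: the type names (`Word`, `EnglishEntry`, and so on) are hypothetical, and wherever the spec does not say whether a field is optional, the choice below is a guess.

```typescript
// Hypothetical type names; only the property names come from the spec above.
interface Reference {
  title: string;
  url: string;
}

interface Example {
  en: string;
  ja: string;
  "zh-CN"?: string; // marked Optional in the spec
  refs: Reference[];
}

interface EnglishEntry {
  word: string;
  kana?: string; // marked Optional in the spec
  variants?: string[]; // marked Optional in the spec
  note?: string; // assumption: optional
}

interface JapaneseEntry {
  word: string[];
  // The spec says "set null" when kana is unknown but not whether null replaces
  // the whole array or a single element, so this sketch allows both.
  kana?: (string | null)[] | null;
  kanaConfirmed?: boolean; // the spec gives a default of true
  variants?: string[];
  note?: string; // assumption: optional
}

interface Pinyin {
  char: string;
  pron: string;
}

interface ChineseEntry {
  word: string[];
  pinyins?: Pinyin[]; // marked Optional in the spec
  variants?: string[];
  note?: string; // assumption: optional
}

interface Word {
  slug: string; // auto generated
  en: EnglishEntry;
  ja: JapaneseEntry;
  "zh-CN": ChineseEntry; // assumption: required
  hints?: string[]; // assumption: optional
  examples?: Example[]; // assumption: optional
  tags?: string[]; // marked Optional in the spec
  related?: string[]; // marked Optional in the spec
  createdOn: string; // auto generated
  updatedOn: string; // auto generated
}

// The sample entry from the spec, trimmed to the fields with concrete values.
const inazuman: Word = {
  slug: "inazuman",
  en: { word: "Inazuman", kana: "イナズマン" },
  ja: {
    word: ["稲妻人", "稲妻の"],
    kana: ["いなずまじん", "いなずまの"],
    kanaConfirmed: false,
  },
  "zh-CN": {
    word: ["稻妻人", "稻妻的"],
    pinyins: [{ char: "稻", pron: "dao4" }],
  },
  tags: ["inazuma"],
  related: ["kamisato-ayaka"],
  createdOn: "2022-01-01",
  updatedOn: "2022-01-01",
};
```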

Conversion to TypeScript

I'm considering migrating from JSON5 to TypeScript for better code suggestions and error checks by IDEs. Discuss with Bill and BLACKALiCE before the decision.

Converting to TypeScript changes the data format slightly, but it stays mostly the same as the JSON5 version. The code looks like this:

JSON5:

[
  {
    slug: "inazuman",
    en: {
      word: "Inazuman",
      kana: "イナズマン", 
      // ...
    },
  },
  // ...
]

TypeScript:

import { defineWords } from "../../libs/types";

export default defineWords([
  {
    slug: "inazuman",
    en: {
      word: "Inazuman",
      kana: "イナズマン", 
      // ...
    },
  },
  // ...
]);

The only difference is that the array is wrapped in the defineWords() function for better type checking.
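
The body of defineWords() is not shown in this issue. One common way to write such a helper is an identity function whose parameter type does the checking, as in the sketch below. The `SourceWord` type here is a hypothetical, trimmed-down stand-in for whatever `libs/types` actually exports.

```typescript
// Hypothetical, trimmed-down entry type; the real one in libs/types is not shown in this issue.
interface SourceWord {
  slug: string;
  en: {
    word: string;
    kana?: string;
  };
}

// Identity function: it returns the array unchanged at runtime.
// Its only purpose is to make the compiler check each entry against SourceWord.
function defineWords(words: SourceWord[]): SourceWord[] {
  return words;
}

const words = defineWords([
  {
    slug: "inazuman",
    en: {
      word: "Inazuman",
      kana: "イナズマン",
    },
  },
]);

// A mistyped property name is rejected at compile time, for example:
// defineWords([{ slug: "inazuman", en: { wrod: "Inazuman" } }]);
```

Because the function does nothing at runtime, the data stays a plain array.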

Pros: Developer experience. If you use an editor that supports TypeScript, such as VSCode, you can write entries more smoothly. For example, if you mistype a property name (e.g. zhCn instead of zhCN), the editor underlines it in red. Also, when you type the first few characters of a property name, the editor suggests the full name, which you can accept by pressing the Enter key.

Cons: If you are not familiar with TypeScript, you might find it uncomfortable to write.

Bill-Haku commented 1 year ago

Hello, I am wondering when this update can be done. I noticed the plan to add pinyin in issue #114, and I also think it would be very helpful for users to add notes in English and Chinese, as they are mostly written in Japanese now.

xicri commented 1 year ago

@Bill-Haku I'm currently working on the migration from Nuxt/Vue.js to Next.js/React, and I plan to work on this issue after I have finished the Next.js migration. Unfortunately, I think I need three weeks at minimum.

Since that is a long wait, I'm considering adding pinyin and notesZh fields as a workaround before working on this issue. I think I can implement this in a week.

  {
    "id": "zhongli",
    "en": "Zhongli",
    "ja": "鍾離",
    "zhCN": "钟离",
    "pronunciationJa": "しょうり",
    "pinyin": "zhōng lí", // New
    "notes": "読みは「ヂョンリー」",
    "notesZh": "...", // New
    // ...
  },

Bill-Haku commented 1 year ago

Thanks for your work. BTW you can use a script to convert the existing json format to the new json format, right?

xicri commented 1 year ago

@Bill-Haku

BTW you can use a script to convert the existing json format to the new json format, right?

Yes, I will write a small script to convert the existing JSON to the new format. There is already a WIP repository: https://github.com/xicri/langdata-converter-v1
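
As a rough illustration of what such a conversion involves (this is not the code in the repository above), a single entry could be converted along these lines. The sketch assumes that the old format separates multiple words with a slash, that the old `id` can be carried over as `slug`, and that the old `notes` field holds Japanese text; `OldWord`, `NewWord`, `splitBySlash`, and `convertWord` are hypothetical names.

```typescript
// Old format, using the field names from the zhongli example earlier in this thread.
interface OldWord {
  id: string;
  en: string;
  ja: string;
  zhCN?: string;
  pronunciationJa?: string;
  notes?: string;
}

// Trimmed-down subset of the new format.
interface NewWord {
  slug: string;
  en: { word: string };
  ja: { word: string[]; kana?: string[]; note?: string };
  "zh-CN"?: { word: string[] };
}

// Assumption: the old format separates multiple words with "/".
function splitBySlash(value: string): string[] {
  return value
    .split("/")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

function convertWord(old: OldWord): NewWord {
  const converted: NewWord = {
    slug: old.id, // assumption: the old id is reused as the slug
    en: { word: old.en },
    ja: { word: splitBySlash(old.ja) },
  };
  if (old.pronunciationJa !== undefined) {
    converted.ja.kana = splitBySlash(old.pronunciationJa);
  }
  if (old.notes !== undefined) {
    converted.ja.note = old.notes; // assumption: existing notes are written in Japanese
  }
  if (old.zhCN !== undefined) {
    converted["zh-CN"] = { word: splitBySlash(old.zhCN) };
  }
  return converted;
}

const converted = convertWord({
  id: "inazuman",
  en: "Inazuman",
  ja: "稲妻人/稲妻の",
  zhCN: "稻妻人/稻妻的",
  pronunciationJa: "いなずまじん/いなずまの",
});
```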

xicri commented 1 year ago

@Bill-Haku @SleepyAsh0191 I'm starting to update the JSON5 format of genshin-langdata to refactor and add new features such as English notes and related words. If you have any input, let me know. (I might make some minor updates for the new format later.)

Also, I'm considering conversion from JSON5 to TypeScript as I noted in the description of this issue. Can you let me know if you agree or disagree with conversion to TypeScript?

Bill-Haku commented 1 year ago

The new json format looks good, but please make a new API URI for the new format and keep the old format for a while so API users including me can update it and the users of the existing version will not be affected.

Also, I think using TypeScript is a good choice. I have no problem with this conversion.

xicri commented 1 year ago

@Bill-Haku

The new json format looks good, Also, I think using TypeScript is a good choice. I have no problem with this conversion.

Thanks!

but please make a new API URI for the new format and keep the old format for a while so API users including me can update it and the users of the existing version will not be affected.

Yes, of course. I will make a new API URL (maybe something like https://dataset.genshin-dictionary.com/v1/words.json) for the new format. I might also change the URL for the old format (maybe something like https://dataset.genshin-dictionary.com/v0/words.json), but even if I do, I will keep the old URL working for a month or more after I release the new format to avoid sudden breaking changes for developers, including you.
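
If the URLs end up following that pattern, an API client could pin the format version it understands in one place, roughly like this. The URL layout is only the tentative one mentioned above, and `DatasetVersion` and `datasetUrl` are hypothetical names.

```typescript
// Tentative versions from the comment above; the final URLs may differ.
type DatasetVersion = "v0" | "v1";

const DATASET_BASE_URL = "https://dataset.genshin-dictionary.com";

// Builds the dataset URL for a given format version.
function datasetUrl(version: DatasetVersion): string {
  return `${DATASET_BASE_URL}/${version}/words.json`;
}

// A client that still parses the old format keeps requesting "v0"
// and switches to "v1" once its parser has been updated.
const oldFormatUrl = datasetUrl("v0");
const newFormatUrl = datasetUrl("v1");
```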

Bill-Haku commented 1 year ago

Thank you. Looking forward to the update.