Open daxida opened 5 months ago
It's described by a schema in Yomitan: https://github.com/themoeway/yomitan/blob/master/ext/data/schemas/dictionary-term-bank-v3-schema.json Some of the fields are pretty much obsolete, I'd say. There are other schemas in that folder for the IPA, index.json, etc.
Just to make sure you're not doing more than you need, this is converting something other than kaikki, i.e. this is separate from https://github.com/tatuylonen/wiktextract/discussions/651?
Thank you for the link.
I'm unfortunately still having trouble parsing that schema. Is it obvious from that what maps to what in the lines that I previously commented?
And thank you for your concern: this is a separate matter. I didn't mention it before because I was afraid of being instantly dismissed for being off topic. I'm toying with the idea of making a Yomitan-compatible dictionary like yours from a website called LingQ.
Their entries are very simple in comparison to that schema:
{
  "pk": 459243703,
  "url": "https://www.lingq.com/api/v3/el/cards/459243703/",
  "term": "εκφώνησής",
  "fragment": "διαδικασία προγραμματισμού της εκφώνησής σας, να",
  "importance": 0,
  "status": 0,
  "extended_status": null,
  "last_reviewed_correct": null,
  "srs_due_date": "2023-09-12T08:26:23.907721",
  "notes": "",
  "audio": null,
  "words": [
    "εκφώνησής"
  ],
  "tags": [],
  "hints": [
    {
      "id": 129173102,
      "locale": "en",
      "text": "of reading (aloud)",
      "term": "εκφώνησής",
      "popularity": 2,
      "is_google_translate": true,
      "flagged": false
    }
  ],
  "transliteration": {
    "latin": [
      "ekfonisis"
    ]
  },
  "gTags": [],
  "wordTags": [],
  "readings": {},
  "writings": [
    "εκφώνησής",
    "εκφωνησης"
  ]
}
There are some things like fragments (sort of "example sentence") that I'm still not sure where to put.
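As a rough sketch of how little of such a card is probably needed for a dictionary entry, the Python snippet below pulls out the headword, the hint texts, the romanization, and the fragment. The function name `summarize_card` and the output keys are hypothetical; only the input field names come from the entry above, and the card is trimmed to the fields actually used.

```python
import json

# A trimmed copy of the card above; only the fields used here are kept.
card_json = """
{
  "term": "εκφώνησής",
  "fragment": "διαδικασία προγραμματισμού της εκφώνησής σας, να",
  "hints": [
    {"locale": "en", "text": "of reading (aloud)", "is_google_translate": true}
  ],
  "transliteration": {"latin": ["ekfonisis"]}
}
"""


def summarize_card(card: dict, locale: str = "en") -> dict:
    """Collect the parts of a card that look useful for a dictionary entry.

    The output keys are made up for illustration; they are not a LingQ or
    Yomitan format.
    """
    # Keep only non-empty hint texts in the requested locale.
    glosses = [
        hint["text"]
        for hint in card.get("hints", [])
        if hint.get("locale") == locale and hint.get("text")
    ]
    # "transliteration" may be missing or null, so fall back to an empty dict.
    latin = (card.get("transliteration") or {}).get("latin") or []
    return {
        "term": card["term"],
        "glosses": glosses,
        "romanization": latin[0] if latin else "",
        "example": card.get("fragment") or "",
    }


summary = summarize_card(json.loads(card_json))
print(summary)
```

The fragment is kept under a separate `example` key here simply because it is not yet clear where it belongs in the target format.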
Here's some more details:
[
  "居住者",
  "きょじゅうしゃ",
  "n",
  "",
  604,
  [
    "resident",
    "inhabitant"
  ],
  1717870,
  "P news"
]
Tags are defined in tag_bank_1.json. They can be about the part of speech, but also usage qualifiers (rare, archaic, vulgar...), field (law, biology, astronomy...), or region (British/American and such). When you click on them, the full tag name is shown.
Rules refer to the conditions defined in the "transforms" (aka deinflections) file for that language (see english-transforms.js), and help deinflection be more precise. If a language has no deinflection yet, they are unnecessary.
The sequence number is described in the schema as:
{
  "type": "integer",
  "description": "Sequence number for the term. Terms with the same sequence number can be shown together when the \"resultOutputMode\" option is set to \"merge\"."
}
I don't really know what it's for; it's probably safe to ignore, just set it to 0.
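To tie the positions in the example array to names, here is a minimal Python sketch that builds one such row. The order of the positions follows the term bank v3 schema linked earlier, as far as I read it; `make_term_row` and its parameter names are made up for illustration and are not part of Yomitan.

```python
import json


def make_term_row(term, reading, glosses, definition_tags="", rules="",
                  score=0, sequence=0, term_tags=""):
    """Build one eight-element term bank row (hypothetical helper).

    Positions, in order: term, reading, definition tags, rules, score,
    list of definitions, sequence number, term tags.
    """
    return [
        term,
        reading,            # empty string if there is no separate reading
        definition_tags,    # space-separated tag names, e.g. "n"
        rules,              # space-separated deinflection rule names
        score,
        list(glosses),
        sequence,           # 0 when merging entries does not matter
        term_tags,          # space-separated tag names, e.g. "P news"
    ]


# Rebuild the example row from above.
row = make_term_row(
    "居住者",
    "きょじゅうしゃ",
    ["resident", "inhabitant"],
    definition_tags="n",
    score=604,
    sequence=1717870,
    term_tags="P news",
)

# A term bank file is a JSON array of such rows.
bank = [row]
text = json.dumps(bank, ensure_ascii=False, indent=2)
print(text)
```

For a source with no readings, no deinflection rules, and no tags, calling `make_term_row(term, "", glosses)` leaves everything else at its empty default, which should be enough for a first test import.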
I'm sorry if this is not the right place to ask.
I recently found this repository via wiktextract and I would like to do something similar for another API. I browsed for a while but could not find a description of the JSON entries that are used, like this one here.
I understand that the dictionary Yomitan expects looks something like:
Could you give me some pointers? I've also looked through the Yomitan repo but could not find much information about it. Maybe it's a standard dictionary format that I'm unaware of?