[
  [
    [
      [
        [
          "mite",
          {
            "reading": "みて",
            "text": "みて",
            "kana": "みて",
            "score": 40,
            "seq": 10591144,
            "conj": [
              {
                "prop": [
                  {
                    "pos": "v1",
                    "type": "Conjunctive (~te)"
                  }
                ],
                "reading": "見る 【みる】",
                "gloss": [
                  {
                    "pos": "[v1,vt]",
                    "gloss": "to see; to look; to watch; to view; to observe"
                  },
                  {
                    "pos": "[v1,vt]",
                    "gloss": "to examine; to look over; to assess; to check; to judge"
                  },
                  {
                    "pos": "[v1,vt]",
                    "gloss": "to look after; to attend to; to take care of; to keep an eye on"
                  },
                  {
                    "pos": "[v1,vt]",
                    "gloss": "to experience; to meet with (misfortune, success, etc.)"
                  },
                  {
                    "pos": "[aux-v,v1]",
                    "gloss": "to try ...; to have a go at ...; to give ... a try",
                    "info": "after the -te form of a verb"
                  },
                  {
                    "pos": "[aux-v,v1]",
                    "gloss": "to see (that) ...; to find (that) ...",
                    "info": "as 〜てみると, 〜てみたら, 〜てみれば, etc."
                  }
                ],
                "readok": true
              }
            ]
          },
          []
        ],
        [
          "miru",
          {
            "reading": "みる",
            "text": "みる",
            "kana": "みる",
            "score": 40,
            "seq": 1259290,
            "gloss": [
              {
                "pos": "[v1,vt]",
                "gloss": "to see; to look; to watch; to view; to observe"
              },
              {
                "pos": "[v1,vt]",
                "gloss": "to examine; to look over; to assess; to check; to judge"
              },
              {
                "pos": "[v1,vt]",
                "gloss": "to look after; to attend to; to take care of; to keep an eye on"
              },
              {
                "pos": "[v1,vt]",
                "gloss": "to experience; to meet with (misfortune, success, etc.)"
              },
              {
                "pos": "[aux-v,v1]",
                "gloss": "to try ...; to have a go at ...; to give ... a try",
                "info": "after the -te form of a verb"
              },
              {
                "pos": "[aux-v,v1]",
                "gloss": "to see (that) ...; to find (that) ...",
                "info": "as 〜てみると, 〜てみたら, 〜てみれば, etc."
              }
            ],
            "conj": []
          },
          []
        ]
      ],
      80
    ]
  ]
]
Hi, thanks for your work on ichiran.
Would it be possible to include more information about the root word for conjugated words in the JSON from ichiran-cli -f? For my use case, ideally the text, kana and seq fields from the root word would be included. For example:
見て => 見る, みる, 1259290
観て => 観る, みる, 1259290
みて => みる, みる, 1259290
I'm looking to generate Anki cards from sentences, using ichiran to detect the individual words in the sentence. Currently it seems tricky to programmatically determine that みてみる is really the same word twice, as the ichiran-cli -f output above shows: みて is reported with seq 10591144 and its root appears only as the combined string 見る 【みる】, while みる is reported with seq 1259290.
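Until such fields exist, one possible stopgap is to parse the root out of the reading string inside each conj entry. The following is a minimal Python sketch, not part of ichiran. It assumes the JSON shape seen in the sample output for みてみる (a conj entry whose reading looks like 見る 【みる】), and it assumes that a kana-only root would appear as a bare string without brackets. The helper names parse_root_reading, iter_words and root_kana are made up for illustration, and SAMPLE is the sample output with the glosses trimmed. The root's seq is not recoverable this way, because the conj entry in the sample carries no seq field.

```python
import json
import re

# Sample ichiran-cli -f output for みてみる, with the gloss lists trimmed.
SAMPLE = """
[[[[["mite", {"reading": "みて", "text": "みて", "kana": "みて",
              "score": 40, "seq": 10591144,
              "conj": [{"prop": [{"pos": "v1", "type": "Conjunctive (~te)"}],
                        "reading": "見る 【みる】", "readok": true}]}, []],
    ["miru", {"reading": "みる", "text": "みる", "kana": "みる",
              "score": 40, "seq": 1259290, "conj": []}, []]],
   80]]]
"""

# Matches a conj "reading" string of the form "TEXT 【KANA】".
_ROOT_RE = re.compile(r"^(?P<text>.+?)\s*【(?P<kana>.+?)】$")


def parse_root_reading(reading):
    """Split a conj reading string into (text, kana).

    Assumption: a string without brackets is a kana-only root, so it is
    returned as both text and kana.
    """
    reading = reading.strip()
    match = _ROOT_RE.match(reading)
    if match:
        return match.group("text"), match.group("kana")
    return reading, reading


def iter_words(node):
    """Yield every dict that has both "text" and "kana" keys.

    Walks the whole tree instead of hard-coding the nesting depth. Only
    checked against the sample shape above; other outputs may nest words
    differently.
    """
    if isinstance(node, dict):
        if "text" in node and "kana" in node:
            yield node
        for value in node.values():
            yield from iter_words(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_words(item)


def root_kana(word):
    """Return the root's kana, falling back to the word's own kana."""
    for conj in word.get("conj", []):
        if "reading" in conj:
            return parse_root_reading(conj["reading"])[1]
    return word["kana"]


words = list(iter_words(json.loads(SAMPLE)))
print([(w["text"], root_kana(w)) for w in words])
# [('みて', 'みる'), ('みる', 'みる')]
```

Comparing on root kana alone would also merge unrelated homophones, which is why having the root's seq in the JSON would be the more reliable key.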