[ ] Python konlpy CLI driver that accepts a string and returns each token's POS tag
[ ] quizgen accepts a source text language option
[ ] if source text language is Korean
[ ] when building each sentence, pass the sentence text to the konlpy driver
[ ] parse the POS tags and assign them to Word instances
[ ] In Word, add a new member root_string holding the subset of key_string that excludes unimportant parts of speech (particles/ornaments). Another member stores the ornaments themselves.
[ ] What to do with word root_string and ornaments?
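The root/ornament split above could look something like the sketch below, which assumes konlpy-style (token, tag) pairs. The tag names follow the Okt tagger's tagset; other konlpy taggers (Komoran, Kkma) use different tagsets, and which tags count as "unimportant" is an assumption to tune.

```python
# Sketch: split a tokenized word into root_string and ornaments.
# ORNAMENT_TAGS is an assumed set of "unimportant" Okt-style tags;
# other taggers would need a different set.
ORNAMENT_TAGS = {'Josa', 'Eomi', 'Suffix', 'Punctuation'}

def split_word(tagged_tokens):
    """Return (root_string, ornaments) from a list of (token, tag) pairs."""
    root_parts = []
    ornaments = []
    for token, tag in tagged_tokens:
        if tag in ORNAMENT_TAGS:
            ornaments.append(token)
        else:
            root_parts.append(token)
    return ''.join(root_parts), ornaments

# example: 가방은 tagged as [('가방', 'Noun'), ('은', 'Josa')]
# -> root_string '가방', ornaments ['은']
```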
I don't think words should use root_string as the unique identifier, because a word test should include the particles as part of the test; they are sometimes what makes an answer invalid or valid, depending on the correct overall part of speech.
I could use root_string to count occurrences of a word (across multiple instances of Word with different key_string), which could be used to limit the number of tests of the same word. Likewise, it could be used to enable tests of words otherwise considered to occur too infrequently.
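Counting by root instead of by full key could be a one-liner over Word instances. This is a minimal sketch with a hypothetical Word class exposing only the two members discussed above.

```python
from collections import Counter

# Hypothetical minimal Word, only to illustrate counting by root.
class Word:
    def __init__(self, key_string, root_string):
        self.key_string = key_string
        self.root_string = root_string

def root_frequencies(words):
    """Count occurrences per root across Word instances with differing key_string."""
    return Counter(w.root_string for w in words)

# 가방은 and 가방이 share the root 가방, so they count together.
words = [Word('가방은', '가방'), Word('가방이', '가방'), Word('학교에', '학교')]
```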
I could use root_string for edit distance, so that words with different particles and the same root would be stored as edit distance zero. This could then be used to exclude words with the same root from choices for a test (e.g. 가방은, 가방이). But again, sometimes testing different parts of speech for the same root is desirable for testing grammar instead of vocabulary.
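The distance-zero rule above could be sketched as a wrapper around whatever string metric quizgen already uses; here difflib stands in as an assumed placeholder metric.

```python
from difflib import SequenceMatcher

def word_distance(a_key, a_root, b_key, b_root):
    """Treat same-root words (e.g. 가방은 vs 가방이) as distance zero,
    so they can be excluded from the choices for the same test."""
    if a_root == b_root:
        return 0.0
    # placeholder metric; quizgen's own edit distance would go here
    return 1.0 - SequenceMatcher(None, a_key, b_key).ratio()
```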
I plan to use the konlpy Python package, with a driver script that quizgen can call to fetch part of speech (POS) tags for a given sentence. https://github.com/ogallagher/quizcard-generator/issues/26#issuecomment-1952874820
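A minimal sketch of that driver script, assuming the Okt tagger, reading the sentence from argv and printing (token, tag) pairs as JSON for quizgen to parse. konlpy also needs a JVM available at runtime, so the import is deferred until the tagger is actually used.

```python
#!/usr/bin/env python
# Sketch of the konlpy CLI driver: sentence in via argv, POS tags out as JSON.
import json
import sys

def tags_to_json(tagged):
    """Serialize [(token, tag), ...] to a JSON array of [token, tag] pairs."""
    return json.dumps([[token, tag] for token, tag in tagged], ensure_ascii=False)

def main():
    # imported lazily so the JVM only starts when the driver actually runs
    from konlpy.tag import Okt
    sentence = sys.argv[1]
    print(tags_to_json(Okt().pos(sentence)))

if __name__ == '__main__':
    main()
```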