ogallagher / quizcard-generator

Given a source document, generate quiz/flash cards
https://wordsearch.dreamhosters.com/quizcard-generator
MIT License

Use Korean NLP library for filtering testable words by part of speech #42

Open ogallagher opened 9 months ago

ogallagher commented 9 months ago

I plan to use the konlpy Python package, with a driver script that quizgen can call to fetch part-of-speech (POS) tags for a given sentence (see https://github.com/ogallagher/quizcard-generator/issues/26#issuecomment-1952874820).
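A minimal sketch of such a driver script follows. It assumes quizgen passes the sentence as command-line arguments and reads JSON from stdout. The file name pos_driver.py, the CLI shape, and the JSON format are hypothetical. The script uses konlpy's Okt tagger, whose pos() method returns (morpheme, tag) pairs. The tagger is passed in as a parameter so that tag_sentence() can be exercised without konlpy or a JVM installed.

```python
"""Hypothetical konlpy driver for quizgen: print POS tags for a sentence as JSON.

Assumption: quizgen runs this script as a subprocess and parses stdout.
The file name, CLI shape, and JSON format are illustrative only.
"""
import json
import sys
from typing import Dict, List, Protocol, Tuple


class PosTagger(Protocol):
    """Any tagger with a konlpy-style pos() method returning (morpheme, tag) pairs."""

    def pos(self, phrase: str) -> List[Tuple[str, str]]: ...


def tag_sentence(sentence: str, tagger: PosTagger) -> List[Dict[str, str]]:
    """Tag one sentence and return [{"morpheme": ..., "tag": ...}, ...]."""
    return [{"morpheme": m, "tag": t} for m, t in tagger.pos(sentence)]


def main(args: List[str]) -> None:
    # Imported lazily so tag_sentence() is usable without konlpy or a JVM.
    from konlpy.tag import Okt

    sentence = " ".join(args)
    json.dump(tag_sentence(sentence, Okt()), sys.stdout, ensure_ascii=False)
    sys.stdout.write("\n")


# Only run when a sentence is given, e.g. python pos_driver.py "가방은 어디에 있어요?"
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Which konlpy tagger to use (Okt, Komoran, Kkma, ...) is still an open choice. Their tag sets differ: for example, Okt labels particles `Josa`, while Komoran and Kkma use Sejong-style `J…` tags.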

I don't think words should use root_string as their unique identifier, because a word test should include the particles as part of the test: the particle is sometimes what makes an answer valid or invalid, depending on the correct overall part of speech.
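The sketch below illustrates the distinction with a hypothetical Word stand-in that has the two fields named in this issue. Here key_string is the surface form including particles, and root_string is the same word with particles stripped; quizgen's real class may differ. Keying by key_string keeps 가방은 and 가방이 as separate testable words, whereas keying by root_string would collapse them into one.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Word:
    """Hypothetical stand-in for quizgen's Word class."""

    key_string: str   # surface form including particles, e.g. "가방은"
    root_string: str  # same word with particles stripped, e.g. "가방"


def index_by_key(words: Iterable[Word]) -> Dict[str, Word]:
    """Use key_string as the unique identifier, so each particle variant is its own word."""
    index: Dict[str, Word] = {}
    for word in words:
        index.setdefault(word.key_string, word)
    return index


words = [Word("가방은", "가방"), Word("가방이", "가방"), Word("가방은", "가방")]
print(sorted(index_by_key(words)))          # ['가방은', '가방이']
print(len({w.root_string for w in words}))  # 1: keying by root would merge them
```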

I could use root_string to count occurrences of a word across multiple Word instances with different key_string values, and use that count to limit the number of tests of the same word. Likewise, the combined count could enable tests of words whose individual forms would otherwise be considered too infrequent.
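A minimal sketch of that counting follows. It assumes each occurrence is available as a (key_string, root_string) pair. The thresholds MIN_OCCURRENCES and MAX_TESTS_PER_ROOT and the function select_testable are hypothetical. A root's total is always at least the count of any one of its forms, so checking the root total lets rare forms of a frequent root through. Meanwhile, the per-root cap limits repeated tests of the same word.

```python
from collections import Counter
from typing import List, Tuple

# Illustrative (key_string, root_string) occurrences from a source document.
occurrences: List[Tuple[str, str]] = [
    ("가방은", "가방"), ("가방이", "가방"), ("가방을", "가방"),
    ("학교에", "학교"), ("학교에", "학교"),
]

MIN_OCCURRENCES = 2     # hypothetical: fewer occurrences than this is "too infrequent"
MAX_TESTS_PER_ROOT = 2  # hypothetical: at most this many tests share one root


def select_testable(occurrences: List[Tuple[str, str]],
                    min_occurrences: int,
                    max_tests_per_root: int) -> List[str]:
    """Pick key_strings to test, using root_string totals for frequency and limits."""
    key_counts = Counter(key for key, _ in occurrences)
    root_counts = Counter(root for _, root in occurrences)
    root_of = dict(occurrences)  # key_string -> root_string
    tests_per_root: Counter = Counter()
    selected: List[str] = []
    # Most frequent surface forms first; ties keep first-seen order.
    for key, _ in key_counts.most_common():
        root = root_of[key]
        # Frequency is judged on the root total, so a form seen once can still
        # be tested if its root appears often enough across all its forms.
        if root_counts[root] < min_occurrences:
            continue
        # Cap the number of tests drawn from the same root.
        if tests_per_root[root] >= max_tests_per_root:
            continue
        tests_per_root[root] += 1
        selected.append(key)
    return selected


print(select_testable(occurrences, MIN_OCCURRENCES, MAX_TESTS_PER_ROOT))
# ['학교에', '가방은', '가방이']
```

In this example, each 가방 form occurs only once, so none would pass a per-form threshold of 2. Counted by root they do pass, and the cap then drops 가방을.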

I could also use root_string for edit distance, so that words with the same root and different particles would have an edit distance of zero. This could then be used to exclude words with the same root from the choices for a test (e.g. 가방은, 가방이). But again, testing different parts of speech for the same root is sometimes desirable, when the test targets grammar instead of vocabulary.
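A minimal sketch of root-based edit distance for choosing wrong answers follows. It assumes a plain character-level Levenshtein distance over Hangul syllables; quizgen's actual distance metric may differ. The root_of mapping and the test_grammar flag are hypothetical. Same-root words score zero, so they are dropped for vocabulary tests but kept for grammar tests. The remaining candidates are ordered with the closest root first.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def choice_candidates(answer: str, words: List[str], root_of: Dict[str, str],
                      test_grammar: bool = False) -> List[str]:
    """Return wrong-answer candidates for `answer`, closest root first.

    Distance is measured between root_strings, so 가방은 and 가방이 are 0 apart.
    """
    scored = []
    for word in words:
        if word == answer:
            continue
        distance = levenshtein(root_of[word], root_of[answer])
        # Same root: skip for vocabulary tests, keep for grammar tests.
        if distance == 0 and not test_grammar:
            continue
        scored.append((distance, word))
    scored.sort(key=lambda pair: pair[0])  # stable: ties keep input order
    return [word for _, word in scored]


# Hypothetical key_string -> root_string mapping.
root_of = {"가방은": "가방", "가방이": "가방", "가족이": "가족", "학교에": "학교"}
words = list(root_of)
print(choice_candidates("가방은", words, root_of))                     # ['가족이', '학교에']
print(choice_candidates("가방은", words, root_of, test_grammar=True))  # ['가방이', '가족이', '학교에']
```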