shyammk closed this issue 5 years ago.
I agree we could do that, especially the POS tagging. We would then have to capture it as a requirement.
I believe that for any kind of text analytics and NLP, we need stop word removal, POS tagging, lemmatization/stemming and word frequency generation as part of preprocessing, and then move on to bag of words and n-grams as the next steps.
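The preprocessing steps listed above could be sketched in plain Python roughly as follows. This is a minimal sketch, not the project's implementation: the regex tokenizer, the tiny `STOP_WORDS` set and all function names are hypothetical. POS tagging and lemmatization/stemming are left out because they realistically need an NLP library such as NLTK or spaCy.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list; a real project would use a
# fuller list (for example one shipped with NLTK or spaCy).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on"}

def tokenize(text):
    """Lowercase the text and keep only alphabetic runs (drops punctuation)."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def word_frequencies(tokens):
    return Counter(tokens)

def ngrams(tokens, n):
    """Return consecutive n-token tuples, e.g. bigrams for n=2."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_words(tokens, vocabulary):
    """Count vector over a fixed, ordered vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

text = "The cat sat on the mat. The cat is happy."
tokens = remove_stop_words(tokenize(text))
freqs = word_frequencies(tokens)
vocab = sorted(freqs)
print(tokens)                      # ['cat', 'sat', 'mat', 'cat', 'happy']
print(freqs.most_common(1))        # [('cat', 2)]
print(bag_of_words(tokens, vocab)) # [2, 1, 1, 1]
print(ngrams(tokens, 2))
```

The vocabulary is sorted so that every document maps to a vector with the same column order, which is what bag-of-words and n-gram features need downstream.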
Not really. As I understand it, we do not need to perform lemmatization/stemming before calculating the readability scores. I feel it might even hamper our results when counting the number of syllables (though that needs further analysis).
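To illustrate the concern about syllable counts, here is a minimal sketch assuming a syllable counter based on vowel groups (a common rough heuristic, not a dictionary lookup) and a deliberately crude, hypothetical suffix stripper standing in for a real stemmer such as Porter's. It only shows the direction of the effect: stripping suffixes before counting lowers the syllable total that syllable-based readability formulas rely on.

```python
import re

VOWEL_GROUPS = re.compile(r"[aeiouy]+")
# Hypothetical, deliberately crude suffix list standing in for a real stemmer.
SUFFIXES = ("ation", "ness", "ing", "ed", "ly")

def count_syllables(word):
    """Approximate syllables as runs of consecutive vowels."""
    word = word.lower()
    count = len(VOWEL_GROUPS.findall(word))
    # Treat a trailing silent "e" (as in "make") as non-syllabic, but keep "-le".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def crude_stem(word):
    """Strip the first matching suffix, keeping at least a few characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

words = ["organization", "happiness", "quickly", "reading"]
raw = sum(count_syllables(w) for w in words)
stemmed = sum(count_syllables(crude_stem(w)) for w in words)
print(raw, stemmed)  # 12 7
```

Because the counts drop once the words are stemmed, any score computed from syllables per word would shift, which is why counting on the surface forms seems safer.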
Stop words and punctuation can be removed.
I believe POS tagging is not required unless a readability score calculation method needs to count the number of nouns/verbs. But yes, some of our additional features might require it.
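If a readability method or an additional feature does need noun/verb counts, the counting itself is simple once a tagger has run. This sketch assumes the tagger returns `(word, tag)` pairs with Penn Treebank tags (the default tagset of NLTK's `pos_tag`); the tagged sentence is hard-coded so the example runs without any tagger installed, and `count_word_classes` is a hypothetical helper.

```python
from collections import Counter

# Hard-coded (word, tag) pairs standing in for real tagger output; the tags
# follow the Penn Treebank convention (NN* = nouns, VB* = verbs).
tagged = [
    ("The", "DT"), ("students", "NNS"), ("read", "VBD"),
    ("the", "DT"), ("report", "NN"), ("and", "CC"),
    ("wrote", "VBD"), ("summaries", "NNS"), (".", "."),
]

def count_word_classes(tagged_tokens):
    """Count nouns and verbs by Penn Treebank tag prefix."""
    counts = Counter()
    for _word, tag in tagged_tokens:
        if tag.startswith("NN"):
            counts["nouns"] += 1
        elif tag.startswith("VB"):
            counts["verbs"] += 1
    return counts

counts = count_word_classes(tagged)
print(counts["nouns"], counts["verbs"])  # 3 2
```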
We may need both lemmatization and POS tagging in later stages, but we don't need them for our MVP. Hence, let's concentrate on the MVP alone and design the structure so that these steps can be included later. Closing this issue for now.
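One way to design the structure so that lemmatization and POS tagging can be added later is to treat preprocessing as an ordered list of step functions. In this minimal sketch (all names hypothetical), the MVP pipeline only lowercases, strips punctuation and drops stop words; a later step is simply appended to the list without touching `run_pipeline` or the readability code that consumes its tokens. `toy_lemmatize` is a placeholder that only strips a plural "s", not a real lemmatizer.

```python
import re
from typing import Callable, List

Step = Callable[[List[str]], List[str]]

# Hypothetical stop word set for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and"}

def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]

def strip_punctuation(tokens: List[str]) -> List[str]:
    cleaned = (re.sub(r"\W", "", t) for t in tokens)
    return [t for t in cleaned if t]  # drop tokens that were pure punctuation

def drop_stop_words(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t not in STOP_WORDS]

def run_pipeline(text: str, steps: List[Step]) -> List[str]:
    """Whitespace-split the text, then apply each step in order."""
    tokens = text.split()
    for step in steps:
        tokens = step(tokens)
    return tokens

# MVP: only the steps we need now.
MVP_STEPS: List[Step] = [lowercase, strip_punctuation, drop_stop_words]

# Later: a placeholder lemmatizer appended without changing run_pipeline.
def toy_lemmatize(tokens: List[str]) -> List[str]:
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]

FUTURE_STEPS: List[Step] = MVP_STEPS + [toy_lemmatize]

print(run_pipeline("The cats, and the dogs!", MVP_STEPS))     # ['cats', 'dogs']
print(run_pipeline("The cats, and the dogs!", FUTURE_STEPS))  # ['cat', 'dog']
```

Keeping every step as a `List[str] -> List[str]` function makes lemmatization easy to slot in; POS tagging would likely need a richer token type (such as word/tag pairs), so the step signature may have to change at that point.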
I feel that for some new features, we might have to perform lemmatization and POS tagging.
In that case, it would be better to include those operations in our readability score calculation module as well.
Let's discuss!