AnizD closed this 2 years ago
Some points to note here:
- Bonus words
These are words that point towards important sentences. They may include superlatives, adverbs, etc.
- Stigma words
These are words that have a negative effect on sentence importance. They include anaphoric expressions, belittling expressions, etc. (We might expect the machine to treat them as important, but they really are not.)
- Null words
These are words that are neutral or irrelevant to sentence importance. They are much like stopwords.
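To make the three categories concrete, here is a minimal sketch of cue-word scoring in plain Python. It is a simplified illustration, not sumy's actual implementation: the function name `cue_score`, the equal weights of +1 and -1, and the example word sets and sentences are all assumptions made up for this sketch.

```python
import re

def cue_score(sentence, bonus_words, stigma_words, null_words):
    """Hypothetical scorer: +1 per bonus word, -1 per stigma word, null words skipped."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = 0
    for token in tokens:
        if token in null_words:
            continue  # null words never affect the score
        if token in bonus_words:
            score += 1
        elif token in stigma_words:
            score -= 1
    return score

# Illustrative word sets only; not the lists used in the notebook.
bonus = {"best", "excellent", "significantly"}
stigma = {"hardly", "barely"}
null = {"the", "is", "a", "was", "this"}

sentences = [
    "The training program is excellent and the best in Pune.",
    "This was hardly a problem.",
    "The office is in Pune.",
]

# Rank sentences from most to least important under this toy scoring.
ranked = sorted(
    sentences,
    key=lambda s: cue_score(s, bonus, stigma, null),
    reverse=True,
)
```

With these sets, a word like `was` or `this` only lowers a sentence's score if it is placed in the stigma set; as a null word it is simply ignored.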
From the code, I don't think `was`, `this`, etc. contribute to stigma words unless there is some evidence of negative weights.
Do check the default dictionary corpus of null and stigma words for Edmundson in sumy.
I have updated the bonus and stigma words, based on the top words/bigrams/trigrams in the corpus (Review Text, Pros, Cons).
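For anyone who wants to reproduce the candidate lists, a rough sketch of pulling the top words/bigrams/trigrams from a list of review strings could look like the following. The helper `top_ngrams`, the simple regex tokenizer, and the sample reviews are assumptions for illustration; the notebook may do this differently.

```python
import re
from collections import Counter

def top_ngrams(texts, n, k):
    """Return the k most frequent n-grams (as space-joined strings) across texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        # zip over n shifted copies of the token list yields consecutive n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return [" ".join(gram) for gram, _ in counts.most_common(k)]

# Made-up sample reviews, standing in for the Review Text / Pros / Cons columns.
reviews = [
    "Good work life balance and good learning",
    "Work life balance is good",
    "Slow salary growth",
]

top_words = top_ngrams(reviews, 1, 1)
top_trigrams = top_ngrams(reviews, 3, 1)
```

The frequent n-grams are only candidates; deciding which ones go into the bonus or stigma list is still a manual call.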
Link to the latest version of the code: https://github.com/sayantikabanik/capstone_isb/blob/main/experiments/Text%20Summarization%20-%20Sumy%20Package%20-%20Infosys_V2.ipynb
LGTM thanks @AnizD
Hi! I'd like you all to have a look at the quality of the output 👇🏼. The data fed in: all Review Text for 1) Infosys, 2) Infosys Pune, and 3) Infosys Pune Technology Analysts. For the Feedback box, Edmundson looks better than the others. Also, do look at the bonus and stigma words. Let me know your thoughts on the chosen technique.
Please refer to the Python code under Experiments: Text Summarization - Sumy Package - Infosys.ipynb