While searching for a search utility, I found FlashText.
Instead of returning only the keyword matches and their index span info, this PR adds the option of returning the sentences that contain each keyword.
The approach is to split the corpus into sentences using NLTK's sent_tokenize.
This is controlled by the fetch_sent flag in extract_keywords(<corpus>, <span_info_flag>, <fetch_sent>), which is False by default.
Also, keyword.py is now entirely PEP 8 compliant.
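
For reviewers, here is a rough sketch of the idea, not the code in this PR. The helper names split_sentences and sentences_with_keywords are hypothetical, and a plain case-insensitive regex stands in for FlashText's own keyword matching. It uses NLTK's sent_tokenize when NLTK and its sentence tokenizer data are available, and otherwise falls back to a naive punctuation split, which is an assumption made only so the sketch runs anywhere.

```python
import re


def split_sentences(text):
    """Split text into sentences, preferring NLTK's sent_tokenize."""
    try:
        from nltk.tokenize import sent_tokenize
        return sent_tokenize(text)
    except (ImportError, LookupError):
        # Assumption for this sketch only: if NLTK or its tokenizer data
        # is missing, split naively after '.', '!' or '?'.
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentences_with_keywords(corpus, keywords):
    """Return (keyword, sentence) pairs for each sentence containing a keyword.

    Hypothetical helper: a whole-word, case-insensitive regex is used here
    in place of FlashText's matching.
    """
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    results = []
    for sentence in split_sentences(corpus):
        for kw, pattern in patterns.items():
            if pattern.search(sentence):
                results.append((kw, sentence))
    return results


corpus = "I love Python. Java is also popular. Nothing to see here."
hits = sentences_with_keywords(corpus, ["python", "java"])
for keyword, sentence in hits:
    print(keyword, "->", sentence)
```

In the PR itself, this behaviour sits behind fetch_sent, so existing callers that leave the flag at its default of False are unaffected.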
P.S.: Please run pip install nltk