vi3k6i5 / flashtext

Extract Keywords from sentence or Replace keywords in sentences.
MIT License

Feature Request: Can we also get span of matches found? #17

Closed scarescrow closed 6 years ago

scarescrow commented 6 years ago

Most regex libraries also give the location of the matches found. Can this information also be provided by FlashText?

For example:

>>> from flashtext import KeywordProcessor
>>> keyword_processor = KeywordProcessor()
>>> # keyword_processor.add_keyword(<unclean name>, <standardised name>)
>>> keyword_processor.add_keyword('Big Apple', 'New York')
>>> keyword_processor.add_keyword('Bay Area')
>>> # Maybe something like
>>> keywords_found = keyword_processor.extract_keywords('I love New York and Bay Area.', spanInfo=True)
>>> keywords_found
>>> # {'New York': (8,15), 'Bay Area': (21,28)}
vi3k6i5 commented 6 years ago

Duplicate of https://github.com/vi3k6i5/flashtext/issues/14

Merging and closing, will work on it soon.

vi3k6i5 commented 6 years ago

@scarescrow I will work on it soon. Thanks for validating the requirement of this feature.

vi3k6i5 commented 6 years ago

@scarescrow Not to bother you, but I added the feature in version 2.5:

keyword_processor.extract_keywords('I love big Apple and Bay Area.', span_info=True)
# [('New York', 7, 16), ('Bay Area', 21, 29)]
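
Judging from the output above, the two numbers in each tuple look like Python slice indices into the original sentence (start inclusive, end exclusive). A minimal sketch to check that reading, using the sentence and result quoted above as hard-coded data rather than calling flashtext:

```python
# Sentence and result copied from the comment above; nothing here calls flashtext.
sentence = 'I love big Apple and Bay Area.'
keywords_found = [('New York', 7, 16), ('Bay Area', 21, 29)]

# Slice the sentence with each (start, end) pair to recover the text that matched.
matched_text = [(name, sentence[start:end]) for name, start, end in keywords_found]
print(matched_text)
# [('New York', 'big Apple'), ('Bay Area', 'Bay Area')]
```

Note that the span covers the text as it appears in the sentence ('big Apple'), while the first element is the standardised name ('New York').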
scarescrow commented 6 years ago

@vi3k6i5 Thanks!

scarescrow commented 6 years ago

@vi3k6i5 You may also need to consider duplicates in the future. Suppose "New York" appears twice in the string: should the span of only one occurrence be shown, or should an array of spans be shown?

vi3k6i5 commented 6 years ago

Hi, it's already an array of spans, so that is taken care of. I didn't go with a dictionary for exactly this reason. Also, the order of matches gets preserved.

Thanks for pointing it out though.
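
If a per-keyword view is still wanted, the list of tuples can be regrouped after the fact. The following is only a sketch: `group_spans` is a hypothetical helper, not part of flashtext, and the sample list is made-up data in the same `(keyword, start, end)` shape as the output shown earlier in this thread.

```python
def group_spans(results):
    """Group (keyword, start, end) tuples into {keyword: [(start, end), ...]}."""
    grouped = {}
    for keyword, start, end in results:
        # setdefault creates the list on first sight of a keyword;
        # dicts keep insertion order (Python 3.7+), so first-seen order survives.
        grouped.setdefault(keyword, []).append((start, end))
    return grouped


# Hypothetical result for 'Big Apple and Bay Area and Big Apple' (not produced by flashtext here).
sample = [('New York', 0, 9), ('Bay Area', 14, 22), ('New York', 27, 36)]
print(group_spans(sample))
# {'New York': [(0, 9), (27, 36)], 'Bay Area': [(14, 22)]}
```

Keeping the library output as a flat list and grouping only when needed means duplicates and match order are never lost.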