
FlashText

This module can be used to replace keywords in sentences or extract keywords from sentences. It is based on the FlashText algorithm.

Installation

$ pip install flashtext

API doc

Documentation can be found at FlashText Read the Docs.


Extract keywords

from flashtext import KeywordProcessor
keyword_processor = KeywordProcessor()
# keyword_processor.add_keyword(<unclean name>, <standardised name>)
keyword_processor.add_keyword('Big Apple', 'New York')
keyword_processor.add_keyword('Bay Area')
keywords_found = keyword_processor.extract_keywords('I love Big Apple and Bay Area.')
keywords_found

['New York', 'Bay Area']

Replace keywords

keyword_processor.add_keyword('New Delhi', 'NCR region')
new_sentence = keyword_processor.replace_keywords('I love Big Apple and new delhi.')
new_sentence

'I love New York and NCR region.'

Case Sensitive example

from flashtext import KeywordProcessor
keyword_processor = KeywordProcessor(case_sensitive=True)
keyword_processor.add_keyword('Big Apple', 'New York')
keyword_processor.add_keyword('Bay Area')
keywords_found = keyword_processor.extract_keywords('I love big Apple and Bay Area.')
keywords_found

['Bay Area']

Span of keywords extracted

from flashtext import KeywordProcessor
keyword_processor = KeywordProcessor()
keyword_processor.add_keyword('Big Apple', 'New York')
keyword_processor.add_keyword('Bay Area')
keywords_found = keyword_processor.extract_keywords('I love big Apple and Bay Area.', span_info=True)
keywords_found

[('New York', 7, 16), ('Bay Area', 21, 29)]
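
The span values can be used to recover the exact text that was matched. The following is a minimal sketch that takes the sentence and the output above as literal data (it does not call the library) and assumes each tuple is (clean name, start index, end index) with an exclusive end index, which is consistent with the numbers shown:

```python
# Sentence and spans copied from the example above.
sentence = 'I love big Apple and Bay Area.'
keywords_found = [('New York', 7, 16), ('Bay Area', 21, 29)]

# Slice the original sentence with each (start, end) pair to see
# which text produced each clean name.
matched_text = [
    (clean_name, sentence[start:end])
    for clean_name, start, end in keywords_found
]
print(matched_text)
# [('New York', 'big Apple'), ('Bay Area', 'Bay Area')]
```

This makes it easy to see that 'New York' was reported for the original text 'big Apple'.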

Get Extra information with keywords extracted

from flashtext import KeywordProcessor
kp = KeywordProcessor()
kp.add_keyword('Taj Mahal', ('Monument', 'Taj Mahal'))
kp.add_keyword('Delhi', ('Location', 'Delhi'))
kp.extract_keywords('Taj Mahal is in Delhi.')

[('Monument', 'Taj Mahal'), ('Location', 'Delhi')]

NOTE: The replace_keywords feature won't work with keywords added this way.

No clean name for Keywords

from flashtext import KeywordProcessor
keyword_processor = KeywordProcessor()
keyword_processor.add_keyword('Big Apple')
keyword_processor.add_keyword('Bay Area')
keywords_found = keyword_processor.extract_keywords('I love big Apple and Bay Area.')
keywords_found

['Big Apple', 'Bay Area']

Add Multiple Keywords simultaneously

from flashtext import KeywordProcessor
keyword_processor = KeywordProcessor()
keyword_dict = {
    "java": ["java_2e", "java programing"],
    "product management": ["PM", "product manager"]
}
# {'clean_name': ['list of unclean names']}
keyword_processor.add_keywords_from_dict(keyword_dict)

Or add keywords from a list:

keyword_processor.add_keywords_from_list(["java", "python"])
keyword_processor.extract_keywords('I am a product manager for a java_2e platform')

output ['product management', 'java']

To Remove keywords

from flashtext import KeywordProcessor
keyword_processor = KeywordProcessor()
keyword_dict = {
    "java": ["java_2e", "java programing"],
    "product management": ["PM", "product manager"]
}
keyword_processor.add_keywords_from_dict(keyword_dict)
print(keyword_processor.extract_keywords('I am a product manager for a java_2e platform'))

output ['product management', 'java']

keyword_processor.remove_keyword('java_2e')

You can also remove keywords from a list or dictionary:

keyword_processor.remove_keywords_from_dict({"product management": ["PM"]})
keyword_processor.remove_keywords_from_list(["java programing"])
keyword_processor.extract_keywords('I am a product manager for a java_2e platform')

output ['product management']

To check Number of terms in KeywordProcessor

from flashtext import KeywordProcessor
keyword_processor = KeywordProcessor()
keyword_dict = {
    "java": ["java_2e", "java programing"],
    "product management": ["PM", "product manager"]
}
keyword_processor.add_keywords_from_dict(keyword_dict)
print(len(keyword_processor))

output 4
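
The count is 4 rather than 2 because each unclean name is a separate term, while the clean names are only the values they map to. The following plain-Python sketch illustrates that way of counting; it is an illustration of the idea, not the library's internal storage, and the name term_to_clean_name is hypothetical:

```python
keyword_dict = {
    "java": ["java_2e", "java programing"],
    "product management": ["PM", "product manager"],
}

# Flatten {'clean_name': ['unclean names']} into unclean -> clean pairs.
# Lowercasing mirrors the default case-insensitive matching shown earlier.
term_to_clean_name = {
    unclean.lower(): clean_name
    for clean_name, unclean_names in keyword_dict.items()
    for unclean in unclean_names
}
print(len(term_to_clean_name))
# 4
```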

To check if term is present in KeywordProcessor

from flashtext import KeywordProcessor
keyword_processor = KeywordProcessor()
keyword_processor.add_keyword('j2ee', 'Java')
'j2ee' in keyword_processor

output: True

keyword_processor.get_keyword('j2ee')

output: Java

keyword_processor['colour'] = 'color'
keyword_processor['colour']

output: color

Get all keywords in dictionary

from flashtext import KeywordProcessor
keyword_processor = KeywordProcessor()
keyword_processor.add_keyword('j2ee', 'Java')
keyword_processor.add_keyword('colour', 'color')
keyword_processor.get_all_keywords()

output: {'colour': 'color', 'j2ee': 'Java'}

To detect word boundaries, any character other than \w ([A-Za-z0-9_]) is currently treated as a word boundary.
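
The effect of this rule can be seen with a small plain-Python sketch. It does not use the library; split_into_words and DEFAULT_WORD_CHARS are hypothetical names, and the only assumption is the one stated above, that word characters are [A-Za-z0-9_]:

```python
import string

# Assumption: word characters are [A-Za-z0-9_], as stated above.
DEFAULT_WORD_CHARS = set(string.ascii_letters + string.digits + "_")

def split_into_words(text, word_chars=DEFAULT_WORD_CHARS):
    """Split text wherever a character outside word_chars occurs."""
    words, current = [], []
    for ch in text:
        if ch in word_chars:
            current.append(ch)
        elif current:
            # A boundary character ends the current word.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words

text = 'I love Big Apple/Bay Area.'
default_words = split_into_words(text)
# Treat '/' as a word character instead of a boundary.
slash_words = split_into_words(text, DEFAULT_WORD_CHARS | {"/"})
print(default_words)  # ['I', 'love', 'Big', 'Apple', 'Bay', 'Area']
print(slash_words)    # ['I', 'love', 'Big', 'Apple/Bay', 'Area']
```

Once '/' counts as a word character, 'Apple/Bay' is a single word, so a keyword ending in 'Apple' no longer ends on a boundary there.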

To set or add characters as part of word characters

from flashtext import KeywordProcessor
keyword_processor = KeywordProcessor()
keyword_processor.add_keyword('Big Apple')
print(keyword_processor.extract_keywords('I love Big Apple/Bay Area.'))

['Big Apple']

keyword_processor.add_non_word_boundary('/')
print(keyword_processor.extract_keywords('I love Big Apple/Bay Area.'))

[]

Test

$ git clone
$ cd flashtext
$ pip install pytest
$ python setup.py test

Build Docs


$ git clone
$ cd flashtext/docs
$ pip install sphinx
$ make html
$ # open _build/html/index.html in browser to view it locally

Why not Regex?

It's a custom algorithm based on the Aho-Corasick algorithm and a trie dictionary.
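
To make the trie idea concrete, here is a minimal plain-Python sketch of keyword extraction with a nested-dict trie. It is an illustration under stated assumptions, not the library's implementation: build_trie, extract, and the END marker are hypothetical names, matching is case-insensitive, and word characters are assumed to be [A-Za-z0-9_].

```python
import string

WORD_CHARS = set(string.ascii_letters + string.digits + "_")
END = "_end_"  # hypothetical marker key holding the clean name at a keyword's end

def build_trie(keywords):
    """Build a nested-dict trie from {keyword: clean_name}, lowercased."""
    root = {}
    for keyword, clean_name in keywords.items():
        node = root
        for ch in keyword.lower():
            node = node.setdefault(ch, {})
        node[END] = clean_name
    return root

def extract(text, trie):
    """Return clean names of the longest keywords that sit on word boundaries."""
    found, i, n = [], 0, len(text)
    while i < n:
        at_word_start = i == 0 or text[i - 1] not in WORD_CHARS
        best = None
        if at_word_start:
            node, j = trie, i
            # Follow the text through the trie one character at a time.
            while j < n and text[j].lower() in node:
                node = node[text[j].lower()]
                j += 1
                # Accept a keyword only if it ends at a word boundary.
                if END in node and (j == n or text[j] not in WORD_CHARS):
                    best = (node[END], j)
        if best:
            found.append(best[0])
            i = best[1]  # continue after the matched keyword
        else:
            i += 1
    return found

trie = build_trie({'Big Apple': 'New York', 'Bay Area': 'Bay Area'})
result = extract('I love Big Apple and Bay Area.', trie)
print(result)  # ['New York', 'Bay Area']
```

The point of the structure is that all keywords are checked together while walking the text, instead of searching for each keyword separately.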

Time taken by FlashText to find terms in comparison to Regex.

Time taken by FlashText to replace terms in comparison to Regex.

Link to code for benchmarking the Find Feature and Replace Feature.
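
For context, a regex version of the replace feature might look like the sketch below. This is not the benchmark code referred to above; it is only an assumed baseline using Python's standard re module, with hypothetical names (replacements, lookup, regex_replace), to show what a regex approach involves.

```python
import re

replacements = {'Big Apple': 'New York', 'New Delhi': 'NCR region'}

# One alternation of all escaped keywords, longest first so longer
# keywords win, wrapped in \b so matches must sit on word boundaries.
alternation = '|'.join(
    re.escape(keyword)
    for keyword in sorted(replacements, key=len, reverse=True)
)
pattern = re.compile(r'\b(?:' + alternation + r')\b', flags=re.IGNORECASE)

# Case-insensitive lookup from the matched text to its replacement.
lookup = {keyword.lower(): clean for keyword, clean in replacements.items()}

def regex_replace(sentence):
    return pattern.sub(lambda match: lookup[match.group(0).lower()], sentence)

new_sentence = regex_replace('I love Big Apple and new delhi.')
print(new_sentence)  # I love New York and NCR region.
```

In this approach the pattern grows with every keyword added, which is the cost the comparison above is concerned with.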

The idea for this library came from a StackOverflow question.


The original paper published on the FlashText algorithm.


The article published on Medium freeCodeCamp.



The project is licensed under the MIT license.