microsoft / CognitiveServicesLanguageUtilities

Utilities for the Cognitive Service Custom Text document processing tool.
MIT License

updated fuzzy matching pipeline - approach 1 #106

Status: closed by mshaban-msft 3 years ago

mshaban-msft commented 3 years ago

steps:

  1. tokenize input sentence
    • tokenize at several levels of granularity
    • 1-word tokens, 2-word tokens, ...
    • ex: "i want to travel from cairo to new york"
    • 1-word tokens ["i", "want", "to", ..., "new", "york"]
    • 2-word tokens ["i want", "to travel", ..., "new york"]
  2. match each token against the pre-processed dataset

because each token keeps its position in the input sentence, this way we'll be able to get the start and end indices of the matched results
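
The two steps above can be sketched roughly as follows. This is only an illustration, not the code from this PR: the function names `ngram_tokens` and `fuzzy_match`, the `max_n` and `threshold` parameters, and the use of Python's standard-library `difflib.SequenceMatcher` as the similarity measure are all assumptions. It also assumes the n-word tokens are taken with a sliding window (every run of n consecutive words), which the example above does not state explicitly.

```python
import re
from difflib import SequenceMatcher


def ngram_tokens(sentence, n):
    """Return (text, start, end) for every run of n consecutive words.

    start/end are character offsets into `sentence` (end is exclusive).
    Assumption: words are whitespace-separated and tokens use a sliding window.
    """
    words = [(m.start(), m.end()) for m in re.finditer(r"\S+", sentence)]
    tokens = []
    for i in range(len(words) - n + 1):
        start = words[i][0]
        end = words[i + n - 1][1]
        tokens.append((sentence[start:end], start, end))
    return tokens


def fuzzy_match(sentence, dataset, max_n=2, threshold=0.8):
    """Compare every 1..max_n word token against every dataset entry.

    Returns (entry, matched_text, start, end, score) tuples whose similarity
    score is at least `threshold`. The scoring function is a stand-in; any
    string-similarity measure could be used here instead.
    """
    matches = []
    for n in range(1, max_n + 1):
        for text, start, end in ngram_tokens(sentence, n):
            for entry in dataset:
                score = SequenceMatcher(None, text.lower(), entry.lower()).ratio()
                if score >= threshold:
                    matches.append((entry, text, start, end, score))
    return matches


if __name__ == "__main__":
    sentence = "i want to travel from cairo to new york"
    # Hypothetical pre-processed dataset of known entries.
    dataset = ["cairo", "new york"]
    for entry, text, start, end, score in fuzzy_match(sentence, dataset):
        print(entry, repr(text), start, end, round(score, 2))
```

Since every token carries its character offsets, each match comes back with the span it covers in the original sentence, so `sentence[start:end]` recovers the matched text. Comparing every token against every dataset entry is quadratic; for a large dataset, some form of indexing would likely be needed.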