We are using your TokenFSM algorithm as part of our system for detecting terms from lexicons. It would be great if it could handle multiple lexicons at the same time, instead of having to create a different object for each one.
There are a couple of considerations:
When entries from different lexicons overlap on the same span of text, we propose two cases:
1) The longest lexicon entity remains
Lex1 {"New York city"}, Lex2 {"New York"} -- Sentence: "The New York City mayor..."
Output: {"New York city": Lex1}
2) If the match is exactly the same in both lexicons, return both:
Lex1 {"US"}, Lex2 {"US"} -- Sentence: "US intelligence reports say..."
Output: {"US": [Lex1, Lex2]}
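To make the proposed semantics concrete, here is a minimal sketch of the two rules above. It does not use TokenFSM's internals (which we haven't seen); `match_lexicons` and its signature are hypothetical, and only the longest-match / exact-tie behavior is what we are asking for:

```python
def match_lexicons(tokens, lexicons):
    """Scan a token list against several lexicons at once.

    lexicons: {lexicon_name: set of phrases}
    Returns {matched_text: [lexicon_names]}, case-insensitive.
    """
    # Index every phrase as a lowercase token tuple -> lexicon names.
    index = {}
    max_len = 0
    for name, phrases in lexicons.items():
        for phrase in phrases:
            key = tuple(phrase.lower().split())
            index.setdefault(key, []).append(name)
            max_len = max(max_len, len(key))

    results = {}
    i = 0
    while i < len(tokens):
        # Try the longest window first, so the longest entity wins (case 1).
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in index:
                # An identical match in several lexicons returns all (case 2).
                results[" ".join(tokens[i:i + n])] = list(index[key])
                i += n
                break
        else:
            i += 1
    return results
```

With the examples above, `match_lexicons("The New York City mayor spoke".split(), {"Lex1": {"New York city"}, "Lex2": {"New York"}})` returns only the longer `"New York City"` match for Lex1, while two lexicons both containing `"US"` each appear in the result for that term.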
Cheers!