woodbri / address-standardizer

An address parser and standardizer in C++

Handle multiple identical adjacent tokens #9

Closed: woodbri closed this 8 years ago

woodbri commented 8 years ago

Need to handle multiple adjacent WORD tokens.

woodbri commented 8 years ago

Decided that this is not an issue, because it can be handled, to some extent, in the grammar with rules like:

[section]
WORD -> <OutClass> -> <score>
WORD WORD -> <OutClass> <OutClass> -> <score>
WORD WORD WORD -> <OutClass> <OutClass> <OutClass> -> <score>
WORD WORD WORD WORD -> <OutClass> <OutClass> <OutClass> <OutClass> -> <score>

The search algorithm should be robust enough to handle runs of adjacent WORD tokens this way. I would decrease the score slightly for each additional WORD token in a rule.