psolin / cleanco

Company Name Processor written in Python
MIT License
322 stars 95 forks

optimization and simplification suggestions #31

Closed petri closed 4 years ago

petri commented 7 years ago

switch to function-based API

switch to working on whitespace-separated name parts rather than full strings

In effect, for a suffix we would check business_name.split()[-1] == term rather than business_name.endswith(' ' + term). The split would of course be done just once, at the beginning.

If we can handle the fact that some legal terms are "multi-part" (whitespace-separated), this would simplify the code and make it run faster: for a suffix we would only have to look at the last whitespace-separated name part, and for a prefix only the first. There are other cases, too.

We would not have to presort the data, either.

don't use both legal and countrywise suffixes in clean_name
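A rough sketch of the whitespace-separated suggestion above, to make the multi-part case concrete. This is not cleanco's actual code: index_terms, strip_suffix, the sample terms and this clean_name signature are all hypothetical, and terms are assumed to be stored without trailing punctuation. Terms are grouped by how many parts they have, so a suffix check becomes a set lookup on the last n parts and the term data needs no presorting.

```python
from collections import defaultdict


def index_terms(terms):
    """Group legal terms by their number of whitespace-separated parts."""
    by_length = defaultdict(set)
    for term in terms:
        parts = tuple(term.lower().split())
        by_length[len(parts)].add(parts)
    return dict(by_length)


def strip_suffix(parts, by_length):
    """Drop one trailing legal term from an already-split name."""
    # Compare case-insensitively and ignore trailing dots/commas ("Inc." -> "inc").
    normalized = [p.lower().rstrip(".,") for p in parts]
    # Try the longest terms first so "pty ltd" wins over plain "ltd".
    for n in sorted(by_length, reverse=True):
        # n < len(parts) keeps us from stripping the whole name away.
        if n < len(parts) and tuple(normalized[-n:]) in by_length[n]:
            return parts[:-n]
    return parts


def clean_name(business_name, by_length):
    """Function-based entry point: split once, then work on the parts."""
    parts = business_name.split()
    return " ".join(strip_suffix(parts, by_length)).rstrip(" ,")


# Hypothetical sample data, for illustration only.
TERM_INDEX = index_terms(["ltd", "inc", "pty ltd"])
print(clean_name("Acme Widgets Pty Ltd", TERM_INDEX))
print(clean_name("Foo, Inc.", TERM_INDEX))
```

A prefix check would mirror this using normalized[:n] instead of normalized[-n:].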

petri commented 4 years ago

Since 2.0, the following optimizations are in place:

These are pretty much what this request was asking for, so closing.