psolin / cleanco

Company Name Processor written in Python
MIT License

Incorrect detection of "Pty Limited" Suffix #41

Open Sir-Onion opened 4 years ago

Sir-Onion commented 4 years ago
>>> cleanco("Example Example Pty Ltd").clean_name() # CORRECT
'Example Example'
>>> cleanco("Example Example Pty Limited").clean_name() # Not so good
'Example Example Pty'

To give you a view of the scope of the problem: I'm working to normalise a database of around 900k company names which have been typed into an application over a 10-year period. The database contains primarily companies from anglophone countries. Of these, around 580 have a company name ending in "Pty Limited".

Do you see this as a problem also? If so, I'm happy to put together a patch.

petri commented 4 years ago

Thank you. I did a quick Google search on the topic, and the issue seems valid. A GitHub PR is welcome if you can submit one.

petri commented 4 years ago

@tubasal is "pty ltd" (or "pty limited") its own legal form, or is this suffix just a concatenation of two different suffixes? If it is a concatenation, you can get rid of multiple suffixes by running the removal twice.
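
For illustration, here is a minimal sketch of the "run the removal twice" idea. It does not use cleanco itself: `SUFFIXES`, `strip_suffix_once` and `strip_suffixes` are hypothetical names, and the sketch assumes that both "pty" and "limited" exist as separate terms, which may not match cleanco's actual term definitions.

```python
# Hypothetical suffix list for illustration only; NOT cleanco's real term definitions.
SUFFIXES = ("limited", "ltd", "pty")


def strip_suffix_once(name: str) -> str:
    """Remove at most one trailing single-word suffix (case-insensitive)."""
    words = name.rstrip(" .,").split()
    # Keep at least one word so a name is never reduced to an empty string.
    if len(words) > 1 and words[-1].lower().rstrip(".") in SUFFIXES:
        words = words[:-1]
    return " ".join(words)


def strip_suffixes(name: str, passes: int = 2) -> str:
    """Apply the single-suffix removal several times in a row."""
    for _ in range(passes):
        name = strip_suffix_once(name)
    return name


print(strip_suffix_once("Example Example Pty Limited"))  # Example Example Pty
print(strip_suffixes("Example Example Pty Limited"))     # Example Example
```

A second pass only helps if each component of the compound suffix is itself a known term; otherwise the leftover word stays in the name.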

petri commented 4 years ago

I took a look at the term definitions. We don't have "pty" as a separate term, nor do we have "pty limited", so running the removal twice cannot work here. Presuming the work on using ISO standard 20275 bears fruit, this issue might be fixed by the improved term definitions that the standard provides. On the other hand, it's possible that the term definitions there fall short in the same way as ours do.
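
As a rough sketch of what an improved term definition could look like, the snippet below treats "pty limited" and "pty ltd" as multi-word terms and matches longer terms first. `TERMS` and this `clean_name` function are hypothetical and written only for this example; cleanco's real term data and matching logic are organised differently.

```python
# Hypothetical term definitions for illustration; cleanco's real list differs.
TERMS = ["ltd", "limited", "pty ltd", "pty limited"]


def clean_name(name: str, terms=TERMS) -> str:
    """Strip one trailing legal-form term, preferring the longest match."""
    normalized = name.strip().rstrip(".,")
    lowered = normalized.lower()
    # Try longer terms first so "pty limited" wins over plain "limited".
    for term in sorted(terms, key=len, reverse=True):
        if lowered.endswith(" " + term):
            return normalized[: -len(term)].rstrip(" ,")
    return normalized


print(clean_name("Example Example Pty Ltd"))      # Example Example
print(clean_name("Example Example Pty Limited"))  # Example Example
```

Matching the longest term first means a single pass handles the compound suffix, without needing "pty" to be a standalone term.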