psolin / cleanco

Company Name Processor written in Python
MIT License
318 stars 94 forks

Base name is not working with some names #77

Open sandeepnatoo opened 2 years ago

sandeepnatoo commented 2 years ago

I checked some scenarios where the basename function returns an empty result:

    from cleanco import basename

    print("Base name for {} : {}".format('IKS APS', basename("IKS APS")))
    print("Base name for {} : {}".format('S.C.S & COMPANY', basename("S.C.S & COMPANY")))
    print("Base name for {} : {}".format('COOP', basename("COOP")))

petri commented 2 years ago

Yes, the point of basename is removing common suffixes, prefixes etc. to leave just the base name. You're basically giving those suffixes/prefixes there, or combinations of them. What is the problem you're having with this? Are those actual company names that you try to normalize?

FBnil commented 1 year ago

Coop is a Dutch supermarket (full name: 'Coop Supermarkten BV'; the full name actually works fine). And indeed, the basename of Coop is "" (empty string). The same goes for SCS: it is a key in "Limited" (dict terms_by_type), whereas the full name 'SCS Software s.r.o.' works just fine.

I think that when the last stripping iteration would remove everything, there should be a way to recover the result of the iteration before it (though maybe not by default, because removing multiple terms is actually handy). Of course, this check can be done on the user side too, and should at least be mentioned in the README/documentation.
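The user-side check mentioned above could look roughly like the sketch below. The wrapper name `basename_or_original` and the stand-in cleaner `demo_clean` are hypothetical and exist only for illustration; `demo_clean` is not cleanco's algorithm, it just pops trailing tokens found in a small made-up term set so the example runs without cleanco installed.

```python
from typing import Callable


def basename_or_original(name: str, clean: Callable[[str], str]) -> str:
    """Return clean(name), falling back to the input when nothing is left."""
    cleaned = clean(name).strip()
    return cleaned if cleaned else name.strip()


# Hypothetical stand-in cleaner (assumption, NOT cleanco's logic):
# it only removes trailing tokens that appear in a tiny, made-up term set.
DEMO_TERMS = {"coop", "bv", "aps"}


def demo_clean(name: str) -> str:
    tokens = name.split()
    while tokens and tokens[-1].lower() in DEMO_TERMS:
        tokens.pop()
    return " ".join(tokens)


for name in ("COOP", "Coop Supermarkten BV"):
    # Show the raw cleaner result next to the guarded result.
    print(repr(demo_clean(name)), "->", repr(basename_or_original(name, demo_clean)))
```

With cleanco installed, the same wrapper should work by passing the library's `basename` as `clean`, assuming it can be called with just the name as in the report above. This keeps the default stripping behaviour and only intervenes when the result would be empty.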

sandeepnatoo commented 1 year ago

> Coop is a Dutch supermarket (full name: 'Coop Supermarkten BV'; the full name actually works fine). And indeed, the basename of Coop is "" (empty string). The same goes for SCS: it is a key in "Limited" (dict terms_by_type), whereas the full name 'SCS Software s.r.o.' works just fine.
>
> I think that when the last stripping iteration would remove everything, there should be a way to recover the result of the iteration before it (though maybe not by default, because removing multiple terms is actually handy). Of course, this check can be done on the user side too, and should at least be mentioned in the README/documentation.

Yes, I agree with you.

sandeepnatoo commented 1 year ago

> Yes, the point of basename is removing common suffixes, prefixes etc. to leave just the base name. You're basically giving those suffixes/prefixes there, or combinations of them. What is the problem you're having with this? Are those actual company names that you try to normalize?

Yes, these are some of the organization names I came across.