psolin / cleanco

Company Name Processor written in Python
MIT License

Clean_name to remove all items after a comma #3

Closed ccdpowell closed 9 years ago

ccdpowell commented 9 years ago

I like the idea of this and think there is a lot of use for it. I think it would be more useful if it removed everything in the company name string after (and including) a ','. I'd add this to the clean_name function, similar to how you handle hyphens.

psolin commented 9 years ago

Thanks for the suggestion. I like the idea, and will implement it within the next week.

psolin commented 9 years ago

Actually, let's talk about this for a second. Let's say you have a company name like "My Big Company, LLC". I can easily remove that comma, but removing everything after it would get rid of the business entity designation before the algorithm had a chance to look at it. Removing just the comma would let me process/remove LLC, and then strip() could remove the trailing whitespace.

Also, this would not work for a law firm name like "Dewey, Cheatem, & Howe": it would turn it into "Dewey". Under the current comma removal process, it would become "Dewey Cheatem & Howe", which doesn't look that much different from the original. The hyphen removal was in place because there were company names like "Comcast - A PaulCo Company, Inc." in the database I was looking at.
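To make the trade-off concrete, here is a minimal sketch contrasting the two behaviors on the names from this thread. Both helper names, `truncate_at_comma` and `remove_commas`, are hypothetical illustrations, not part of cleanco's API.

```python
def truncate_at_comma(name: str) -> str:
    """Proposed behavior (hypothetical helper): drop the first comma and everything after it."""
    return name.split(",", 1)[0].strip()


def remove_commas(name: str) -> str:
    """Comma removal as described in the thread (hypothetical helper): delete commas, keep the rest."""
    return name.replace(",", "")


for example in ("My Big Company, LLC", "Dewey, Cheatem, & Howe"):
    print(repr(truncate_at_comma(example)), repr(remove_commas(example)))
```

Truncation turns the law firm into just "Dewey", while plain comma removal gives "Dewey Cheatem & Howe" and leaves "LLC" in place for a later step to recognize.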

Maybe the solution here is to remove only trailing commas and hyphens?
Example: "My Big Company, LLC"

1. Remove "LLC": "My Big Company, "
2. strip() removes the trailing whitespace: "My Big Company,"
3. A regex checks whether the last character is in [a-z] and removes it if not: "My Big Company"
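The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not cleanco's implementation: `ENTITY_SUFFIXES` is a tiny hypothetical list, and the final step uses the narrower "trailing commas and hyphens only" rule rather than the [a-z] check, so that names ending in digits or other legitimate characters are left alone.

```python
import re

# Hypothetical, tiny list of entity designations for illustration only;
# this is not cleanco's actual term list.
ENTITY_SUFFIXES = ("LLC", "Inc.")

# Any run of commas, hyphens, and whitespace at the very end of the string.
TRAILING_PUNCT = re.compile(r"[,\-\s]+$")


def remove_entity_suffix(name: str) -> str:
    """Step 1: drop a known entity designation from the end of the name."""
    for suffix in ENTITY_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def clean_name_sketch(name: str) -> str:
    """Hypothetical pipeline mirroring the three steps in the thread."""
    name = remove_entity_suffix(name)    # "My Big Company, LLC" -> "My Big Company, "
    name = name.strip()                  # -> "My Big Company,"
    return TRAILING_PUNCT.sub("", name)  # -> "My Big Company"
```

Because only trailing punctuation is removed, internal commas and hyphens survive: "Dewey, Cheatem, & Howe" is returned unchanged, and "Comcast - A PaulCo Company, Inc." becomes "Comcast - A PaulCo Company".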

petri commented 9 years ago

I agree. In what kind of case should EVERYTHING after a comma be removed in the first place?

psolin commented 9 years ago

Also, I won't remove things in parentheses. It will be up to the person running the code to decide whether they need to do that with their business names.
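For anyone who does need that, a minimal user-side pre-processing sketch might look like this. `drop_parentheticals` is a hypothetical helper, not part of cleanco, and it assumes parentheses are not nested.

```python
import re

# A parenthesized group containing no inner parentheses, plus any whitespace before it.
PARENTHETICAL = re.compile(r"\s*\([^()]*\)")


def drop_parentheticals(name: str) -> str:
    """Remove simple (non-nested) parenthetical text, then trim the ends."""
    return PARENTHETICAL.sub("", name).strip()


print(drop_parentheticals("Acme Widgets (USA) Inc."))
```

Running this before cleaning keeps the decision with the caller, as described above: names such as "Acme Widgets (USA) Inc." become "Acme Widgets Inc." only when the user opts in.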