psolin / cleanco

Company Name Processor written in Python
MIT License

add more test data (company names) #17

Open petri opened 9 years ago

petri commented 9 years ago

@psolin, would you have any lists of company names that you want to see tested?

davidheryanto commented 8 years ago

Hi, I have some company names such as:

Do you think it's a good idea to add these additional terms to termdata.py?

petri commented 8 years ago

Could https://opencorporates.com be used for testing?

petri commented 7 years ago

@davidheryanto it depends. What countries are those for?

petri commented 7 years ago

I have added a companies.csv file to the tests directory, but unfortunately it seems we cannot really use bulk ASCII company names for testing: many international companies use common Anglo-American suffixes such as ltd. or inc. in their corporate names, which results in a lot of failures.

If we could get the Unicode versions of the national suffixes (i.e. in native Chinese or Russian characters), now that would be useful. But I am not sure whether cleanco even supports that.

davidheryanto commented 7 years ago

Yes, I agree with the Unicode approach. It will be applicable to company names in different countries.

The company names I gave are examples of companies in Singapore.

petri commented 4 years ago

We now have improved Unicode & non-Latin script support. So better test coverage would make sense too.

One option would be to use https://faker.readthedocs.io/en/master/ to generate fake test company names. Manual labour would still be needed to provide the expected base names that cleanco should be able to produce.
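One way to reduce that manual labour might be to build each test name from a known base name plus a suffix, so the expected result is known by construction. The following is only a rough sketch using the standard library: `BASE_NAMES`, `SUFFIXES`, `make_cases` and `failures` are hypothetical names, and the sample data is made up. With Faker installed, the bases and suffixes could presumably come from `fake.last_name()` and `fake.company_suffix()` instead.

```python
import random

# Hypothetical sample data, for illustration only.
BASE_NAMES = ["Acme Widgets", "Nordic Timber", "Blue Harbour"]
SUFFIXES = ["Ltd.", "Inc.", "GmbH", "Oy"]


def make_cases(count, seed=0):
    """Return (full_name, expected_base) pairs.

    The expected base name is known because the full name is
    assembled from it, so nobody has to write it down by hand.
    """
    rng = random.Random(seed)  # seeded, so test runs are repeatable
    cases = []
    for _ in range(count):
        base = rng.choice(BASE_NAMES)
        suffix = rng.choice(SUFFIXES)
        cases.append((f"{base} {suffix}", base))
    return cases


def failures(clean, cases):
    """Return (full_name, expected, actual) for every case where
    clean(full_name) does not produce the expected base name."""
    results = []
    for full_name, expected in cases:
        actual = clean(full_name)
        if actual != expected:
            results.append((full_name, expected, actual))
    return results
```

The cleaning function is passed in as `clean`, so the sketch does not depend on any particular cleanco API. Assuming a cleanco version that exposes a `basename` function, a test could call `failures(cleanco.basename, make_cases(100))` and assert that the result is empty.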