spendright / msd

Merge SpendRight scraper data
Apache License 2.0

names that may be companies or brands #58

Open coyotemarin opened 7 years ago

coyotemarin commented 7 years ago

When a scraper can't tell whether a name is a company or a brand, it should leave the company field blank and put the name in the brand field. If the name matches a brand for one company, it's a brand; otherwise, it's a company.

This requires a different approach to brands: instead of just looking at the brands for a single company or group of companies, we should look at all brands matching a given key, and then pick them apart into subsidiaries. If we encounter a brand without a company that matches one or more companies, we should issue a warning and map it to an empty company and brand.

Also, we'll need to take two passes at company matching: one for companies marked as such, and one for names that turned out not to be brands.
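
A rough sketch of how this could work, not actual msd code. The record shape (dicts with `company` and `brand` keys), `norm_key`, and `classify_names` are all hypothetical names made up for illustration. It also assumes one reading of the warning case: a company-less name whose key matches brands of more than one company is treated as unresolvable and dropped with a warning.

```python
import logging
from collections import defaultdict

log = logging.getLogger(__name__)


def norm_key(name):
    """Hypothetical matching key: case-fold and collapse whitespace."""
    return " ".join(name.casefold().split())


def classify_names(records):
    """Split scraper records into (companies, brands, dropped).

    Each record is a dict with 'company' and 'brand' keys. A blank
    'company' means the scraper could not tell what the name is.
    """
    # Look at all brands matching a given key, across every company.
    owners = defaultdict(set)
    for rec in records:
        if rec["company"] and rec["brand"]:
            owners[norm_key(rec["brand"])].add(rec["company"])

    # Pass 1: companies that scrapers explicitly marked as such.
    companies = {rec["company"] for rec in records if rec["company"]}

    brands = []            # (company, brand) pairs resolved from bare names
    dropped = []           # names mapped to an empty company and brand
    late_companies = set() # names that turned out not to be brands

    for rec in records:
        if rec["company"] or not rec["brand"]:
            continue
        name = rec["brand"]
        matches = owners.get(norm_key(name), set())
        if len(matches) == 1:
            brands.append((next(iter(matches)), name))
        elif matches:
            # Assumption: several possible owners means we cannot decide.
            log.warning("%r matches brands of %d companies", name, len(matches))
            dropped.append(name)
        else:
            late_companies.add(name)

    # Pass 2: names that were not brands go through company matching too.
    companies |= late_companies
    return companies, brands, dropped
```

The second pass is just a set union here; in practice it would presumably rerun whatever company matching the first pass used, with the leftover names as input.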

coyotemarin commented 7 years ago

This could be really helpful for, say, Labor 411.