hmontazeri / is-vegan

Is-Vegan helps you to find out which food ingredients are vegan / non-vegan
MIT License

Demo considers 'MECHANICALLY SEPARATED CHICKEN' as vegan #9

Open fluxsauce opened 6 years ago

fluxsauce commented 6 years ago

https://github.com/hmontazeri/is-vegan/blame/25e87fef5b88f92319f001c6af96ff658fddcbf2/README.md#L152

isVegan.containsNonVeganIngredients([
...
  'MECHANICALLY SEPARATED CHICKEN',
...
]); // returns ['PASTEURIZED MILK', 'PORK', 'BEEF', 'WHEY']

Consider using fuzzy matching with a degree of confidence instead of string matching.
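
For illustration, a minimal sketch of what fuzzy matching with a confidence score could look like. It is not part of is-vegan: the term list, the `bestMatch` and `isLikelyNonVegan` names, and the 0.8 threshold are all assumptions made up for this example. It compares each word of an ingredient against each term using Levenshtein edit distance and reports the best similarity as a confidence between 0 and 1.

```javascript
// Hypothetical single-word term list; the real is-vegan data is far larger.
const NON_VEGAN_TERMS = ['chicken', 'pork', 'beef', 'whey', 'milk'];

// Levenshtein edit distance, single-row dynamic programming.
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // value of D[i-1][j-1]
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // value of D[i-1][j]
      row[j] = Math.min(
        above + 1,                               // deletion
        row[j - 1] + 1,                          // insertion
        diag + (a[i - 1] === b[j - 1] ? 0 : 1)   // substitution
      );
      diag = above;
    }
  }
  return row[b.length];
}

// Best-scoring term for any word in the ingredient, with a 0..1 confidence.
function bestMatch(ingredient) {
  const words = ingredient.toLowerCase().split(/[^a-z]+/).filter(Boolean);
  let best = { term: null, confidence: 0 };
  for (const word of words) {
    for (const term of NON_VEGAN_TERMS) {
      const dist = levenshtein(word, term);
      const confidence = 1 - dist / Math.max(word.length, term.length);
      if (confidence > best.confidence) best = { term, confidence };
    }
  }
  return best;
}

// The 0.8 threshold is an arbitrary assumption; it would need tuning.
function isLikelyNonVegan(ingredient, threshold = 0.8) {
  return bestMatch(ingredient).confidence >= threshold;
}

console.log(bestMatch('MECHANICALLY SEPARATED CHICKEN')); // { term: 'chicken', confidence: 1 }
console.log(isLikelyNonVegan('chiken broth'));            // true (one typo away from "chicken")
console.log(isLikelyNonVegan('tofu'));                    // false
```

Edit distance tolerates typos, but near-miss words such as "beet" and "beef" score high too, so the threshold trades one kind of error for another.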

hmontazeri commented 6 years ago

Good point! Will be added. Thx!

hmontazeri commented 6 years ago

Fixed!

fluxsauce commented 6 years ago

I disagree; the fundamental problem is still there. Yes, "mechanically separated chicken" is technically on the list now, but what about "separated chicken"? "Chicken parts"?

At the very least, search within the entire string; don't just match the exact string.

hmontazeri commented 6 years ago

I understand what you mean, but a wildcard search is not the answer either. It could match parts of unrelated words, which could make it worse than matching an exact string.
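
To make the trade-off concrete, here is a small sketch comparing the two approaches. The term list and the `exactMatch` / `substringMatch` helpers are hypothetical and only stand in for the library's real lookup.

```javascript
// Hypothetical term list for illustration only.
const TERMS = ['chicken', 'egg', 'ham'];

// Exact match: the whole ingredient must equal a listed term.
const exactMatch = (ingredient) => TERMS.includes(ingredient.toLowerCase());

// Substring ("wildcard") match: a listed term may appear anywhere.
const substringMatch = (ingredient) => {
  const s = ingredient.toLowerCase();
  return TERMS.some((term) => s.includes(term));
};

console.log(exactMatch('separated chicken'));     // false: missed by exact matching
console.log(substringMatch('separated chicken')); // true
console.log(substringMatch('eggplant'));          // true: wrongly flagged via "egg"
console.log(substringMatch('graham flour'));      // true: wrongly flagged via "ham"
```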

hmontazeri commented 6 years ago

@fluxsauce how about adding a regex search for obvious meat / fish species?

fluxsauce commented 6 years ago

That would work; I'd call it a component match and fill it with terms that shouldn't false positive, such as:

pig, pork, lard, beef, ribs, fillet, poultry, chicken, turkey, eggs, sheep, mutton, lamb, goat, rabbit, caviar, roe, honey, venison, steak

It'd be a shorter list than whole ingredients.
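
A rough sketch of such a component match, assuming a single case-insensitive regex with word boundaries built from the terms above. The `findComponent` name is made up for this example and is not an is-vegan function.

```javascript
// Component terms taken from the list above.
const COMPONENTS = [
  'pig', 'pork', 'lard', 'beef', 'ribs', 'fillet', 'poultry', 'chicken',
  'turkey', 'eggs', 'sheep', 'mutton', 'lamb', 'goat', 'rabbit', 'caviar',
  'roe', 'honey', 'venison', 'steak',
];

// \b keeps "pig" from matching inside "pigment". None of the terms contain
// regex metacharacters, so no escaping is needed here.
const COMPONENT_RE = new RegExp(`\\b(?:${COMPONENTS.join('|')})\\b`, 'i');

// Returns the first matching component in lowercase, or null if none match.
function findComponent(ingredient) {
  const match = COMPONENT_RE.exec(ingredient);
  return match ? match[0].toLowerCase() : null;
}

console.log(findComponent('MECHANICALLY SEPARATED CHICKEN')); // 'chicken'
console.log(findComponent('chicken parts'));                  // 'chicken'
console.log(findComponent('pigment'));                        // null
```

Word boundaries only rule out matches inside longer words; the regex knows nothing about context, so any phrase containing a listed word is still flagged.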

drusepth commented 6 years ago

I'd be wary of searching for particular substrings like "chicken", because you can easily also end up needing to add prefixes and such to avoid false negatives on things like "vegan chicken" or "chicken alternative", or "chicken tofu", etc.

Seems like there's a different (better) solution out there, but I'm not sure what it is. The above solution(s) work somewhat if you prefer false negatives over false positives, though.

fluxsauce commented 6 years ago

> The above solution(s) work somewhat if you prefer false negatives over false positives, though.

It could be mitigated with a blacklist of known false positives like "chicken alternative".
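
As a sketch of that mitigation: strip known harmless phrases first, then run the component match on what is left. Both lists and the `flagIngredient` name are hypothetical and exist only for this example.

```javascript
// Hypothetical lists for illustration only.
const COMPONENTS = ['chicken', 'beef', 'pork'];
const EXCEPTIONS = ['vegan chicken', 'chicken alternative', 'chicken tofu'];

// Returns the first non-vegan component found, or null if none remain
// after the known exception phrases have been removed.
function flagIngredient(ingredient) {
  let s = ingredient.toLowerCase();
  for (const phrase of EXCEPTIONS) {
    s = s.split(phrase).join(' '); // remove every occurrence of the phrase
  }
  return COMPONENTS.find((c) => new RegExp(`\\b${c}\\b`).test(s)) || null;
}

console.log(flagIngredient('Chicken Alternative'));  // null
console.log(flagIngredient('chicken parts'));        // 'chicken'
console.log(flagIngredient('chicken-style seitan')); // 'chicken': not on the exception list yet
```

The last call shows the maintenance cost: every new product phrasing needs its own entry.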

There's a reason why I personally avoid signature-based scanners: it's a constant "two steps forward and one step back" of exceptions.

Try parsing valid US street addresses as an example; it seems simple until it isn't :-)