q-m / food-ingredient-parser-ruby

Extract the structure of ingredient lists on food products
MIT License

Handle chemical names better #19

Closed: wvengen closed this 3 years ago

wvengen commented 3 years ago

Some chemical names contain numbers, and these are either not recognized by the parser or parsed wrongly.
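
As a hypothetical illustration, here is one way a name like `1,2-benzisothiazol-3-one` goes wrong in a comma-separated ingredient list. This is not the parser's actual tokenizer, which is the Treetop grammar discussed below. A naive split on every comma tears the locant `1,2` apart. A split that skips commas sitting between two digits keeps the name whole. This is only a heuristic, and the input string is made up for the example.

```ruby
# Hypothetical example input; not taken from the project's test data.
list = "water, 1,2-benzisothiazol-3-one, salt"

# Naive split on every comma breaks the locant "1,2" into two items.
naive = list.split(/,\s*/)

# Split only on commas that are not between two digits:
# (?<!\d), matches commas not preceded by a digit,
# ,(?!\d) matches commas not followed by a digit.
digit_aware = list.split(/(?<!\d),\s*|,(?!\d)\s*/)

p naive        # => ["water", "1", "2-benzisothiazol-3-one", "salt"]
p digit_aware  # => ["water", "1,2-benzisothiazol-3-one", "salt"]
```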

wvengen commented 3 years ago

Getting somewhere, but now "natuurlijke aroma's" (Dutch for "natural flavourings") isn't recognized anymore...

rule chemical_systematic_name
  # leading locant ("1,2-"), then one or more "word-" parts, each optionally
  # followed by a locant ("3-"), ending in a word, e.g. 1,2-benzisothiazol-3-one
  ( chemical_systematic_name_num dash ) ( [A-Za-z]+ dash ( chemical_systematic_name_num dash ws? )? )+ [A-Za-z]+
end

rule chemical_systematic_name_num
  digit+ [RH'] /
  digit+ ( ',' digit+ )*
end
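
To experiment with the two rules above outside Treetop, here is a rough Ruby `Regexp` transliteration. This is a sketch under assumptions. The `digit`, `dash` and `ws` rules are not shown in this issue, so they are assumed to mean an ASCII digit, a plain hyphen and whitespace. Also, a regular expression backtracks where a PEG ordered choice (`/`) commits to its first successful alternative, so the two can disagree on unusual inputs. `ChemicalNameSketch` and `systematic_name?` are hypothetical names, not part of the parser.

```ruby
# Rough regex transliteration of the Treetop rules above (a sketch, not the parser).
# Assumes digit = [0-9], dash = "-", ws = whitespace (those rules are not shown here).
module ChemicalNameSketch
  # chemical_systematic_name_num: "2H" / "4'"-style locants, or "1,2"-style lists.
  NUM = /\d+[RH']|\d+(?:,\d+)*/

  # chemical_systematic_name: a leading locant and dash, then one or more
  # "letters-" parts (each optionally followed by "locant-"), then letters.
  NAME = /\A(?:#{NUM})-(?:[A-Za-z]+-(?:(?:#{NUM})-\s*)?)+[A-Za-z]+\z/

  def self.systematic_name?(str)
    NAME.match?(str)
  end
end

%w[1,2-benzisothiazol-3-one 2-methyl-4-isothiazolin-3-one 1,2-benzisothiazol-3].each do |name|
  puts "#{name}: #{ChemicalNameSketch.systematic_name?(name)}"
end
```

With these definitions the sketch prints `true` for the first two names and `false` for `1,2-benzisothiazol-3`, because the rule requires the name to end in letters.
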

wvengen commented 3 years ago

All are parsed now, except 1,2-benzisothiazol-3 (because it ends with -3, which I wouldn't expect). This is probably a parse error earlier on: as far as I am aware, chemical names don't really end with a number like that.
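
A valid systematic name should end in letters, so a result ending in a bare locant hints that something went wrong upstream. A hypothetical check along these lines could flag such results for review while testing. `trailing_locant?` is an illustrative name, not part of the parser, and the pattern only covers digit-and-comma locants.

```ruby
# Hypothetical helper: true when the last dash-separated part is only a locant
# (digits, optionally comma-separated), e.g. "...-3" or "...-2,3".
def trailing_locant?(name)
  name.match?(/-\d+(?:,\d+)*\z/)
end

["1,2-benzisothiazol-3", "1,2-benzisothiazol-3-one"].each do |name|
  status = trailing_locant?(name) ? "suspicious, check earlier parsing" : "ok"
  puts "#{name}: #{status}"
end
```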