Open wvengen opened 6 years ago
This is not really an issue for the loose parser, which treats all separators as equal. If the need arises, it could be implemented there as well.
The semicolon often carries meaning, e.g. in:
glucosesiroop, suiker, water, gemodificeerd zetmeel, gelatine (rund), vitamine A, vitamine C, vitamine D3, vitamine E, vitamine B6, foliumzuur, vitamine B12, biotine, pantotheenzuur, kaliumjodide, zinkcitraat, magnesiumoxide, zuurteregelaar: citroenzuur; kleurstoffen: curcumine, anthocyanen (vlierbes); natuurlijke aroma’s: sinaasappel, kers, citroen; glansmiddel: carnaubawas; plantaardige olie: kokosnootolie (Cocos nucifera L.); emulgatoren: mono- en diglyceriden van vetzuren, citroenzuuresters van mono- en diglyceriden van vetzuren; maltodextrine.
Here the semicolon ends a list after a colon.
Another example, where the semicolon also ends the list after a colon:
Water; plantaardige oliën (zonnebloem 15,2%, raapzaad 6%, lijnzaad 4,8%, palm, palmpit, geheel geharde palmpit, geheel geharde palm); mineraal: calciumzouten van orthofosforzuur; gemodificeerd maïszetmeel; palmstearine; emulgatoren: E471 (niet dierlijk) en zonnebloemlecithine; zout 0,2%; conserveermiddel: E202; voedingszuur: citroenzuur; antioxidant: E385; aroma; vitaminen: A, thiamine (B1), riboflavine (B2), B6, foliumzuur (B11), B12 en D2; kleurstof: carotenen
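The convention in the two examples above can be sketched in plain Ruby. This is only an illustration of the intended grouping, not the Treetop grammar of the strict parser; the helper names `split_top_level` and `group_coloned` are hypothetical. It assumes that a colon never appears inside parentheses and that a comma directly followed by a digit is a decimal comma (as in "0,2%") rather than a separator.

```ruby
# Split text at top-level "," and ";" and remember which separator
# followed each item. Separators inside parentheses are left alone.
def split_top_level(text)
  pairs = []
  depth = 0
  current = +""
  chars = text.chars
  chars.each_with_index do |ch, i|
    depth += 1 if ch == "("
    depth -= 1 if ch == ")"
    # Assumption: a comma directly followed by a digit is a decimal comma.
    decimal_comma = ch == "," && chars[i + 1]&.match?(/\d/)
    if depth.zero? && [",", ";"].include?(ch) && !decimal_comma
      pairs << [current.strip, ch]
      current = +""
    else
      current << ch
    end
  end
  pairs << [current.strip, nil] unless current.strip.empty?
  pairs
end

# Group items: "name: first" opens a nested list, which collects the
# following comma-separated items until a semicolon closes it.
def group_coloned(text)
  result = []
  open_group = nil
  split_top_level(text.sub(/\.\s*\z/, "")).each do |item, sep|
    if item.empty?
      # nothing to add, but a semicolon below still closes the group
    elsif item.include?(":")
      name, first = item.split(":", 2).map(&:strip)
      open_group = { name: name, contains: [first] }
      result << open_group
    elsif open_group
      open_group[:contains] << item
    else
      result << { name: item }
    end
    open_group = nil if sep == ";"
  end
  result
end

sample = "suiker, water, zuurteregelaar: citroenzuur; " \
         "kleurstoffen: curcumine, anthocyanen (vlierbes); maltodextrine."
group_coloned(sample).each { |node| puts node.inspect }
```

For this shortened sample the sketch yields five top-level entries: `zuurteregelaar` contains only `citroenzuur`, `kleurstoffen` contains `curcumine` and `anthocyanen (vlierbes)`, and `maltodextrine` is back at the top level because the preceding semicolon closed the nested list.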
Ok, I have something that seems to work ...
rule list
  # ...
  contains:( ( (ingredient ws* ',' ws*)* ingredient_coloned )+ ( ws* ingredient (ws* ',' ws* ingredient)* ) ) <ListNode>
  # ...
end

rule ingredient_coloned_inner_list
  # ...
  contains:( ingredient_coloned_simple_with_amount_and_nest ( ws* ',' ws* ingredient_coloned_simple_with_amount_and_nest )* ';' ) <ListNode>
end
This seems to tackle it! An ingredient listing like
Ingrediënten: mineraalwater, suiker, citroensap uit concentraat, aardbeiensap uit concentraat, smaakversterker: erythritol, natuurlijk aroma, zoetstof: steviolglycosiden; vitaminen: Vitamine B6, Vitamine B12.
used to put everything after the ";" in the notes, but it is properly parsed with this change!
Update: actually, this is a somewhat malformed line, because some coloned ingredients end with a comma and others with a semicolon. In this instance, one can understand that "smaakversterker: erythritol" is one nested ingredient, and "natuurlijk aroma" the next.
Still having trouble parsing an ingredient list with a nesting IngredientColoned ending with a non-nested ingredient.
Commit a4ca35cc9bf28ebd72162358f25046736512d3f4 handles most cases. Pending:
An ingredients list like
"Schokolade (Süßungsmittel: Maltit; Kakaobutter, Kakaomasse)"
contains mixed separators (";" and ","). Here the semicolon is used to indicate the end of the second-level nesting for Maltit.
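To make the pending case concrete, the following plain-Ruby sketch shows one way the intended structure could be expressed. It is not the parser's Treetop grammar and does not describe its current behaviour; the helper names `split_items`, `parse_list` and `parse_item` are hypothetical, and the sketch assumes well-balanced parentheses and no decimal commas.

```ruby
# Split at top-level "," and ";" only, keeping the separator after each item.
def split_items(text)
  items = []
  depth = 0
  current = +""
  text.each_char do |ch|
    depth += 1 if ch == "("
    depth -= 1 if ch == ")"
    if depth.zero? && (ch == "," || ch == ";")
      items << [current.strip, ch]
      current = +""
    else
      current << ch
    end
  end
  items << [current.strip, nil]
  items
end

# "name (a, b)" becomes a node whose children are parsed recursively.
def parse_item(item)
  if (m = item.match(/\A([^(]+)\((.*)\)\z/m))
    { name: m[1].strip, contains: parse_list(m[2]) }
  else
    { name: item }
  end
end

# "name: first" opens a nested list that runs until the next semicolon.
def parse_list(text)
  result = []
  open_group = nil
  split_items(text).each do |item, sep|
    if item.empty?
      # skip empty items; a semicolon below still closes the group
    elsif (m = item.match(/\A([^(:]+):\s*(.+)\z/m))
      open_group = { name: m[1].strip, contains: [parse_item(m[2])] }
      result << open_group
    elsif open_group
      open_group[:contains] << parse_item(item)
    else
      result << parse_item(item)
    end
    open_group = nil if sep == ";"
  end
  result
end

tree = parse_list("Schokolade (Süßungsmittel: Maltit; Kakaobutter, Kakaomasse)")
puts tree.inspect
```

Under these assumptions, Schokolade gets three children: Süßungsmittel (containing only Maltit), Kakaobutter and Kakaomasse. The semicolon is what returns Kakaobutter to the first nesting level.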