q-m / food-ingredient-parser-ruby

Extract the structure of ingredient lists on food products
MIT License

Handle multiple marks #5

Open wvengen opened 5 years ago

wvengen commented 5 years ago

A small number of products have multiple mark symbols. Currently only one is supported.

An example is Dr. Bronner's Shikakai soap teatree with ingredients:

INCI: Vitis Vinifera Juice*, Sucrose*^, Cocos Nucifera Oil* (***), Potassium Hydroxide, Olea Europaea* (***), Melaleuca Alternifolia*, Accacia Concinna (Shikakai) Nut Powder*, Citric Acid, Cannabis Sativa Seed Oil*, Buxus Chinensis (Jojoba) Seed Oil*, Tocopherols (vitaminw E), d-Limonene***. * Van biologische herkomst *** Etherische olie ^ Volgens fairtrade-normen verhandeld

and

Water, groentenª¹ 26,3% (broccoli 17,2%, erwt 2,5%, prei, ui 2,6%, spinazie), aardappelª¹, ROOMª 5,9%, maïszetmeelª, raapzaadolieª, zout, rietsuikerª, gistextract, nootmuskaatª, aroma, ª afkomstig van gecontroleerde biologische landbouw., ¹op duurzame wijze geteeld.

wvengen commented 2 months ago

A repetition of a single character in a mark would still be one mark; when different marks follow one another, they are separate marks. Then there is the bracketed notation for multiple marks. Let's limit this to two note marks for now, as we'd need examples of anything longer, e.g. would it be * (***) (****) or * (***, ****) or (*, ***, ****)?
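
To make the rule concrete, here is a minimal sketch of how a mark string could be split into separate marks. It is not the parser's actual grammar: `split_marks` and `MARK_RUN` are hypothetical names, and the set of mark characters is an assumption taken only from the two examples in this issue.

```ruby
# Hypothetical helper for illustration; not part of the gem's API.
# Assumption: the only mark characters are the ones seen in the
# examples above (*, ^, ª, ¹).
# Each alternative matches a run of one repeated character, so a
# repetition such as "***" stays a single mark.
MARK_RUN = /\*+|\^+|ª+|¹+/

# Returns the marks found in text, in order of appearance.
# Brackets and whitespace are skipped, which covers the bracketed
# two-mark notation "* (***)".
def split_marks(text)
  text.scan(MARK_RUN)
end

p split_marks('***')     # one repeated character: a single mark
p split_marks('*^')      # different characters: two marks
p split_marks('* (***)') # bracketed notation: two marks
p split_marks('ª¹')      # two marks
```

Because the brackets are simply ignored, this sketch would also accept the longer bracketed forms asked about above, without deciding which of them is valid.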