Closed SANTASLAYERS closed 2 months ago
Hi @SANTASLAYERS , thanks for the contribution!
Is there any specific reason you decided to use html scraping rather then the schema returned data? It appears that the schema provides all the information, except instructions where I can see the improvement.
Hey jknndy, Mb just complicating things. I'll update it shortly :)
@jayaddison, any opposition to the re
implementation here? Provides a nice cleanup to the output imo.
without
"ingredients": [
"400g Bœuf coupé en cubes de 25 g",
"10cl Vin rouge",
"10cl Fond de veau",
"50g Lardons",
"100g Oignons émincés",
with
"400 g Bœuf coupé en cubes de 25 g",
"10 cl Vin rouge",
"10 cl Fond de veau",
"50 g Lardons",
"100 g Oignons émincés",
@jayaddison, any opposition to the
re
implementation here? Provides a nice cleanup to the output imo.
That seems fine to me - thank you @SANTASLAYERS @jknndy. Are there any recipes that include fractional or decimal quantities? Those could cause problems for the regex as-is.
Of course! And yes there are many recipes with decimal values which should work with the current regex implementation https://www.moulinex.fr/recette/detail/PRO/Poulet%20basquaise/%2Fv1%2Frecipes%2F123908%2F
From what I can tell, there is not supposed to be fractional quantities. I did find one rogue recipe with a fractional quantity but moulinex.fr only takes the first decimal value and pushes the rest as part of the ingredient name. See: https://www.moulinex.fr/recette/detail/PRO/Cr%C3%A8mes%20aux%20%C5%93ufs%20%C3%A0%20la%20vanille/732276
I don't mind adding functionality to account for this type of edge case mistake on moulinex but that seems more like their issue. What do you guys think ?
Of course! And yes there are many recipes with decimal values which should work with the current regex implementation https://www.moulinex.fr/recette/detail/PRO/Poulet%20basquaise/%2Fv1%2Frecipes%2F123908%2F
Ok, thank you! I'd misunderstood how the regex works: it is not matching the entire value and unit -- it matches any numeric digit followed immediately by a letter; wherever that pattern is found, a space is inserted between the two characters.
Two edge conditions -- maybe not important, but possibilities:
[a-zA-Z]
does not appear to cover accented characters. Could there be quantities that begin with that? (scientific units: perhaps not, by design. but maybe the site uses other names? I saw a .5càs
somewhere, which would be fine)From what I can tell, there is not supposed to be fractional quantities. I did find one rogue recipe with a fractional quantity but moulinex.fr only takes the first decimal value and pushes the rest as part of the ingredient name. See: https://www.moulinex.fr/recette/detail/PRO/Cr%C3%A8mes%20aux%20%C5%93ufs%20%C3%A0%20la%20vanille/732276
I don't mind adding functionality to account for this type of edge case mistake on moulinex but that seems more like their issue. What do you guys think ?
Ok. It seems infrequent enough to be permissible - if it causes significant problems in future, or if someone cares enough about it specifically, then that could be handled later.
Good point! I just updated the Regex. Now it checks for a number not followed by a space, a '.' or another number. If so it inserts a space.
Nice, thanks for the help ! To me it looks good to merge too.
Thank you, @SANTASLAYERS! This functionality should be included in an upcoming release of the library within the next week or two.
@SANTASLAYERS this has been included and released in v14.56.0
(and also the pre-release v15.0.0-rc3
) of recipe-scrapers
on PyPi.
Awesome ! Thanks for helping me @jayaddison @jknndy
This PR adds support for moulinex.fr with https://www.moulinex.fr/recette/detail/PRO/Blanquette%20de%20veau/250881 and https://www.moulinex.fr/recette/detail/PRO/B%C5%93uf%20bourguignon/%2Fv1%2Frecipes%2F108166%2F as test cases.
Resolves https://github.com/hhursev/recipe-scrapers/issues/1006
Let me know if I missed anything!