hhursev / recipe-scrapers

Python package for scraping recipes data
MIT License
1.61k stars 505 forks source link

Add support for moulinex.fr #1066

Closed SANTASLAYERS closed 2 months ago

SANTASLAYERS commented 2 months ago

This PR adds support for moulinex.fr with https://www.moulinex.fr/recette/detail/PRO/Blanquette%20de%20veau/250881 and https://www.moulinex.fr/recette/detail/PRO/B%C5%93uf%20bourguignon/%2Fv1%2Frecipes%2F108166%2F as test cases.

Resolves https://github.com/hhursev/recipe-scrapers/issues/1006

Let me know if I missed anything!

jknndy commented 2 months ago

Hi @SANTASLAYERS , thanks for the contribution!

Is there any specific reason you decided to use html scraping rather then the schema returned data? It appears that the schema provides all the information, except instructions where I can see the improvement.

SANTASLAYERS commented 2 months ago

Hey jknndy, Mb just complicating things. I'll update it shortly :)

jknndy commented 2 months ago

@jayaddison, any opposition to the re implementation here? Provides a nice cleanup to the output imo.

without
      "ingredients": [
        "400g Bœuf coupé en cubes de 25 g",
        "10cl Vin rouge",
        "10cl Fond de veau",
        "50g Lardons",
        "100g Oignons émincés",
with
        "400 g Bœuf coupé en cubes de 25 g",
        "10 cl Vin rouge",
        "10 cl Fond de veau",
        "50 g Lardons",
        "100 g Oignons émincés",
jayaddison commented 2 months ago

@jayaddison, any opposition to the re implementation here? Provides a nice cleanup to the output imo.

That seems fine to me - thank you @SANTASLAYERS @jknndy. Are there any recipes that include fractional or decimal quantities? Those could cause problems for the regex as-is.

SANTASLAYERS commented 2 months ago

Of course! And yes there are many recipes with decimal values which should work with the current regex implementation https://www.moulinex.fr/recette/detail/PRO/Poulet%20basquaise/%2Fv1%2Frecipes%2F123908%2F

From what I can tell, there is not supposed to be fractional quantities. I did find one rogue recipe with a fractional quantity but moulinex.fr only takes the first decimal value and pushes the rest as part of the ingredient name. See: https://www.moulinex.fr/recette/detail/PRO/Cr%C3%A8mes%20aux%20%C5%93ufs%20%C3%A0%20la%20vanille/732276

I don't mind adding functionality to account for this type of edge case mistake on moulinex but that seems more like their issue. What do you guys think ?

jayaddison commented 2 months ago

Of course! And yes there are many recipes with decimal values which should work with the current regex implementation https://www.moulinex.fr/recette/detail/PRO/Poulet%20basquaise/%2Fv1%2Frecipes%2F123908%2F

Ok, thank you! I'd misunderstood how the regex works: it is not matching the entire value and unit -- it matches any numeric digit followed immediately by a letter; wherever that pattern is found, a space is inserted between the two characters.

Two edge conditions -- maybe not important, but possibilities:

From what I can tell, there is not supposed to be fractional quantities. I did find one rogue recipe with a fractional quantity but moulinex.fr only takes the first decimal value and pushes the rest as part of the ingredient name. See: https://www.moulinex.fr/recette/detail/PRO/Cr%C3%A8mes%20aux%20%C5%93ufs%20%C3%A0%20la%20vanille/732276

I don't mind adding functionality to account for this type of edge case mistake on moulinex but that seems more like their issue. What do you guys think ?

Ok. It seems infrequent enough to be permissible - if it causes significant problems in future, or if someone cares enough about it specifically, then that could be handled later.

SANTASLAYERS commented 2 months ago

Good point! I just updated the Regex. Now it checks for a number not followed by a space, a '.' or another number. If so it inserts a space.

SANTASLAYERS commented 2 months ago

Nice, thanks for the help ! To me it looks good to merge too.

jayaddison commented 2 months ago

Thank you, @SANTASLAYERS! This functionality should be included in an upcoming release of the library within the next week or two.

jayaddison commented 2 months ago

@SANTASLAYERS this has been included and released in v14.56.0 (and also the pre-release v15.0.0-rc3) of recipe-scrapers on PyPi.

SANTASLAYERS commented 2 months ago

Awesome ! Thanks for helping me @jayaddison @jknndy