hhursev / recipe-scrapers

Python package for scraping recipes data
MIT License

add jocooks.com scraper #1134

Closed Mooree003 closed 4 weeks ago

Mooree003 commented 1 month ago

This PR adds a recipe-scrapers scraper for recipes from jocooks.com. The site has full schema.org support except for ingredient groups, which had to be implemented manually.

Resolves #1129

jayaddison commented 1 month ago

This generally looks good to me, thank you @Mooree003!

Out of interest: for the ingredient_groups implementation: we have some scrapers (rainbowplantlife.py, vanillaandbean.py for example) that query similar-looking HTML elements using a group_ingredients utility function that can make the code more concise: do you know whether that approach would be re-usable here?
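For readers unfamiliar with the grouping idea mentioned above, here is a minimal standalone sketch. It is not the library's group_ingredients helper; it only illustrates the technique of pairing each group heading with the ingredient items that follow it. The CSS class names, the sample markup, and the IngredientGroup stand-in are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List, Optional

# Assumed class names; the real jocooks.com markup may differ.
HEADING_CLASS = "wprm-recipe-group-name"
ITEM_CLASS = "wprm-recipe-ingredient"


@dataclass
class IngredientGroup:
    """Illustrative stand-in for a grouped-ingredients record."""
    purpose: Optional[str]
    ingredients: List[str] = field(default_factory=list)


class GroupParser(HTMLParser):
    """Collects ingredient <li> items under the most recent <h4> group heading."""

    def __init__(self) -> None:
        super().__init__()
        self.groups: List[IngredientGroup] = []
        self._capture: Optional[str] = None  # "heading" or "item" while capturing
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h4" and HEADING_CLASS in classes:
            self._capture, self._buffer = "heading", []
        elif tag == "li" and ITEM_CLASS in classes:
            self._capture, self._buffer = "item", []

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Normalise whitespace in whatever text has been captured so far.
        text = " ".join("".join(self._buffer).split())
        if self._capture == "heading" and tag == "h4":
            self.groups.append(IngredientGroup(purpose=text))
            self._capture = None
        elif self._capture == "item" and tag == "li":
            if not self.groups:
                # Items that appear before any heading go into an unnamed group.
                self.groups.append(IngredientGroup(purpose=None))
            self.groups[-1].ingredients.append(text)
            self._capture = None


def parse_ingredient_groups(html: str) -> List[IngredientGroup]:
    parser = GroupParser()
    parser.feed(html)
    return parser.groups


SAMPLE_HTML = """
<div class="wprm-recipe-ingredient-group">
  <h4 class="wprm-recipe-group-name">For the sauce</h4>
  <ul>
    <li class="wprm-recipe-ingredient"><span>2 tbsp</span> <span>olive oil</span></li>
    <li class="wprm-recipe-ingredient"><span>1 clove</span> <span>garlic</span></li>
  </ul>
</div>
<div class="wprm-recipe-ingredient-group">
  <h4 class="wprm-recipe-group-name">For the pasta</h4>
  <ul>
    <li class="wprm-recipe-ingredient"><span>200 g</span> <span>spaghetti</span></li>
  </ul>
</div>
"""

for group in parse_ingredient_groups(SAMPLE_HTML):
    print(group.purpose, group.ingredients)
```

A shared helper of this kind keeps each scraper down to a pair of selectors, which is the conciseness benefit raised in the comment above.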

Mooree003 commented 1 month ago

The only reason is that in another scraper I created, this method did not seem to work as intended, so I reused my previous approach. However, group_ingredients works for this scraper, so I will add it now.

jayaddison commented 1 month ago

Thanks @Mooree003!

From some testing here: the cooking_method and equipment methods are not available on the SchemaOrg class, so we can't use those by calling self.schema.<method-name>(). I would recommend either removing them, or retrieving the data from the HTML where possible.

And a question / optional feature request: it looks like the recipe webpage includes nutritional info in the schema.org metadata, so that would be a nice bonus if we can include that too.
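As a rough illustration of where that nutritional info lives, the sketch below reads the schema.org JSON-LD embedded in a page and returns the Recipe node's nutrition fields. It is a standalone approximation using only the standard library, not the library's SchemaOrg class; the function name extract_nutrients and the sample page are hypothetical.

```python
import json
from html.parser import HTMLParser
from typing import Any, Dict, List


class JsonLdParser(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.documents: List[Any] = []
        self._in_jsonld = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld, self._buffer = True, []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def _is_recipe(node: Any) -> bool:
    if not isinstance(node, dict):
        return False
    types = node.get("@type")
    # "@type" may be a single string or a list of strings.
    return "Recipe" in (types if isinstance(types, list) else [types])


def extract_nutrients(html: str) -> Dict[str, str]:
    """Return the nutrition fields of the first Recipe node, minus '@type'."""
    parser = JsonLdParser()
    parser.feed(html)
    for doc in parser.documents:
        # A document may be a single node, a list of nodes, or an "@graph" wrapper.
        nodes = doc.get("@graph", [doc]) if isinstance(doc, dict) else doc
        for node in nodes:
            if _is_recipe(node):
                nutrition = node.get("nutrition") or {}
                return {k: v for k, v in nutrition.items() if k != "@type"}
    return {}


SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org",
 "@graph": [
   {"@type": "WebPage", "name": "Example page"},
   {"@type": "Recipe", "name": "Example recipe",
    "nutrition": {"@type": "NutritionInformation",
                  "calories": "320 kcal", "proteinContent": "12 g"}}
 ]}
</script>
</head><body></body></html>
"""

print(extract_nutrients(SAMPLE_HTML))
```

In a scraper class this would normally be a short method that delegates to the existing schema.org helper rather than parsing JSON-LD by hand.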

Mooree003 commented 1 month ago

Interesting, I thought I had removed these 😂. I'll add the nutritional info and remove the functions.

jayaddison commented 1 month ago

👍 all looks good to me...

...but I'm going to add one more suggestion, because even though equipment isn't available through SchemaOrg, there is some equipment info in the HTML, and it's not too tricky to extract. Hold on for a moment and I'll add a couple of code review comments to do that.
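A sketch of what extracting equipment from the HTML could look like follows. The class name wprm-recipe-equipment-name and the sample markup are assumptions, and the code is a standalone illustration rather than the review suggestion that was actually posted. It collects the text of each matching element and drops repeats while preserving order.

```python
from html.parser import HTMLParser
from typing import List, Optional

# Assumed class name for equipment entries; the real page markup may differ.
EQUIPMENT_CLASS = "wprm-recipe-equipment-name"


class EquipmentParser(HTMLParser):
    """Collects unique equipment names, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.names: List[str] = []
        self._tag: Optional[str] = None  # tag name of the element being captured
        self._depth = 0                  # nesting depth of that same tag name
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self._tag is not None:
            # Already capturing: only track nested elements of the same tag name.
            if tag == self._tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if EQUIPMENT_CLASS in classes:
            self._tag, self._depth, self._buffer = tag, 1, []

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        self._depth -= 1
        if self._depth == 0:
            name = " ".join("".join(self._buffer).split())
            if name and name not in self.names:
                self.names.append(name)
            self._tag = None


def extract_equipment(html: str) -> List[str]:
    parser = EquipmentParser()
    parser.feed(html)
    return parser.names


SAMPLE_HTML = """
<div class="wprm-recipe-equipment-container">
  <div class="wprm-recipe-equipment-item">
    <div class="wprm-recipe-equipment-name"><a href="#">Dutch Oven</a></div>
  </div>
  <div class="wprm-recipe-equipment-item">
    <div class="wprm-recipe-equipment-name">Wooden spoon</div>
  </div>
  <div class="wprm-recipe-equipment-item">
    <div class="wprm-recipe-equipment-name">Dutch Oven</div>
  </div>
</div>
"""

print(extract_equipment(SAMPLE_HTML))
```

De-duplicating is a design choice here: recipe pages sometimes list the same tool in more than one place, and a scraper's equipment list is usually more useful without repeats.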

Mooree003 commented 1 month ago

No worries! Added now

Mooree003 commented 1 month ago

Changes addressed.

jknndy commented 4 weeks ago

@Mooree003, looks great, thanks! Merging now.