za3k opened 7 years ago
To piggyback off this sharing post, I made a web app that converts recipes from volumetric to metric units (mainly for baking). See the GIF below for a usage demo.
Repo: https://github.com/justinmklam/recipe-converter
Thanks again for creating this great library! It really opens up opportunities to build new projects on top of it.
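The core of a volume-to-weight conversion like this is multiplying the volume by the ingredient's density. Below is a minimal sketch, not taken from recipe-converter: the unit table uses the exact US customary definitions (1 cup = 236.5882365 mL), while the function name volume_to_grams and its density_g_per_ml parameter are illustrative assumptions. Real densities differ per ingredient (flour is much lighter than water), so the caller has to supply them.

```python
# Illustrative sketch only: not recipe-converter's implementation.
# Converting volume to weight requires a density for each ingredient,
# which the caller passes in (grams per millilitre).

# Exact US customary volume definitions, in millilitres.
ML_PER_UNIT = {
    "cup": 236.5882365,
    "tbsp": 14.78676478125,
    "tsp": 4.92892159375,
    "ml": 1.0,
}


def volume_to_grams(quantity: float, unit: str, density_g_per_ml: float) -> float:
    """Convert a volumetric amount to grams, given the ingredient's density."""
    try:
        ml = quantity * ML_PER_UNIT[unit.lower()]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}") from None
    return ml * density_g_per_ml


# Water is about 1 g/mL, so 2 cups of water is roughly 473 g.
print(round(volume_to_grams(2, "cup", 1.0)))
```

Keeping the density as an explicit argument avoids hard-coding per-ingredient figures into the conversion logic itself.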
I suppose I should contribute my quick script too!
I'm more of a terminal guy, so I wrote a quick Python script that converts a recipe into Markdown that can be cat'd.
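A terminal-friendly renderer can be quite small. This is a hypothetical sketch rather than the script mentioned above: it assumes the recipe has already been scraped into a title string, a list of ingredient strings, and a newline-separated instructions string (roughly the shapes returned by the title(), ingredients() and instructions() methods of a recipe-scrapers scraper). The name recipe_to_markdown is illustrative.

```python
def recipe_to_markdown(title: str, ingredients: list[str], instructions: str) -> str:
    """Render a recipe as Markdown suitable for printing in a terminal."""
    lines = [f"# {title}", "", "## Ingredients", ""]
    lines += [f"- {item}" for item in ingredients]
    lines += ["", "## Instructions", ""]
    # Number each non-empty instruction line as a separate step.
    steps = [s.strip() for s in instructions.splitlines() if s.strip()]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample data; with recipe-scrapers these values would
    # come from scraper.title(), scraper.ingredients() and scraper.instructions().
    md = recipe_to_markdown(
        "Pancakes",
        ["1 cup flour", "1 egg", "1 cup milk"],
        "Whisk everything together.\nCook on a hot griddle.",
    )
    print(md)
```

Writing the output to a .md file and running cat on it gives a readable recipe without leaving the terminal.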
Re-importing and re-indexing recipe content into https://www.reciperadar.com was a breeze yesterday, largely thanks to recipe-scrapers, and the quality of the recipe content (although not yet quantified) feels and looks pretty good to me.
I'd like to add a big thanks to @hhursev and @bfcarpio in particular (though really to everyone who has contributed to recipe-scrapers) for developing and maintaining the library. Glad to be a part of this community :)
I've created recipe-crawler, a configurable web crawler for recipes. It uses recipe-scrapers for a couple of websites that don't structure their data in the schema.org/Recipe format.
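A crawler that handles both cases might first look for schema.org/Recipe data embedded as JSON-LD and only fall back to recipe-scrapers when none is found. The standard-library sketch below shows that first check; it is an assumption about how such a fallback could work, not recipe-crawler's actual code, and find_schema_recipe is a hypothetical name. It also walks @graph arrays, since many sites nest the Recipe node there.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw contents of <script type="application/ld+json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        # Script contents may arrive in several chunks; concatenate them.
        if self._in_jsonld:
            self.blocks[-1] += data


def _iter_nodes(obj):
    """Yield every dict in a JSON-LD document, including @graph members."""
    if isinstance(obj, list):
        for item in obj:
            yield from _iter_nodes(item)
    elif isinstance(obj, dict):
        yield obj
        yield from _iter_nodes(obj.get("@graph", []))


def find_schema_recipe(html: str):
    """Return the first schema.org Recipe node on the page, or None."""
    parser = JsonLdExtractor()
    parser.feed(html)
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD rather than failing the page
        for node in _iter_nodes(data):
            types = node.get("@type")
            types = types if isinstance(types, list) else [types]
            if "Recipe" in types:
                return node
    return None
```

If find_schema_recipe returns None, the page can be handed to recipe-scrapers instead.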
Please crawl responsibly.
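One concrete part of crawling responsibly is honoring a site's robots.txt. Python's standard urllib.robotparser can do the checking; in a real crawler you would call set_url() and read() to download the file, while this sketch feeds inline text to parse() to stay offline. The rules, the user-agent string and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def make_robot_checker(robots_txt: str) -> RobotFileParser:
    """Build a robots.txt checker from already-downloaded robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical robots.txt: block /search for everyone, ask for 5 s between requests.
rules = make_robot_checker("User-agent: *\nDisallow: /search\nCrawl-delay: 5\n")

print(rules.can_fetch("my-recipe-bot", "https://example.com/recipes/soup"))
print(rules.can_fetch("my-recipe-bot", "https://example.com/search?q=soup"))
print(rules.crawl_delay("my-recipe-bot"))
```

crawl_delay() returns None when the site specifies no delay, in which case a conservative pause between requests is still a good idea.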
This seems as good a place as any to celebrate that recipe-scrapers has reached the 1000-stars milestone on GitHub :smile: :champagne: :tada:
Here's hoping for the continuation and development of many useful recipe projects (current and future) thanks to this library.
I've been working on a recipe book app for the last 3 years. Until recently, I relied on my own massively overcomplicated recipe scraper, so finding the recipe-scrapers project was a great day.
Anyway, the installable web version of the app is nearly ready for 1.0, and folks can start using it at https://app.sharpcooking.net. The project is open source and available on GitHub as sharpcooking-web.
Thanks for this great project!
Since we don't have a mailing list for users of the library, I'm going to share this here, because hopefully people with related projects will find it useful:
We now have a developer documentation section that should help to make it easier to develop and maintain scrapers. Many thanks to @strangetom for writing this up!
First off, I love this repo so thanks to @hhursev and all the contributors!
That being said, the first question I had when I found it was "so, where do I get the recipes?". So I made a quick tool, recipe-urls, to compile recipe-specific URLs from any given base URL, which can then be fed into recipe-scrapers.
Check it out if you'd like... or don't! It still requires some brute-force URL compiling, but it increased my output considerably.
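The general idea of compiling recipe URLs from a base page can be sketched with the standard library: collect every link, resolve it against the base URL, keep same-site links, and filter by a path heuristic. This is not recipe-urls' implementation; recipe_links and its default "/recipe" marker are hypothetical, and real sites will need per-site patterns (which is where the brute-force work comes in).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def recipe_links(base_url: str, html: str, marker: str = "/recipe") -> list[str]:
    """Return absolute, same-site, de-duplicated links whose path contains `marker`."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(base_url).netloc
    found: list[str] = []
    seen: set[str] = set()
    for href in collector.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        parsed = urlparse(url)
        # Keep links on the same site whose path matches the heuristic.
        if parsed.netloc == site and marker in parsed.path and url not in seen:
            seen.add(url)
            found.append(url)
    return found
```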
Very interesting! I've had people ask similar things about my own recipe book app. Question for @mkayeterry: could you improve the URL listing by leveraging the site's sitemap.xml? Virtually every site has one for SEO purposes, and it should list all of the site's URLs directly. Your current filtering would work well with that too.
In any case, this is a cool and useful project!
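To illustrate the sitemap suggestion: sitemaps follow the sitemaps.org protocol, where a urlset element lists page URLs in loc elements and a sitemapindex element points to further sitemaps. A minimal parser using xml.etree.ElementTree might look like this; parse_sitemap is a hypothetical helper, and fetching (including gzipped .xml.gz sitemaps) is left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> tuple[list[str], list[str]]:
    """Return (page_urls, child_sitemap_urls) from one sitemap document.

    A <urlset> lists pages; a <sitemapindex> lists further sitemaps to fetch.
    """
    root = ET.fromstring(xml_text)
    pages = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]
    children = [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS) if loc.text]
    return pages, children
```

The page URLs it returns could then go through the same recipe-URL filtering before being passed to recipe-scrapers.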
@jlucaspains Oh that's interesting! I'm pretty new to anything front end (over here frantically trying to figure out what a sitemap.xml is), so I'll definitely look into it more. Sounds promising and I'm very open to making the current setup a little more robust!
I've put together an ingredient-parsing Python package, ingredient-slicer, which parses ingredient strings (e.g. "2 1/2 cups of tomato sauce") and does a best-effort extraction of the unit, quantity, food, gram_weight, and other extraneous details (prep, size_modifiers, etc.).
I made ingredient-slicer because I needed a lightweight ingredient parser that has zero dependencies and does NOT require or rely on NLP models to do its thing. The package uses only Python's standard library and is pretty quick.
It's by no means perfect at extracting the food from an ingredient, but it does a really good job with the unit and quantity, and it applies any extra information mentioned in parenthetical references (e.g. "2 salmon steaks (8 ounces each)" ends up with a unit of "ounces" and a quantity of "16", since 2 × 8 ounces each = 16 ounces).
An example to illustrate:
```shell
pip install ingredient-slicer
```

```python
>>> import ingredient_slicer
>>> slicer = ingredient_slicer.IngredientSlicer("2 (15-ounces) cans chickpeas, rinsed and drained")
>>> slicer.to_json()
{
    'ingredient': '2 (15-ounces) cans chickpeas, rinsed and drained',
    'standardized_ingredient': '2 cans chickpeas, rinsed and drained',
    'food': 'chickpeas',
    # primary quantity and units
    'quantity': '30',
    'unit': 'ounces',
    'standardized_unit': 'ounce',
    # any other secondary quantity and units found in the string
    'secondary_quantity': '2',
    'secondary_unit': 'cans',
    'standardized_secondary_unit': 'can',
    'gram_weight': '850.49',
    'prep': ['drained', 'rinsed'],
    'size_modifiers': [],
    'dimensions': [],
    'is_required': True,
    'parenthesis_content': ['15 ounce']
}
```
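The numbers in that output line up: the two 15-ounce cans give the 30-ounce quantity, and with the international avoirdupois ounce of exactly 28.349523125 g the total matches the reported gram_weight (this is presumably the conversion ingredient-slicer applies, but that is an assumption):

$$
2 \times 15\,\text{oz} = 30\,\text{oz}, \qquad 30\,\text{oz} \times 28.349523125\,\tfrac{\text{g}}{\text{oz}} = 850.48569375\,\text{g} \approx 850.49\,\text{g}
$$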
It fixed a problem for me so thought it might be helpful for other people too!
And thank you to everyone who contributes to and maintains recipe-scrapers; it's a great tool you all have built. Keep up the great work!
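For anyone curious how the parenthetical arithmetic described above can be done without NLP, here is a small regex-based sketch. It is a hypothetical illustration, not ingredient-slicer's code: parse_amount and to_number are made-up names, it only understands a leading whole number, decimal, fraction or mixed number, and when there is no parenthetical it treats the first word after the number as the unit. Quantities are kept exact with fractions.Fraction.

```python
import re
from fractions import Fraction

# A quantity: mixed number "2 1/2", fraction "1/2", or integer/decimal "2", "2.5".
_QTY = r"(\d+\s+\d+/\d+|\d+/\d+|\d+(?:\.\d+)?)"
# A leading quantity followed by the rest of the ingredient string.
_LEADING = re.compile(rf"^\s*{_QTY}\s*(.*)$")
# A parenthetical size such as "(8 ounces each)" or "(15-ounces)".
_PAREN = re.compile(rf"\(\s*{_QTY}[\s-]*([a-zA-Z]+)(?:\s+each)?\s*\)")


def to_number(text: str) -> Fraction:
    """Convert '2', '2.5', '1/2' or '2 1/2' to an exact Fraction."""
    return sum((Fraction(part) for part in text.split()), Fraction(0))


def parse_amount(ingredient: str):
    """Return (quantity, unit, rest) for an ingredient string.

    If a parenthetical gives a per-item size, the leading count is
    multiplied by it, e.g. "2 salmon steaks (8 ounces each)" -> 16 ounces.
    """
    lead = _LEADING.match(ingredient)
    if not lead:
        return None, None, ingredient.strip()
    count, rest = to_number(lead.group(1)), lead.group(2)
    paren = _PAREN.search(rest)
    if paren:
        size, unit = to_number(paren.group(1)), paren.group(2)
        # Drop the parenthetical from the remaining text.
        rest = (rest[: paren.start()] + rest[paren.end():]).strip()
        return count * size, unit, rest
    # No parenthetical: treat the first word after the number as the unit.
    unit, _, rest = rest.partition(" ")
    return count, unit, rest.strip()
```

For "2 salmon steaks (8 ounces each)" this returns a quantity of 16 (as a Fraction) with the unit "ounces", matching the behavior described above; a real parser would also need unit normalization and a vocabulary of known units.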
Hey, over the past year or so I wanted to dive deeper into Python development, so I used this project as the basis for my CLI app recipe2txt.
This was my motivation to examine various aspects of the language and of Python project management more closely, so it may be unconventional in some parts, but as far as I know everything works.
Features include asynchronous fetching, Jinja templating and local caching of recipes. And (maybe the most interesting part for recipe-scrapers) it generates formatted GitHub issues whenever it encounters scraping errors, so that users can easily report them here.
Thank you to all contributors here that made the hard part of recipe-scraping easy!
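One lightweight way to make error reporting easy, similar in spirit to the GitHub-issue generation in recipe2txt, is a link that opens GitHub's new-issue form pre-filled via the title and body query parameters. This is an assumption about how such a feature could work, not recipe2txt's implementation; issue_url is a hypothetical helper, and the repository path assumes recipe-scrapers lives at github.com/hhursev/recipe-scrapers.

```python
from urllib.parse import urlencode

# Hypothetical target: the recipe-scrapers new-issue form.
ISSUES_NEW = "https://github.com/hhursev/recipe-scrapers/issues/new"


def issue_url(recipe_url: str, error: Exception) -> str:
    """Build a link that opens a pre-filled GitHub issue for a scraping error."""
    title = f"Scraping error: {type(error).__name__} for {recipe_url}"
    body = "\n".join([
        f"**URL:** {recipe_url}",
        f"**Error:** `{type(error).__name__}: {error}`",
        "",
        "Generated automatically while scraping.",
    ])
    # urlencode percent-encodes both fields so they survive as query parameters.
    return ISSUES_NEW + "?" + urlencode({"title": title, "body": body})
```

Very long bodies (for example full tracebacks) can make the URL unwieldy, so a tool may prefer writing the report to a file for the user to paste instead.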
Hey, has anyone scraped all of the available data, or at least a large amount of it, and could share it? I have a research project I want to launch and need as much data as possible.
Hey all! I'm working on a tool that maintains a database by scraping all recipe pages from a given website. It pulls the sitemap, selects all pages with recipes, and then creates a dict or JSON file with all the metadata scraped by recipe-scrapers.
Feel free to check it out at recipe-database-scraper
@mkayeterry I realise there's a bit of overlap with the repo you shared earlier this year. Hope you don't mind. One of my goals was to continue finding new recipe pages added to a website. I couldn't figure out a good way to reconcile that with your repo, so I went in a different direction.
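Continuing to find newly added pages mostly comes down to remembering which URLs have already been seen. A minimal sketch, assuming the already-scraped URLs are kept as a JSON list in a local state file (find_new_urls and that file layout are hypothetical, not how recipe-database-scraper works):

```python
import json
from pathlib import Path


def find_new_urls(sitemap_urls: list[str], state_file: Path) -> list[str]:
    """Return URLs not seen on previous runs and record them as seen.

    The set of already-seen URLs lives in a small JSON file, so re-running
    against the same sitemap only yields pages added since the last run.
    """
    seen = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    # dict.fromkeys de-duplicates while keeping sitemap order.
    new = [url for url in dict.fromkeys(sitemap_urls) if url not in seen]
    state_file.write_text(json.dumps(sorted(seen | set(new)), indent=2))
    return new
```

This version records URLs as seen as soon as they are returned; a more careful tool would record a URL only after it has been scraped successfully, so failures get retried on the next run. Comparing sitemap lastmod dates is another option for catching updated pages.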
@timsamart I've started working on that now, but it'll take a while to go through all the websites. Have you already built that DB by now?
I thought I'd share what I made with this: https://archive.org/details/recipes-en-201706 A full version of allrecipes, epicurious, cookstr, and bbc.co.uk, parsed into nice JSON with photos.
Sorry to abuse 'issues'; as far as I know, there's no option to send a private message on GitHub.