Mealie is a self-hosted recipe manager and meal planner with a REST API backend and a reactive frontend built in Vue, designed to be pleasant for the whole family to use. Add recipes to your database by providing a URL, and Mealie will automatically import the relevant data, or add a family recipe with the UI editor.
What type of PR is this?
(REQUIRED)
What this PR does / why we need it:
This PR adds a fallback recipe scraper strategy that runs when normal scraping (using the recipe-scrapers package) fails. It still uses the recipe-scrapers package, but replaces the JSON extraction with OpenAI, so all of our cleaning and validation logic is reused. OpenAI takes the raw* HTML and formats it into recipe JSON-LD, which is then fed into recipe-scrapers.
*We actually do some processing up front to reduce the amount of data sent to OpenAI, because sending the entire HTML body quickly exceeds sane token counts.
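To illustrate the kind of up-front reduction the footnote describes, here is a minimal sketch using only the standard library. It is not the code in this PR: `reduce_html`, `VisibleTextExtractor`, the `SKIPPED_TAGS` set, and the `max_chars` cap are all hypothetical names and choices made for this example.

```python
from html.parser import HTMLParser

# Tags whose contents rarely carry recipe text (an assumption for this sketch).
SKIPPED_TAGS = {"script", "style", "noscript", "svg"}


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything nested inside SKIPPED_TAGS."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collapse runs of whitespace so layout padding does not cost tokens.
        text = " ".join(data.split())
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def reduce_html(html: str, max_chars: int = 20_000) -> str:
    """Return the visible text of a page, one text node per line, hard-capped in length."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)[:max_chars]
```

A character cap is only a rough stand-in for a token budget; a real implementation would likely count tokens with the model's tokenizer instead.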
I also added an option to the in-app debugger that forces parsing with OpenAI (if OpenAI is enabled), and did some refactoring to make extending the prompts/models easier. Even though I ended up not really using the refactor in this PR, I left the changes in because I think the code is better organized.
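The fallback order and the debugger's force option can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's actual code: `scrape_with_fallback`, `wrap_json_ld`, and the `Extractor` callables are hypothetical stand-ins for the normal recipe-scrapers path and the OpenAI-backed path, and no real recipe-scrapers or OpenAI API is called.

```python
import json
from typing import Callable, Optional

# Hypothetical extractor signature: takes HTML, returns recipe data or None on failure.
Extractor = Callable[[str], Optional[dict]]


def wrap_json_ld(recipe: dict) -> str:
    """Embed model-produced recipe data as JSON-LD in a minimal HTML page,
    so a JSON-LD-based scraper can apply its usual cleaning to it."""
    # Escape "</" so a value cannot close the script tag early; "\/" is valid JSON.
    payload = json.dumps(recipe).replace("</", "<\\/")
    return (
        '<html><head><script type="application/ld+json">'
        f"{payload}</script></head><body></body></html>"
    )


def scrape_with_fallback(
    html: str,
    primary: Extractor,
    llm_fallback: Extractor,
    *,
    llm_enabled: bool,
    force_llm: bool = False,
) -> tuple[str, Optional[dict]]:
    """Try the primary strategy first, then the LLM strategy if it is enabled.

    force_llm skips the primary strategy, but only when the LLM is enabled.
    Returns the name of the strategy that ran last and its result.
    """
    if not (force_llm and llm_enabled):
        recipe = primary(html)
        if recipe is not None:
            return "primary", recipe
    if llm_enabled:
        return "llm", llm_fallback(html)
    return "none", None
```

Keeping the force flag subordinate to `llm_enabled` mirrors the "if OpenAI is enabled" condition above: with OpenAI disabled, forcing has no effect and the normal path runs.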
Which issue(s) this PR fixes:
Discussed here: https://github.com/mealie-recipes/mealie/discussions/3200
Testing
I tried a bunch of recipes from closed issues where scraping failed because the site isn't supported by recipe-scrapers. In every instance where there was actually scrapable data, it worked great. It only fails when we can't get the site contents at all (obviously).