mealie-recipes / mealie

Mealie is a self-hosted recipe manager and meal planner with a REST API backend and a reactive frontend application built in Vue, for a pleasant user experience for the whole family. Easily add recipes to your database by providing the URL, and Mealie will automatically import the relevant data, or add a family recipe with the UI editor.
https://docs.mealie.io
GNU Affero General Public License v3.0
5.62k stars, 613 forks

feat: Open AI Recipe Scraper #3690

Closed: michael-genson closed this 3 weeks ago

michael-genson commented 4 weeks ago

What type of PR is this?

- feature

What this PR does / why we need it:

This PR adds a secondary recipe scraping strategy that runs when scraping a site the normal way (with the recipe-scrapers package) fails. It still uses the recipe-scrapers package, but replaces the JSON extraction step with OpenAI, so all of our existing cleaning and validation logic is reused. OpenAI takes the raw* HTML and formats it into recipe JSON-LD, which is then fed into recipe-scrapers.
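A minimal sketch of the fallback flow described above, assuming the model call can be injected as a plain function. This is not Mealie's actual code: `scrape_with_fallback`, `parse_schema_recipe`, `html_to_jsonld`, and `force_llm` are hypothetical names, the stdlib parser stands in for the recipe-scrapers schema.org step, and no real OpenAI request is made. The idea shown is that the model's JSON-LD is wrapped back into a `<script type="application/ld+json">` tag so the same parser handles both paths; `force_llm` mirrors the debugger option to force the OpenAI path.

```python
import json
from html.parser import HTMLParser
from typing import Callable, Optional


class _JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def parse_schema_recipe(html: str) -> Optional[dict]:
    """Hypothetical stand-in for the recipe-scrapers schema step: first Recipe object, or None."""
    extractor = _JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    for block in extractor.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD instead of failing the whole scrape
        if isinstance(data, dict) and data.get("@type") == "Recipe":
            return data
    return None


def scrape_with_fallback(
    html: str,
    html_to_jsonld: Callable[[str], str],  # hypothetical: wraps the LLM call
    force_llm: bool = False,
) -> Optional[dict]:
    if not force_llm:
        recipe = parse_schema_recipe(html)
        if recipe is not None:
            return recipe  # normal path succeeded; the model is never called
    # Fallback: ask the model for JSON-LD, wrap it in a script tag, reuse the same parser
    jsonld = html_to_jsonld(html)
    wrapped = f'<html><head><script type="application/ld+json">{jsonld}</script></head></html>'
    return parse_schema_recipe(wrapped)


def fake_model(html: str) -> str:
    """Pretend LLM that always returns a tiny schema.org Recipe."""
    return json.dumps({"@context": "https://schema.org", "@type": "Recipe", "name": "Pancakes"})


page = "<html><body><h1>Pancakes</h1></body></html>"  # no JSON-LD, so the fallback runs
print(scrape_with_fallback(page, fake_model)["name"])  # Pancakes
```

Routing the model output back through the existing parser, rather than trusting it directly, is what lets the downstream cleaning and validation apply unchanged.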

*We actually do some preprocessing up front to reduce the amount of data sent to OpenAI; if we sent the entire HTML body, we would quickly exceed sane token counts.
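The PR does not spell out the preprocessing, so the following is only a sketch of one common way to shrink a page before sending it to a model, assuming that dropping non-content tags and keeping visible text is acceptable. `reduce_html`, `SKIP_TAGS`, and the `max_chars` budget are hypothetical; a character cap is a crude proxy for a token limit.

```python
import re
from html.parser import HTMLParser

# Assumed list of tags whose contents rarely help a model find ingredients or steps
SKIP_TAGS = {"script", "style", "noscript", "svg", "iframe"}


class _TextExtractor(HTMLParser):
    """Keeps visible text chunks, skipping anything nested inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def reduce_html(html: str, max_chars: int = 8000) -> str:
    """Hypothetical helper: strip markup and noise, then cap the length."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.chunks)
    text = re.sub(r"[ \t]+", " ", text)  # collapse runs of spaces and tabs
    return text[:max_chars]


sample = (
    "<html><head><title>Pancakes</title><style>p{color:red}</style></head>"
    "<body><script>track()</script><h1>Pancakes</h1>"
    "<ul><li>2 eggs</li><li>1 cup   flour</li></ul></body></html>"
)
print(repr(reduce_html(sample)))  # 'Pancakes\nPancakes\n2 eggs\n1 cup flour'
```

Note that this also discards any JSON-LD scripts, which is acceptable here only because this step runs after the normal schema-based scrape has already failed.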

I also added an option to the in-app debugger to force parsing with OpenAI (if OpenAI is enabled). I also did some refactoring to make extending the prompts and models easier. Even though I ended up not really using it in this PR, I kept the changes because I think the code is better organized.

Which issue(s) this PR fixes:

Discussed here: https://github.com/mealie-recipes/mealie/discussions/3200

Testing

I tried a bunch of recipes from closed issues where scraping failed because the site wasn't supported by recipe-scrapers. In every case where there was actually scrapable data, it worked great. It only fails when we can't get the site contents at all (obviously).