mealie-recipes / mealie

Mealie is a self-hosted recipe manager and meal planner with a REST API backend and a reactive frontend application built in Vue, for a pleasant user experience for the whole family. Easily add recipes to your database by providing the URL, and Mealie will automatically import the relevant data, or add a family recipe with the UI editor.
https://docs.mealie.io
GNU Affero General Public License v3.0

feat: OpenAI Ingredient Parsing #3581

Closed michael-genson closed 4 months ago

michael-genson commented 4 months ago

What type of PR is this?


What this PR does / why we need it:


This PR opens the door to implementing OpenAI in Mealie, and implements a new OpenAI ingredient parser. At a high level, this adds an OpenAI service that manages stored prompts and data injection to call the OpenAI API and receive a JSON response (which we then parse into a Pydantic model).

To enable OpenAI features, users need to include their OpenAI API key in the backend config (using the OPENAI_API_KEY env var). There are a few other configuration options to tweak performance vs cost (since the API isn't free).
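As a sketch, the backend configuration might look like this. `OPENAI_API_KEY` is the only variable named in this PR; the tuning-knob names below are illustrative placeholders, not confirmed settings:

```shell
# Required to enable any OpenAI features in Mealie:
OPENAI_API_KEY=your-api-key-here

# Hypothetical performance/cost knobs (names illustrative, not from this PR):
# OPENAI_MODEL=gpt-4o
# OPENAI_WORKERS=2
```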

Since OpenAI configuration is done via environment variables, this doesn't require any DB migrations.


The way this works is we have stored prompts which get sent to OpenAI to instruct it on what to do, i.e. "You are a bot designed to parse ingredients for recipes" (the actual prompt is much longer and goes into far more detail). It then sends a JSON list of inputs as the user message for it to process.

The OpenAI API supports returning its response in JSON format, which is perfect for FastAPI/Pydantic validation. I used Pydantic's BaseModel.model_dump_json() to inject the expected response schema into the prompt, which makes GPT always respond in a parsable format.
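The schema-injection idea can be sketched like this. Names and fields are illustrative, not Mealie's actual code: an example of the expected response shape is serialized and embedded in the system prompt so the model (in JSON mode) replies in a fixed, parsable format.

```python
import json

# Illustrative example of the expected response shape; in the PR this
# comes from serializing the Pydantic model rather than a hand-written dict.
RESPONSE_EXAMPLE = {
    "input": "2 cups flour, sifted",
    "quantity": 2.0,
    "unit": "cups",
    "food": "flour",
    "note": "sifted",
}

SYSTEM_PROMPT = (
    "You are a bot designed to parse ingredients for recipes. "
    "Respond ONLY with JSON matching this example:\n"
    + json.dumps(RESPONSE_EXAMPLE)
)

def build_user_message(ingredients: list[str]) -> str:
    # The raw ingredient strings are sent as a JSON list in the user message.
    return json.dumps(ingredients)
```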

From there, implementing an interface is simple:

  1. send inputs to OpenAI
  2. receive predictable JSON string
  3. parse into Pydantic model
  4. push through whatever existing service we have
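The four steps above can be sketched end to end, with the API call stubbed out. All names here are illustrative, not Mealie's real service code:

```python
import json
from dataclasses import dataclass

@dataclass
class ParsedIngredient:
    # Stand-in for the Pydantic response model (fields illustrative).
    quantity: float
    unit: str
    food: str

def call_openai(user_message: str) -> str:
    # Stub for the real API call; in JSON mode, OpenAI returns a plain
    # JSON string shaped like the schema embedded in the prompt.
    return json.dumps({"items": [{"quantity": 1.0, "unit": "tsp", "food": "salt"}]})

def parse_ingredients(raw: list[str]) -> list[ParsedIngredient]:
    response = call_openai(json.dumps(raw))   # 1. send inputs to OpenAI
    payload = json.loads(response)            # 2. receive predictable JSON string
    # 3. parse into the model; 4. the result is handed to an existing service
    return [ParsedIngredient(**item) for item in payload["items"]]

parsed = parse_ingredients(["1 tsp salt"])
```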

Our OpenAI service handles the prompt injection, additional data injection (see below), and API handling; you just need to provide it the data and a description of how to use the data.


For the parser I opted to serialize our unit store and send it along with the rest of the prompt. This gives GPT some training data to say "you should expect to see these units". Originally I also included foods, but it didn't seem to help much at all (and adding the entire food store racks up API costs). This is configurable in the env settings: if you want to reduce costs, you can skip the optional data injection.
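The optional data injection could look roughly like this (illustrative, not Mealie's actual code): when enabled, the serialized unit store is appended to the prompt, and skipping it saves tokens at the cost of some accuracy.

```python
import json
from typing import Optional

def build_prompt(base_prompt: str, units: Optional[list[str]]) -> str:
    # When units are provided, serialize and append them so the model
    # knows which units to expect; pass None to skip and reduce API cost.
    if not units:
        return base_prompt
    return base_prompt + "\nYou should expect to see these units: " + json.dumps(units)

prompt = build_prompt("Parse these ingredients.", ["cup", "tsp", "tbsp"])
```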

The OpenAI API isn't very fast when the responses are long. I took a bunch of measures to optimize this, but you can also split the ingredients into chunks and send multiple async requests (one for each chunk). This speeds up the parse time considerably, but costs more. The worker count is configurable in the env settings.
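The chunked fan-out can be sketched with `asyncio.gather` (a minimal sketch with the request stubbed out; names are illustrative):

```python
import asyncio

async def parse_chunk(chunk: list[str]) -> list[dict]:
    # Stub for one async OpenAI request (the real call does network I/O).
    await asyncio.sleep(0)
    return [{"input": line} for line in chunk]

async def parse_all(ingredients: list[str], workers: int = 2) -> list[dict]:
    # Split the ingredient list into roughly `workers` chunks and fan out
    # one request per chunk; gather() preserves chunk order.
    size = -(-len(ingredients) // workers)  # ceiling division
    chunks = [ingredients[i:i + size] for i in range(0, len(ingredients), size)]
    results = await asyncio.gather(*(parse_chunk(c) for c in chunks))
    return [item for chunk in results for item in chunk]

parsed = asyncio.run(parse_all(["1 tsp salt", "2 cups flour", "3 eggs"], workers=2))
```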


This PR also adds some QoL features on the frontend for parsing ingredients.

I've also hidden the OpenAI ingredient parser if OpenAI isn't enabled (i.e. you haven't provided an API key).

Which issue(s) this PR fixes:


N/A, though it has been discussed on and off

Special notes for your reviewer:


The prompts (this one and future ones) will likely go through a bunch of iterations before we hit that "sweet spot" of getting the best results out of GPT. Ideally they'll be optimized for newer models in the future (we may even decide to have different prompts for different models). This is why I specifically included an env var for the OpenAI model to use, so that we aren't forced to keep up with the rapidly evolving AI space; it works sort of like pinning a package version.

This opens up some exciting possibilities in the future, such as importing strange recipe sources (unstructured data, OCR, etc.).

Testing


You need an OpenAI API key to properly test this, but I added a mocked test just to confirm it works fine as long as we get data from OpenAI.
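One way to structure such a mocked test (a sketch with illustrative names, not the PR's actual test code) is to inject the API boundary so a test can substitute canned data for a real OpenAI call:

```python
import json
from typing import Callable

def parse(prompt: str, get_response: Callable[[str], str]) -> dict:
    # The response fetcher is injected, so tests never hit the network.
    return json.loads(get_response(prompt))

def fake_openai(prompt: str) -> str:
    # Canned JSON standing in for the OpenAI API during tests.
    return '{"food": "salt", "unit": "tsp", "quantity": 1.0}'

result = parse("1 tsp salt", fake_openai)
```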

boc-the-git commented 4 months ago

I've only skimmed it, this is not a definitive review!

Is it possible to make the endpoint configurable? I'm not across all the details, but I believe a lot of the self-hosted LLM projects have "OpenAI compatible" endpoints. It would be great if we could easily support those as well, particularly given Mealie is in the same self-hosting space.

(Obviously, I'd be happy for any change here to be a subsequent PR)

michael-genson commented 4 months ago

It might be possible, but it would require a lot of extra work for a few reasons, which I've stayed away from in this PR.

jaasonw commented 4 months ago

Can you elaborate on what you mean by "JSON response unique to OpenAI"?

I believe what boc is saying is projects like ollama have an OpenAI-compatible API, allowing it to act as a drop-in replacement to the endpoint in the OpenAI library

michael-genson commented 4 months ago

Can you elaborate on what you mean by "JSON response unique to OpenAI"?

Specifically OpenAI's JSON mode: https://platform.openai.com/docs/guides/text-generation/json-mode

projects like ollama have an OpenAI-compatible API

I didn't realize that works even with the OpenAI library, that's super nice. Looks like we can just make the OpenAI base URL customizable and enable this. What I wanted to avoid was writing a custom client to interact with OpenAI (since it's a lot to maintain and really out of scope for Mealie).

eikaramba commented 4 months ago

If I understand this correctly, the input is still the metadata from a website in an open recipe format, right? I'm parsing a website's HTML with GPT, since not every website has the recipe in a structured format. I've stumbled across multiple examples where the recipe is only available in the text/HTML and ChatGPT needs to intelligently parse it into JSON. It actually works very well.

michael-genson commented 4 months ago

the input is still the meta data from a website in a open recipe format

Correct, this PR is not for scraping websites and generating recipes. This is for recipes that have already been imported, but their ingredients are not yet parsed.

However, I do have plans to support alternative import methods using OpenAI, building off of the foundation of this PR. Theoretically we can fall back to parsing a website with OpenAI when recipe metadata isn't available.

michael-genson commented 4 months ago

There are a few different discussions that I think are great ways to apply this to other areas of Mealie later down the road.

felixschndr commented 4 months ago

Does this require a paid OpenAI account? The default model is gpt-4o; however, when setting this to something like gpt the calls should be free, right?

michael-genson commented 4 months ago

You may use any LLM that has an OpenAI-compatible API. For instance, see ollama posted above. You just need to specify your own base_url, api_key, and model name for your model.
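For example, pointing Mealie at a locally hosted ollama server might look like this. These variable names are hypothetical illustrations of the base_url/api_key/model settings described above, not confirmed Mealie config:

```shell
# Hypothetical env settings for an OpenAI-compatible server such as ollama:
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=ollama   # ollama accepts any non-empty key
OPENAI_MODEL=llama3
```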

I've only tested with gpt-4 (and its variants) so I can only confirm that those work, however it's fully configurable per-instance. I will say that with gpt-4 you blow through the free tier extremely quickly. I've built in some measures to reduce costs as well as give some configurability to trade off speed vs cost. With gpt-4o it seems to cost 5-10 cents per parsed recipe (with 2 workers and ~10 ingredients)

jaasonw commented 4 months ago

With gpt-4o it seems to cost 5-10 cents per parsed recipe (with 2 workers and ~10 ingredients)

Is there a reason to prefer a more powerful and more expensive model than 3.5-turbo ($0.50/1M tokens) as the default?

michael-genson commented 4 months ago

Short answer: no, not really, but the default hardly matters when it doesn't work out of the box anyway; at a minimum you need to supply an API key, so there's nothing stopping you from also setting the model.

Longer answer: I've had a lot more success with GPT-4 when it comes to anything other than conversational interaction. GPT-3.5 is also a lot moodier when it comes to following prompts. GPT-4 is also much better at parsing non-English languages, which is particularly important for a parser that needs to understand grammar.