masci / banks

LLM prompt language based on Jinja. Banks provides tools and functions to build prompt texts and chat messages from generic blueprints. It allows attaching metadata to prompts to ease their management, and versioning is a first-class citizen. Banks also provides ways to store prompts on disk along with their metadata.

Design decisions #24

Open · alex-stoica opened 4 days ago

alex-stoica commented 4 days ago

Hey there, quite an interesting project!

I have a few questions about the design choices:

masci commented 5 hours ago

> Hey there, quite an interesting project!

Hey thanks! Sorry for the delay, this wasn't a simple one to answer 😄

Before I get to the answers, I'd like to stress that Banks was born out of curiosity, to see how much of the LLM stack I could push down into the prompt template itself. Some of the design makes sense and comes from real use cases, but some features are admittedly more "artificial" and come mainly out of curiosity.

> I have a few questions about the design choices:

> * It seems like everything is being managed within the prompt template itself. How do you envision distinguishing between a prompt change and a version change (e.g., adjusting temperature from 0.6 to 0.61)?

Good question. This mostly boils down to defining what we are versioning:

* The same prompt template can be "rendered" into different prompts depending on what we use to replace the template tags. Should we version the final prompt or the original template?
* Should we include the LLM params in the prompt template and version the whole "package"? See PromptDX for an example; I love their "front matter" approach (see the sketch just below).
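For illustration, a "front matter" style prompt file could look roughly like this. This is a hypothetical sketch inspired by that approach, not PromptDX's actual syntax: the point is that the model parameters, the version, and the template text travel and get versioned as a single artifact.

```
---
name: blog-post
version: "1.2.0"
model: gpt-4o-mini
temperature: 0.6
---
Write a 500-word blog post on {{ topic }}.

Blog post:
```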

To avoid answering those questions 😅 I left versioning in Banks wide open: it's just a string attached to the Prompt class rather than to the template text, so it only exists within Banks and only if you store the prompt objects. The "outer" software relying on Banks is supposed to make that decision, using the version and metadata fields.
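In practice that looks roughly like the following. This is a minimal sketch; it assumes the Prompt constructor accepts `version` and `metadata` keyword arguments and exposes them as attributes, as the discussion above suggests, so double-check against the current Banks API.

```python
from banks import Prompt

# Sketch only: version and metadata live on the Prompt object, not in the
# template text. Assumes `version` and `metadata` are constructor kwargs.
p = Prompt(
    "Write a {{ tone }} summary of the following text:\n\n{{ text }}",
    version="2",
    metadata={"owner": "docs-team", "temperature": 0.6},
)

# The calling application decides what counts as a "version bump": it can read
# back p.version and p.metadata and compare them however it likes.
print(p.text({"tone": "concise", "text": "Banks is a prompt templating library."}))
```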

> * Do you find it effective to place all elements into the prompt? For instance, with structured outputs or agent-based tasks, do you envision these as part of the prompt too? It seems like it could become a large, all-encompassing component; any thoughts on that?

There is a concrete risk that one ends up pushing the whole LLM stack into the prompt, and I don't think that's sustainable. I like the mantra "complexity has to live somewhere": in this case the template language itself would need to grow dramatically in complexity, and I don't think replacing Python logic with Jinja logic is a good idea.

You can see a glimpse of this with the function calling feature. I thought it was an interesting angle to explore because function calling is such a mess in every LLM library and framework around, and I was curious to see if I could do better. I think the `{{ my_function | tool }}` tag is pretty cool, but it hides so much of what really happens that I don't think it can scale beyond a prototype unless you cargo-load the template language with features.
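For context, the feature reads roughly like this. It's a sketch only: the `{% completion %}` block, the model name, and the way the Python function object is passed in at render time are my assumptions about the surrounding machinery and may not match the current API exactly; the confirmed part is the `{{ my_function | tool }}` tag.

```python
import platform

from banks import Prompt

def get_laptop_info():
    """Get information about the user's laptop, e.g. OS and hardware details."""
    return str(platform.uname())

# Sketch: the `tool` filter turns the Python function (signature plus docstring)
# into a function-calling payload for the LLM, and the surrounding block runs
# the completion. Block names and model are assumptions, not confirmed API.
p = Prompt("""
{% set response %}
{% completion model="gpt-4o-mini" %}
{% chat role="user" %}{{ query }}{% endchat %}
{{ get_laptop_info | tool }}
{% endcompletion %}
{% endset %}
{{ response }}
""")

print(p.text({"query": "Can you guess the name of my laptop?", "get_laptop_info": get_laptop_info}))
```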

> * For configurations with multiple Jinja variables, it looks like an "adapter" might be necessary to extract key parameters for use with other LLM chat or completion wrappers. How do you see this being handled?

I decided to completely offload this effort to LiteLLM to keep the API simple. In order to support multiple "providers", I think there are two ways:
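Either way, the handoff itself can stay small because LiteLLM normalizes the provider APIs. A rough sketch of that handoff follows; the `chat_messages()` call and the pydantic `model_dump()` on its return values are assumptions about the Banks side, and the model name is a placeholder.

```python
from banks import Prompt
from litellm import completion

# Banks renders the chat messages; LiteLLM talks to whichever provider sits
# behind the model string. litellm.completion() is the standard LiteLLM entry
# point; the Banks chat_messages() usage here is an assumption.
p = Prompt("""
{% chat role="system" %}You are a helpful assistant.{% endchat %}
{% chat role="user" %}{{ question }}{% endchat %}
""")

messages = [m.model_dump() for m in p.chat_messages({"question": "What is Banks?"})]
response = completion(model="gpt-4o-mini", messages=messages, temperature=0.6)
print(response.choices[0].message.content)
```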

Hope this answers your questions, let me know if you have any follow-ups!