hoarder-app / hoarder

A self-hostable bookmark-everything app (links, notes and images) with AI-based automatic tagging and full text search
https://hoarder.app
GNU Affero General Public License v3.0

feature: Improve the LLM prompts for tag generation #505

Open · Papierkorb opened this issue 1 month ago

Papierkorb commented 1 month ago

Hello!

I just found this great piece of software! I was thrilled to see that I can use my own OpenAI endpoint and point it towards my Llama 3.1 8B endpoint.

Status Quo

I think the standard prompt defined in packages/shared/prompts.ts could use some tweaking. The default prompt doesn't work at all with my model (which I think is a pretty commonly used one). This one works great:

You're an expert document tagger. Given an input you generate a JSON object of the following format: { "tags": ["First tag", "Another tag", ...] }

Only respond in this format! Write only the JSON data and nothing else. Do not write an explanation. If you can't find tags you're satisfied with, respond with an empty tags array. Write at most five tags. Write your tags in ${lang}.

${content}

I've only tested this prompt manually with my model. With it, my 0% success rate turns into a 100% success rate, which surprises me; usually you need to handle the model prepending an introduction like "Your requested JSON document: ..." by looking for the first { and last }.

Thus

Long story short: this PR adds the TEXT_TAG_PROMPT and IMAGE_TAG_PROMPT environment variables. If a variable is not set, the already existing prompt is used; if it is set, it's used instead. Placeholders in the variables are interpolated like {{this}}, akin to Jinja2 templates (which the project may adopt in the future?).

My template in this syntax

```
You're an expert document tagger. Given an input you generate a JSON object of the following format: { "tags": ["First tag", "Another tag", ...] }

Only respond in this format! Write only the JSON data and nothing else. Do not write an explanation. If you can't find tags you're satisfied with, respond with an empty tags array. Write at most five tags. Write your tags in {{lang}}.

{{content}}
```
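For reference, the {{placeholder}} interpolation could look roughly like this (a sketch, not the actual PR code; `interpolate` and the variable names are illustrative):

```ts
// Sketch of {{placeholder}} interpolation; names are illustrative,
// not the actual PR code.
function interpolate(template: string, vars: Record<string, string>): string {
  // Replace every {{key}} with its value; leave unknown placeholders intact.
  return template.replace(/\{\{(\w+)\}\}/g, (_match, key: string) =>
    key in vars ? vars[key] : `{{${key}}}`,
  );
}

// Fill the TEXT_TAG_PROMPT template with the configured language and content.
const pageText = "Transformers allow the computer to think...";
const prompt = interpolate(process.env.TEXT_TAG_PROMPT ?? "", {
  lang: "english",
  content: pageText,
});
```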

Not great: as is, this change breaks AISettings.tsx, since it now requires the server configuration to work. For my tests I simply removed the calls to the buildPrompt functions, but that of course isn't acceptable for merging. I'd need a little help here on how we should approach this.

Now it works with Llama 3.1 8B, yay

![grafik](https://github.com/user-attachments/assets/3e5ed180-67d5-499f-a119-fb36f4f9b5c6)

MohamedBassem commented 1 month ago

Hey, thanks for taking the time, tweaking the prompt, and sending the PR. However, I intentionally don't want to lose control over the prompt: I want to keep the ability to change the output format in the future to include extra information, etc. If I let people replace the prompt completely, their future upgrades might not be backward compatible.

My plan is to give people the option to "customize" the prompt by adding extra rules, rather than replacing it entirely. The next release lets people add new rules, and I'm also planning to let people modify the prebuilt rules.
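In other words, something along these lines, where the output format stays project-owned and only the rule list is user-extensible (a sketch; `buildTagPrompt` and the surrounding names are hypothetical, not Hoarder's actual code):

```ts
// Hypothetical sketch of "customize, don't replace": the output-format part
// of the prompt stays under the project's control, and users can only append
// extra rules. Names here are illustrative, not Hoarder's actual API.
const basePrompt =
  'You are a document tagger. Respond with a JSON object: { "tags": [...] }';
const userRules: string[] = ["Prefer broad topic tags over niche ones."];

function buildTagPrompt(content: string): string {
  // Append user-supplied rules after the fixed base prompt.
  const rules = userRules.map((r) => `- ${r}`).join("\n");
  return `${basePrompt}\n\nAdditional rules:\n${rules}\n\n${content}`;
}
```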

Now, regarding the base prompt: I'd be happy to accept tweaks to it if you think your prompt achieves better results with Llama, but we'd probably also want to test it on GPT models as well.

Papierkorb commented 1 month ago

So I've just pushed a commit (replacing the initial one) which is basically just my new prompt, with slight changes to allow for the custom prompt feature. I've tried it with an example document on both gpt-4o-mini (via the OpenAI API Playground, with json_object format) and my local Llama 3.1 8B model. Both models respond well. Could you check whether it works for you as well?
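For reproducibility, the gpt-4o-mini test looks roughly like this with the official openai npm client (a sketch; the prompt string stands in for the full prompt from the Example below):

```ts
import OpenAI from "openai";

// Sketch of the gpt-4o-mini test described above. The client reads
// OPENAI_API_KEY from the environment.
const client = new OpenAI();

const promptWithDocument = "..."; // paste the full prompt from the Example here

const completion = await client.chat.completions.create({
  model: "gpt-4o-mini",
  response_format: { type: "json_object" }, // force JSON-only output
  messages: [{ role: "user", content: promptWithDocument }],
});

console.log(completion.choices[0]?.message.content);
```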

Example

You're an expert document tagger. Given an input you generate a JSON object of the following format: { "tags": ["First tag", "Another tag", ...] }

Only respond in this format! Write only the JSON data and nothing else. Do not write an explanation.

- If you can't find tags you're satisfied with, respond with an empty tags array.
- Write at most five tags.
- Write your tags in English.

URL: https://llm-business.eu/blog/about-transformer-models
Title: About Transformer Models
Description: 

Transformers allow the computer to think - but not really think, of course. A transformer model is, like any AI model, basically a huge number of matrices that are costly to compute.

While companies like OpenAI, Microsoft and Google offer access to their models, they're proprietary and can't be run locally at the user's site. The companies claim this is because of IP and because their models are simply too large to run efficiently on a user's computer.

However, users have raised concerns about privacy. While the companies usually offer free access to their models, it comes at the cost of any text that is put into them. There's no better data than knowing the most private information at no charge. This is why a niche of users has been running open models, such as the Llama, Mistral, or Qwen families.

gpt-4o-mini

```json
{
  "tags": [
    "Transformer",
    "Artificial Intelligence",
    "Privacy",
    "Open Models",
    "Machine Learning"
  ]
}
```

Llama 3.1 8B

```json
{ "tags": ["Transformer", "Artificial Intelligence", "Machine Learning", "OpenAI", "Privacy"] }
```

What do you think?

kamtschatka commented 1 month ago

I tried it out with 34 links and it failed on 7 of them (also Llama 3). This happened before, so a while back I did a PR that moved the whole "respond in JSON, and this is how it should look" instruction to the end of the prompt, which fixed it. (I just tried the unmodified version as well and all of them worked.) So at this point we can't merge this, as it makes things worse for existing users.

Papierkorb commented 1 month ago

I'm not saying this should be merged outright. Still, a ~20% failure rate (7 of 34) is already a lot better than the 100% failure rate I saw on my end.

I'm pretty sure the issue lies in the code assuming that json_object mode works. If you're using the OpenAI API that's of course fine, but custom engines don't support it (at least I'm not aware of one that does). The usual solutions are:

  1. Lenient parsing: search for the first { and the last } in the response and parse only that span.
  2. Retry: if parsing fails, re-try the request. This also helps with transient networking issues.

With these two in place I usually see an almost perfect success rate. The changes wouldn't hurt OpenAI, since json_object mode is still used there; they'd help Ollama just as much as custom OpenAI endpoints.
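A minimal sketch of both fallbacks in TypeScript (function names are illustrative, not Hoarder's actual code):

```ts
// 1. Lenient parsing: extract the first-{ ... last-} span before JSON.parse,
//    so a preamble like "Here is your JSON: ..." doesn't break parsing.
function parseLenient(response: string): unknown {
  const start = response.indexOf("{");
  const end = response.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model response");
  }
  return JSON.parse(response.slice(start, end + 1));
}

// 2. Retry: re-ask the model a few times; also covers transient network errors.
async function withRetries<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```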

kamtschatka commented 1 month ago

It makes things better for you but worse for everyone else, so that's not very promising. Since you are knowledgeable in this area, how about you come to the Discord channel (see README.md; there is no link, only an image with a link) to discuss this further?