tkellogg / fossil

A mastodon client optimized for reading, with an AI-enabled algorithm for displaying posts
https://www.fossil-social.com

Try running a local model #15

Open tkellogg opened 6 months ago

tkellogg commented 6 months ago

All the code is in place, although you might have to fix it up.

  1. Follow the llm instructions for installing models (a sketch follows this list).
  2. Change the select box on the /settings page to select the new model.
  3. Verify that it's not using OpenAI (idk, maybe remove the API key and see if it still works).
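
For step 1, roughly this — a minimal sketch using the llm CLI and Python API; the plugin and model name here are just examples (the model is the one that comes up later in this thread):

```python
import llm

# Shell steps first (from llm's plugin docs):
#   llm install llm-gpt4all     # install a local-model plugin
#   llm models                  # list models; the local ones should appear

# Load a local model by the name the plugin registers.
model = llm.get_model("gpt4all-falcon-q4_0")

# Smoke test: if this prints text with no OpenAI key configured,
# inference is running locally.
response = model.prompt("Say hello in five words.")
print(response.text())
```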

You probably don't need to use the same llm installation as what fossil uses, since all state and models are written to your $HOME dir.
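
If you do want to keep fossil's llm state separate, llm honors the LLM_USER_PATH environment variable for its keys and logs database; a sketch (the directory name is just an example):

```python
import os

# Point llm at its own state dir; keys and the logs database will be
# written there instead of the default per-user location.
os.environ["LLM_USER_PATH"] = os.path.expanduser("~/.fossil-llm")

import llm  # reads LLM_USER_PATH when it needs the directory
```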

Write about it, either here or in a pull request updating the README.

golfinq commented 6 months ago

It doesn't work. Here is what I attempted for purely local usage:

golfinq commented 6 months ago

I cracked open the DB and found that while embeddings were placed into it, the cluster column was all NULL in the toots table, and algorithm_spec and algorithm were both NULL in the sessions table, which explains why nothing was rendered.
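
For anyone who wants to reproduce that check, something like this works against the DB (the file name and exact schema are assumptions based on the description above):

```python
import sqlite3

con = sqlite3.connect("fossil.db")  # path to fossil's database (assumed name)

# Tables/columns as described above: cluster on toots, and
# algorithm/algorithm_spec on sessions.
null_clusters = con.execute(
    "SELECT COUNT(*) FROM toots WHERE cluster IS NULL"
).fetchone()[0]
null_sessions = con.execute(
    "SELECT COUNT(*) FROM sessions WHERE algorithm IS NULL AND algorithm_spec IS NULL"
).fetchone()[0]

print(f"toots with NULL cluster: {null_clusters}")
print(f"sessions with NULL algorithm + algorithm_spec: {null_sessions}")
```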

golfinq commented 6 months ago

I found the bug: you use GPT-3.5 to produce a summary, and because I don't have an OpenAI token it fails. So everything works right up until the end. In both cases of using config....name, the inputs weren't propagated into the code. I changed the model selection line in topic_cluster.py to model = llm.get_model("gpt4all-falcon-q4_0"), and it now summarizes at a rate of roughly 10 to 60 seconds per cluster.
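
For reference, a hedged sketch of that change; the env-var fallback (FOSSIL_SUMMARY_MODEL) is hypothetical, just to illustrate that the model name should flow from settings rather than being hard-coded to an OpenAI model:

```python
import os
import llm

# Hypothetical plumbing: in fossil this would come from the settings
# page (the config....name values above); an env var stands in here.
model_name = os.environ.get("FOSSIL_SUMMARY_MODEL", "gpt4all-falcon-q4_0")
model = llm.get_model(model_name)

summary = model.prompt("Summarize these posts: ...").text()
```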

golfinq commented 6 months ago

[screenshot: error output]

tkellogg commented 6 months ago

That seems like you got it all to work, just that the content was too big for the model, right? If so, that's something we can fix, although llm doesn't surface the context width, which makes it trickier.

You might have been working off an old commit. I fixed some issues (like still referencing OpenAI for summarizing) on Saturday or early Sunday. Maybe I'm still missing something.

Send a pull request with what you've got, and I can take care of merging it. Regardless, you've given me a bit of a step-by-step so I don't have to look up as much ;)

golfinq commented 6 months ago

I used the latest git branch for this. I am not sure what a potential PR would include, but there are small things I noticed while poking around, like:

- I am not sure if the context width is a limitation of llm or the model.

tkellogg commented 6 months ago

> I am not sure if the context width is a limitation of llm or the model.

I view this as a limitation of llm, because all models have a finite context width and llm should report what that number is, imo. Its "model" interface should have an abstract method that returns this info, so that the llm app can write guard code to work within the limitation.
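
A sketch of the guard code that would enable, assuming a hypothetical context_window() method on llm's model interface (no such method exists today):

```python
def summarize(model, text: str) -> str:
    # Hypothetical API: llm models do not currently expose this.
    limit_tokens = model.context_window()

    # Leave headroom for the prompt scaffolding and the response, and
    # use a rough ~4 chars/token heuristic to budget the input size.
    budget_chars = (limit_tokens - 256) * 4
    if len(text) > budget_chars:
        text = text[:budget_chars]

    return model.prompt(f"Summarize:\n{text}").text()
```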

> I am not sure what a potential PR would include

All of that would be valuable. Don't worry about cleaning it up. I'll merge the commit as-is so you get contributor status, and then clean it up for you.