simonw/llm-cluster: LLM plugin for clustering embeddings
LLM plugin for clustering embeddings
Background on this project: Clustering with llm-cluster.
Installation
Install this plugin in the same environment as LLM.
llm install llm-cluster
Usage
The plugin adds a new command, llm cluster. This command takes the name of an embedding collection and the number of clusters to return.
First, use paginate-json and jq to populate a collection. In this case we are embedding the title and body of every issue in the llm repository, and storing the result in an issues.db database:
The --store flag causes the content to be stored in the database along with the embedding vectors.
Now we can cluster those embeddings into 10 groups:
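The original code blocks were lost in extraction; as a sketch of the two steps above (the GitHub API URL, the jq filter, and the collection name llm-issues are illustrative, not copied from the original README):

```shell
# Fetch every issue from the llm repo, embed title + body, and store
# the content alongside the vectors with --store.
paginate-json 'https://api.github.com/repos/simonw/llm/issues?state=all' \
  | jq '[.[] | {id: .id, title: .title, body: .body}]' \
  | llm embed-multi llm-issues - --database issues.db --store

# Cluster that collection into 10 groups:
llm cluster llm-issues 10 -d issues.db
```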
If you omit the -d option the default embeddings database will be used.
The output should look something like this (truncated):
The content displayed is truncated to 100 characters. Pass --truncate 0 to disable truncation, or --truncate X to truncate to X characters.
Generating summaries for each cluster
The --summary flag will cause the plugin to generate a summary for each cluster, by passing the content of the items (truncated according to the --truncate option) through a prompt to a Large Language Model.
This feature is still experimental. You should experiment with custom prompts to improve the quality of your summaries.
Since this can run a large amount of text through an LLM, it can be expensive, depending on which model you are using.
This feature only works for embeddings that have had their associated content stored in the database using the --store flag.
You can use it like this:
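The original example was lost in extraction; presumably something along these lines (the collection name llm-issues is illustrative):

```shell
# Cluster into 10 groups and summarize each one with the default
# model and prompt:
llm cluster llm-issues 10 --summary
```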
This uses the default prompt and the default model.
Partial example output:
To use a different model, e.g. GPT-4, pass the --model option:
The default prompt used is:
To use a custom prompt, pass --prompt:
A "summary" key will be added to each cluster, containing the generated summary.
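Combining the options above, a hedged sketch (the collection name, model id, and prompt text here are illustrative, not the README's own):

```shell
# Summarize each cluster with GPT-4 and a custom prompt:
llm cluster llm-issues 10 \
  --summary \
  --model gpt-4 \
  --prompt 'Three word label for the theme of these related documents'
```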