vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Vector embeddings on logs #18801

Open jonathanpv opened 11 months ago

jonathanpv commented 11 months ago

Vector embeddings support

Vector should support an embedding-transform through VRL

The only things we would need are configuration for the embedding endpoint to use and a place to store the output.

Why? Semantic search over data can be powered cheaply using vector embeddings. As a step towards AI-powered monitoring, we should support translating logs to vectors and adding the result as a field.

Sample feature:

# in vector.toml, we configure the embedding endpoint, which accepts text and outputs an embedding vector
embedding_endpoint = "https://api.openai.com/v1/embeddings"
embedding_endpoints_api_key = "sk-..."

# sinks: some data stores, such as Pinecone and Weaviate, support vector search natively
# perhaps we would need to support those sinks separately
# in VRL we just call it like so, and it should pull the API key from vector.toml
.embedding = log_to_embedding(.log)
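To make the proposal concrete, here is a rough Python sketch of what such a transform might do for each event. It is not Vector or VRL code. The function names (`build_request`, `fetch_embedding`, `log_to_embedding`) and the model name are assumptions for illustration only; the request and response shapes follow the OpenAI embeddings endpoint referenced above.

```python
import json
import urllib.request
from typing import Callable, List

# Assumed values mirroring the hypothetical vector.toml settings.
EMBEDDING_ENDPOINT = "https://api.openai.com/v1/embeddings"
EMBEDDING_MODEL = "text-embedding-3-small"  # assumption: any model the endpoint accepts


def build_request(text: str, api_key: str) -> urllib.request.Request:
    """Build the POST request for an OpenAI-style embeddings endpoint."""
    body = json.dumps({"input": text, "model": EMBEDDING_MODEL}).encode("utf-8")
    return urllib.request.Request(
        EMBEDDING_ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def fetch_embedding(text: str, api_key: str) -> List[float]:
    """Call the endpoint and return the embedding (performs a network request)."""
    with urllib.request.urlopen(build_request(text, api_key)) as resp:
        payload = json.load(resp)
    return payload["data"][0]["embedding"]


def log_to_embedding(event: dict, embed: Callable[[str], List[float]]) -> dict:
    """Return a copy of the event with an `embedding` field derived from `log`."""
    enriched = dict(event)
    enriched["embedding"] = embed(event["log"])
    return enriched
```

Passing the embedder in as a callable keeps the enrichment step testable without network access; in practice one would pass something like `lambda text: fetch_embedding(text, api_key)`. A real transform would likely also want batching, retries, and rate limiting, none of which are shown here.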

Use Cases

as a user I can use natural language to search through my logs

the end result will allow users to have intelligent search through logs with natural language.

as a developer I can implement semantic search quickly with vector.dev

a better developer experience.

for example:

a search query like "give me the failures from Amazon in the last three hours" can return the most relevant logs
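As a rough illustration of how that ranking could work downstream, the sketch below scores stored log embeddings against a query embedding by cosine similarity. The three-dimensional vectors are toy values, not real model output, and `cosine` and `top_k` are hypothetical helper names. The "last three hours" part of the query would be an ordinary timestamp filter applied separately.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: List[float], events: List[Dict], k: int = 3) -> List[Tuple[float, str]]:
    """Return the k (score, log) pairs most similar to the query, best first."""
    scored = [(cosine(query_vec, e["embedding"]), e["log"]) for e in events]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:k]


# Toy events, as if already enriched with an `embedding` field.
events = [
    {"log": "s3 PutObject failed: 503", "embedding": [0.9, 0.1, 0.0]},
    {"log": "user login ok", "embedding": [0.0, 0.2, 0.9]},
    {"log": "ec2 health check timeout", "embedding": [0.7, 0.3, 0.1]},
]

# Toy stand-in for the embedding of the natural-language query.
query_vec = [1.0, 0.0, 0.0]

for score, log in top_k(query_vec, events, k=2):
    print(f"{score:.3f}  {log}")
```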

Attempted Solutions

No response

Proposal

I propose investigating the referenced services to see whether this is a quick implementation, not worth the investment, or already supported through a different configuration.

References

This is an example endpoint that generates embedding vectors from text input. There are others, but OpenAI is the most prevalent solution at the moment

This is a service specialized in vector search

Pinecone is also a popular vector store

Users can implement vector search using just a SQL database as well
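For the SQL-only route, one possible approach (a sketch, not a recommendation) is to store each embedding as JSON text and register a similarity function with the database. The example below uses Python's built-in `sqlite3` module; the table layout, the `cosine` SQL function name, and the toy two-dimensional vectors are all assumptions. It scans every row on each query, so it only suits small datasets.

```python
import json
import math
import sqlite3
from typing import List


def cosine_json(a_json: str, b_json: str) -> float:
    """Cosine similarity between two JSON-encoded vectors."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the (assumed) name `cosine`.
conn.create_function("cosine", 2, cosine_json)
conn.execute("CREATE TABLE logs (message TEXT, embedding TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [
        ("payment service error", json.dumps([0.8, 0.2])),
        ("cache warmed", json.dumps([0.1, 0.9])),
    ],
)


def search(query_vec: List[float], limit: int = 1) -> List[str]:
    """Return the messages whose embeddings are most similar to query_vec."""
    rows = conn.execute(
        "SELECT message FROM logs ORDER BY cosine(embedding, ?) DESC LIMIT ?",
        (json.dumps(query_vec), limit),
    )
    return [row[0] for row in rows]
```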

Version

No response

jonathanpv commented 5 months ago

Related project: https://github.com/Anush008/fastembed-rs

jonathanpv commented 4 months ago

Value prop:

and then downstream someone's UI or search can be like

"do we have any errors recently in the hotpath?"