Closed AnthonyDasse closed 1 week ago
No one answered you so I will try...
github.com/pkoukk/tiktoken-go v0.1.6
is good only for the older text-embedding-ada-002
embeddings. Newer embedding models are handled only from 0.1.7 onward, so by definition you can't use text-embedding-3-large,
and the results with text-embedding-3-small
seem suspicious to me. That's only my experience; I should have investigated further, but ...
CREATE OR REPLACE FUNCTION public.find_closest_vector(input_vector VECTOR(1536), limit_results INT, filename VARCHAR, collection_name VARCHAR)
RETURNS TABLE(
    doc VARCHAR,
    similarity DOUBLE PRECISION
) AS $$
BEGIN
    RETURN QUERY
    SELECT
        c.document,
        (1 - (c.embedding <=> input_vector)) AS similarity
    FROM
        langchain_pg_embedding c
        INNER JOIN langchain_pg_collection col ON c.collection_id = col.uuid
    WHERE
        col.name = collection_name AND
        c.cmetadata ->> 'filename' = filename
    ORDER BY
        c.embedding <=> input_vector -- This operator calculates the cosine distance
    LIMIT limit_results;
END;
$$ LANGUAGE plpgsql;
type VectorSearchResult struct {
	Document   pgtype.Text   `db:"doc"`
	Similarity pgtype.Float8 `db:"similarity"`
}
// VectorSearch queries the database for the vectors closest to the given vector,
// restricted to the given filename and returning at most limit rows.
// It returns a slice of VectorSearchResult structs, or nil if the query fails.
// Note: ctx and PGCOLLECTION are assumed to be defined elsewhere in the package.
func VectorSearch(dbPool *pgxpool.Pool, vector *[]float32, limit int, filename string) []VectorSearchResult {
	// Wrap the raw slice in a pgvector value
	v := pgvector.NewVector(*vector)
	// Call the SQL function with the vector, limit, filename, and collection name as parameters
	rows, err := dbPool.Query(ctx, "SELECT doc, similarity FROM public.find_closest_vector($1, $2, $3, $4)", v, limit, filename, PGCOLLECTION)
	if err != nil {
		log.Println("error while executing query - ", err)
		return nil
	}
	// Collect the rows into a slice of VectorSearchResult structs
	result, err := pgx.CollectRows(rows, pgx.RowToStructByNameLax[VectorSearchResult])
	if err != nil {
		log.Printf("CollectRows error: %s", err.Error())
	}
	// Return the result (nil if CollectRows failed)
	return result
}
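Since the function above returns a similarity value for each row, a score threshold can also be applied on the Go side after the query. The following is only a minimal sketch of that idea: scoredDoc and filterByScore are hypothetical names made up for this example (scoredDoc is a simplified stand-in for VectorSearchResult that uses plain Go types), and the sample scores are invented.

```go
package main

import "fmt"

// scoredDoc is a hypothetical, simplified stand-in for VectorSearchResult
// that uses plain Go types instead of the pgtype wrappers.
type scoredDoc struct {
	Document   string
	Similarity float64
}

// filterByScore keeps only the results whose similarity is at least minScore.
// The input order is preserved.
func filterByScore(results []scoredDoc, minScore float64) []scoredDoc {
	kept := make([]scoredDoc, 0, len(results))
	for _, r := range results {
		if r.Similarity >= minScore {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	// Invented sample data, for illustration only.
	results := []scoredDoc{
		{Document: "first chunk", Similarity: 0.91},
		{Document: "second chunk", Similarity: 0.78},
		{Document: "third chunk", Similarity: 0.85},
	}
	for _, r := range filterByScore(results, 0.80) {
		fmt.Printf("%.2f %s\n", r.Similarity, r.Document)
	}
}
```

Filtering in Go avoids depending on how the library translates the threshold into SQL, at the cost of fetching rows that are then discarded.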
Thank you @chew-z, I'll look into that.
Hello.
I think there is an issue with the SimilaritySearch function (with the scoreThreshold option).
When I use the SimilaritySearch function with PgVector and set the 'scoreThreshold' option to 0.80, no documents are returned. If I remove the 'scoreThreshold' option, many documents are returned with a score greater than 0.80.
According to my research and the GitHub issues of the langchain library, the problem comes from confusion about the distance strategy.
See the issues:
I think the error is around this part of the code (langchaingo/vectorstores/pgvector):
if scoreThreshold != 0 {
	whereQuerys = append(whereQuerys, fmt.Sprintf("data.distance < %f", 1-scoreThreshold))
}
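For reference, the condition above only means "score above scoreThreshold" if data.distance is a cosine distance, which is what pgvector's <=> operator returns (1 minus the cosine similarity). The Go sketch below illustrates that relation under this assumption; cosineDistance and maxDistanceFor are hypothetical helpers written for this example and are not part of langchaingo or pgvector.

```go
package main

import (
	"fmt"
	"math"
)

// cosineDistance computes 1 - cosine similarity, the same quantity that
// pgvector's <=> operator returns. It assumes both vectors have the same
// length and are non-zero (a zero vector would yield NaN).
func cosineDistance(a, b []float64) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	return 1 - dot/(math.Sqrt(normA)*math.Sqrt(normB))
}

// maxDistanceFor converts a minimum similarity score into the equivalent
// maximum cosine distance (hypothetical helper for this sketch).
func maxDistanceFor(scoreThreshold float64) float64 {
	return 1 - scoreThreshold
}

func main() {
	query := []float64{1, 0}
	doc := []float64{0.9, 0.1}

	dist := cosineDistance(query, doc)
	score := 1 - dist
	fmt.Printf("distance=%.4f score=%.4f\n", dist, score)
	fmt.Println("passes 0.80 threshold:", dist < maxDistanceFor(0.80))
}
```

If the stored distance comes from a different strategy (for example a Euclidean distance), 1-scoreThreshold is no longer the matching cutoff, and the filter can reject documents whose reported score is above the threshold.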
If we look at the pgvector documentation,
the SQL request should be like this: https://github.com/pgvector/pgvector