@ragyabraham it seems like when I update a row to change the cursor field and one of the metadata fields (the primary key stays the same), it still doesn't delete the old points from Qdrant.
datasource: (screenshot)
I have these 6 rows in BQ: (screenshot)
I sync and get 6 points in Qdrant: (screenshot)
Then I update one of the rows: (screenshot)
Then I sync again and have 7 points in Qdrant, with both the old and new versions of the "#StayProductive" one: (screenshot)
Also note: my `chunkingConfig` is set to `max_characters: 10` and `new_after_n_chars: 10`, but the `page_content` is definitely more than 10 characters. Not sure if that's on our side or Unstructured's. I'm using the proper Unstructured cloud with our key.
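For reference, here's a minimal sketch of how those chunking parameters would be passed to the hosted Unstructured partition endpoint, which may help narrow down whether the oversized chunks come from our side or theirs. The endpoint and parameter names are from Unstructured's public docs (where `max_characters` is a hard cap per chunk and `new_after_n_chars` a soft one); the `chunking_strategy` value, file name, and client setup here are assumptions:

```rust
// A minimal sketch, assuming the hosted Unstructured partition API and
// reqwest with the "multipart" feature enabled.
use reqwest::multipart::{Form, Part};

async fn partition_with_chunking(
    api_key: &str,
    file_bytes: Vec<u8>,
) -> Result<String, reqwest::Error> {
    let form = Form::new()
        .text("chunking_strategy", "by_title") // assumed; params below only apply with a strategy set
        .text("max_characters", "10")          // hard cap per chunk
        .text("new_after_n_chars", "10")       // soft cap: start a new chunk after this
        .part("files", Part::bytes(file_bytes).file_name("row.txt"));

    reqwest::Client::new()
        .post("https://api.unstructured.io/general/v0/general")
        .header("unstructured-api-key", api_key)
        .multipart(form)
        .send()
        .await?
        .text()
        .await
}
```

If chunks come back longer than `max_characters` from a request shaped like this, the issue would be on Unstructured's side; if our request drops the chunking parameters, it's ours.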
Is your feature request related to a problem? Please describe.
When a chunking strategy is applied to a synced row, a single row results in several vector points, each assigned a random UUID as its point ID. To update these points, we need to be able to query and delete the existing points so as not to create duplicates.
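As a concrete illustration of the problem, here is a sketch of how a chunked row turns into points today: each chunk gets a fresh random UUID, so a re-sync can never overwrite the old points by ID. This assumes the `qdrant-client`, `uuid`, and `serde_json` crates; the function shape is hypothetical, and the `page_content` field name mirrors the report above:

```rust
use qdrant_client::qdrant::PointStruct;
use qdrant_client::Payload;
use serde_json::json;
use uuid::Uuid;

/// Build one point per chunk of a synced row. Because each point gets a
/// fresh random UUID, re-syncing the same row cannot overwrite the old
/// points by ID -- hence the need to find and delete them by primary key.
fn points_for_row(pk: &str, chunks: &[String], embeddings: &[Vec<f32>]) -> Vec<PointStruct> {
    chunks
        .iter()
        .zip(embeddings)
        .map(|(chunk, vector)| {
            let payload: Payload = json!({
                "index": pk,           // primary key of the source row
                "page_content": chunk, // the chunk text
            })
            .try_into()
            .expect("valid JSON object");
            PointStruct::new(Uuid::new_v4().to_string(), vector.clone(), payload)
        })
        .collect()
}
```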
Describe the solution you'd like
For every point associated with the original row, we will store the primary key in the `index` field. This field will be used to search for and delete the existing points prior to the upsert operation, ensuring that points are updated rather than duplicated.
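A minimal sketch of that delete-then-upsert flow, assuming the builder API of recent `qdrant-client` releases; the collection name, function name, and error handling are placeholders, while the `index` payload field is the one proposed above:

```rust
use qdrant_client::qdrant::{
    Condition, DeletePointsBuilder, Filter, PointStruct, UpsertPointsBuilder,
};
use qdrant_client::Qdrant;

/// Delete every point whose payload `index` matches the row's primary key,
/// then upsert the freshly chunked points, so an updated row replaces its
/// old chunks instead of accumulating duplicates.
async fn replace_chunked_row(
    client: &Qdrant,
    collection: &str,
    pk: &str,
    points: Vec<PointStruct>,
) -> Result<(), qdrant_client::QdrantError> {
    client
        .delete_points(
            DeletePointsBuilder::new(collection)
                .points(Filter::must([Condition::matches("index", pk.to_string())]))
                .wait(true),
        )
        .await?;

    client
        .upsert_points(UpsertPointsBuilder::new(collection, points).wait(true))
        .await?;

    Ok(())
}
```

Note this deletes by a payload filter rather than by point ID, which is what makes it work even though the chunk points have random UUIDs.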
Additional context

We will use the `SearchType::ChunkedRow` variant in the vector-db-proxy app to indicate that this operation needs to happen.
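Roughly, the dispatch in vector-db-proxy might look like the following. Only the `ChunkedRow` variant name comes from this issue; the sibling variant, function, and signatures are hypothetical:

```rust
use qdrant_client::qdrant::{PointStruct, UpsertPointsBuilder};
use qdrant_client::Qdrant;

/// Hypothetical sketch: only the `ChunkedRow` variant name comes from the
/// issue; the other variant and this dispatch function are illustrative.
pub enum SearchType {
    /// Plain rows: point IDs are stable, so a bare upsert overwrites in place.
    Row,
    /// Chunked rows: point IDs are random, so delete by primary key first.
    ChunkedRow,
}

pub async fn handle_sync(
    client: &Qdrant,
    collection: &str,
    search_type: SearchType,
    pk: &str,
    points: Vec<PointStruct>,
) -> Result<(), qdrant_client::QdrantError> {
    match search_type {
        // Reuses `replace_chunked_row` from the sketch above.
        SearchType::ChunkedRow => replace_chunked_row(client, collection, pk, points).await,
        SearchType::Row => {
            client
                .upsert_points(UpsertPointsBuilder::new(collection, points).wait(true))
                .await?;
            Ok(())
        }
    }
}
```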