Closed jkomoros closed 3 months ago
Yeah, the number of vectors in production is 100k instead of the expected ~15k, roughly 7x larger than expected, likely due to these extra reembeddings.
Get gulp configure-qdrant to work again, and also npm run generate:config and npm run generate:env
My guess is that it's in reindexCardEmbeddings: it's bulk-fetching all of the items, but the cardsInfo is coming back incorrectly. Looks like it gets the content field but not the card_id field?
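A minimal sketch of how a missing card_id in the fetched payloads could make every card look unindexed. The names here (StoredPayload, CardInfo, cardsNeedingEmbedding) and the payload shape are hypothetical, not taken from the actual reindexCardEmbeddings code:

```typescript
// Hypothetical payload shape. card_id and content are the fields mentioned
// above; version is assumed to be stored alongside them.
interface StoredPayload {
  card_id?: string;
  version?: number;
  content?: string;
}

// Hypothetical summary of a card that should have an embedding.
interface CardInfo {
  id: string;
  version: number;
}

// Builds a lookup key for one card at one version.
function embeddingKey(cardID: string, version: number): string {
  return cardID + ':' + String(version);
}

// Returns the cards that have no stored embedding for their current version.
function cardsNeedingEmbedding(cards: CardInfo[], stored: StoredPayload[]): CardInfo[] {
  const seen: Record<string, boolean> = {};
  for (const payload of stored) {
    // If the bulk fetch omits card_id, nothing is recorded here, so every
    // card below is treated as missing and gets embedded again.
    if (payload.card_id === undefined || payload.version === undefined) continue;
    seen[embeddingKey(payload.card_id, payload.version)] = true;
  }
  return cards.filter((card) => !seen[embeddingKey(card.id, card.version)]);
}
```

With complete payloads this returns only the cards that are genuinely missing. With payloads that carry content but no card_id it returns every card, which would match the duplicates showing up on each run.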
BTW this "lots of duplicates of the same card content and embedding" is likely why the semanticSort in #688 was finding so many non-existent embeddings? Maybe? Because it was fetching a random embedding for that cardID?
The bug, which is now fixed, was causing a lot of duplicate embeddings to be stored every time reindexCardEmbeddings was run, which was on every deploy.
This is now fixed and deployed to production.
Running reindexCardEmbeddings appears to add embeddings, even ones that should already be in the store?
If you look at any given point ID and search for similar points, you'll find a huge number of duplicates with the same card id and version.
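One way to confirm this across the whole store, rather than point by point, would be to group points by card id and version and report any group with more than one entry. This is only a sketch: PointSummary and findDuplicateGroups are hypothetical names, and it assumes the point IDs and payload fields have already been fetched into memory:

```typescript
// Hypothetical in-memory summary of one stored point.
interface PointSummary {
  id: string;
  cardID: string;
  version: number;
}

// Groups point IDs by "cardID:version" and keeps only the groups that
// contain more than one point, i.e. the duplicated embeddings.
function findDuplicateGroups(points: PointSummary[]): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const point of points) {
    const key = point.cardID + ':' + String(point.version);
    if (!groups[key]) groups[key] = [];
    groups[key].push(point.id);
  }
  const duplicates: Record<string, string[]> = {};
  for (const key of Object.keys(groups)) {
    if (groups[key].length > 1) duplicates[key] = groups[key];
  }
  return duplicates;
}
```

Keeping the first ID in each group and deleting the rest would be one possible cleanup, assuming the embeddings for a given card id and version really are identical.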