omikader opened this issue 4 months ago
That sounds like a valuable addition to the `Document` interface. Adding an `id` field to `Document` would simplify manipulating vectors by ID, and your example illustrates the benefit clearly. I encourage you to open a pull request with the proposed changes; the community will appreciate the contribution. Thank you for taking the initiative!
This has come up a few times; the issue is that not every provider implements the concept of an ID. We're planning to refactor things around retrieval, though (possibly introducing a new "Indexing" abstraction), and I would be in favor of adding it then.
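The maintainer's concern can be sketched as follows. This is a minimal example that assumes the field would be made optional. `DocumentWithOptionalId` and `requireId` are hypothetical names, not langchainjs APIs. With an optional `id`, providers that have native IDs can populate it and providers that don't can leave it unset. Any code that needs an ID then has to handle the case where it is missing.

```typescript
// Hypothetical shape for illustration only; not the actual langchainjs `Document` class.
interface DocumentWithOptionalId<M extends Record<string, unknown> = Record<string, unknown>> {
  pageContent: string;
  metadata: M;
  // Optional because not every vector store provider exposes a stable identifier.
  id?: string;
}

// Hypothetical helper: callers that need an id (e.g. to delete or update a vector)
// must handle documents from providers that never set one.
function requireId(doc: DocumentWithOptionalId): string {
  if (doc.id === undefined) {
    throw new Error("This provider did not return an id for the document");
  }
  return doc.id;
}

// A provider with native ids can populate the field...
const fromIdAwareStore: DocumentWithOptionalId = {
  pageContent: "hello world",
  metadata: { source: "a.txt" },
  id: "vec-123",
};

// ...while a provider without the concept simply leaves it unset.
const fromIdlessStore: DocumentWithOptionalId = {
  pageContent: "hello world",
  metadata: { source: "a.txt" },
};

console.log(requireId(fromIdAwareStore)); // prints "vec-123"
console.log(fromIdlessStore.id === undefined); // prints "true"
```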
Hi, @omikader,
I'm helping the langchainjs team manage their backlog and am marking this issue as stale. From what I understand, you proposed adding an `id` field to the `Document` interface to simplify manipulating vectors by ID. I encouraged you to create a pull request with the proposed changes, and jacoblee93 expressed support for the addition, mentioning plans to refactor retrieval, possibly by introducing a new "Indexing" abstraction.
Could you please confirm if this issue is still relevant to the latest version of the langchainjs repository? If it is, please let the langchainjs team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!
Yes, I think this issue is still relevant 🙂
We are planning a refactor of the retrieval interfaces after 0.2!
I imagine most vector store providers have the concept of a unique identifier for a particular vector in the database. Given that, should that identifier be an explicit part of the `Document` definition, e.g. as an `id` field?
This would make it much simpler to manipulate vectors by ID (e.g. update, delete) after retrieving them with the `similaritySearch` methods, because we wouldn't need to index `id` as a separate metadata attribute. And in some cases, `id` is autogenerated by LangChain, which means it is unavailable on `metadata`. That makes having it available on retrieval even more important.
Linking this similar issue that was closed before: https://github.com/langchain-ai/langchainjs/issues/2704
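The proposed "retrieve, then delete by ID" workflow can be sketched with a toy in-memory store. Everything here is hypothetical: `DocumentWithId`, `ToyVectorStore`, the `doc-N` ID scheme, and the `delete(ids)` signature are illustrative assumptions, not langchainjs classes. Vectors are also passed in directly rather than computed by an embeddings model. The store autogenerates an ID when none is supplied and returns it on each search result. The caller can therefore delete by ID without ever having copied the ID into `metadata`.

```typescript
// Hypothetical document shape: `id` sits next to `pageContent` and `metadata`
// instead of being copied into metadata.
interface DocumentWithId {
  pageContent: string;
  metadata: Record<string, unknown>;
  id?: string;
}

// Toy in-memory store (hypothetical, not a langchainjs class).
class ToyVectorStore {
  private rows = new Map<string, { vector: number[]; doc: DocumentWithId }>();
  private counter = 0;

  // Autogenerates an id when the caller does not supply one and returns all ids.
  addDocuments(docs: DocumentWithId[], vectors: number[][]): string[] {
    return docs.map((doc, i) => {
      const id = doc.id ?? `doc-${this.counter++}`;
      this.rows.set(id, { vector: vectors[i], doc: { ...doc, id } });
      return id;
    });
  }

  // Ranks stored documents by cosine similarity; each result carries its id.
  similaritySearch(query: number[], k: number): DocumentWithId[] {
    const cosine = (a: number[], b: number[]): number => {
      let dot = 0, na = 0, nb = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
      }
      return dot / (Math.sqrt(na) * Math.sqrt(nb));
    };
    return Array.from(this.rows.values())
      .map((row) => ({ doc: row.doc, score: cosine(query, row.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k)
      .map((r) => r.doc);
  }

  // Removes vectors by id; unknown ids are ignored.
  delete(ids: string[]): void {
    for (const id of ids) this.rows.delete(id);
  }

  get size(): number {
    return this.rows.size;
  }
}

const store = new ToyVectorStore();
store.addDocuments(
  [
    { pageContent: "cats purr", metadata: {} },
    { pageContent: "dogs bark", metadata: {} },
  ],
  [[1, 0], [0, 1]],
);

// Retrieve, then delete using the id returned with the result.
const [top] = store.similaritySearch([0.9, 0.1], 1);
console.log(top.pageContent, top.id); // prints "cats purr doc-0"
store.delete([top.id!]);
console.log(store.size); // prints 1
```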