langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
92.05k stars 14.65k forks source link

DOC: SQLRecordManager documentation does not exist #17088

Closed jamescadd closed 4 weeks ago

jamescadd commented 7 months ago

Issue with current documentation:

Please write docs for SQLRecordManager

Idea or request for content:

dosubot[bot] commented 7 months ago

🤖

Hey @jamescadd, great to see you diving into another part of the LangChain framework! How's everything going on your end?

Sure, I can provide some information on the SQLRecordManager in the LangChain repository.

The SQLRecordManager is a SQL Alchemy based implementation of the record manager. It uses SQLAlchemy to track upserted records. Currently, it only works with SQLite but should be adaptable to other SQL implementations with minimal effort.

Supported Databases:

Here are the methods of SQLRecordManager:

  1. __init__: Initializes the SQLRecordManager. It requires a namespace and optionally an engine, db_url, engine_kwargs, and async_mode.

  2. create_schema: Creates the database schema. Not supported for async engines.

  3. acreate_schema: Asynchronously creates the database schema. Not supported for sync engines.

  4. _make_session: Creates a session and closes it after use. Not supported for async engines.

  5. _amake_session: Asynchronously creates a session and closes it after use. Not supported for sync engines.

  6. get_time: Gets the current server time as a timestamp. It's critical that time is obtained from the server since we want a monotonic clock.

  7. aget_time: Asynchronously gets the current server time as a timestamp.

  8. update: Upserts records into the SQLite database.

  9. aupdate: Asynchronously upserts records into the SQLite database.

  10. exists: Checks if the given keys exist in the SQLite database.

  11. aexists: Asynchronously checks if the given keys exist in the SQLite database.

  12. list_keys: Lists records in the SQLite database based on the provided date range.

  13. adelete_keys: Asynchronously deletes records from the SQLite database.

For the __init__ method, here are the parameters:

  1. namespace: The namespace associated with this record manager.

Optional parameters:

  1. engine: An already existing SQL Alchemy engine. Default is None.
  2. db_url: A database connection string used to create an SQL Alchemy engine. Default is None.
  3. engine_kwargs: Additional keyword arguments to be passed when creating the engine. Default is an empty dictionary.
  4. async_mode: Whether to create an async engine. Driver should support async operations. It only applies if db_url is provided. Default is False.

Here is the method signature:

def __init__(
    self,
    namespace: str,
    *,
    engine: Optional[Union[Engine, AsyncEngine]] = None,
    db_url: Union[None, str, URL] = None,
    engine_kwargs: Optional[Dict[str, Any]] = None,
    async_mode: bool = False,
) -> None:

This information is found in the method's docstring and signature. You can find more details in the SQLRecordManager source code.

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

PennlaineChu commented 6 months ago

@jamescadd Hi, has the bot solved your issue? I found some usage examples in the index function in this documentation. This explains quite well of the SQL record manager. Also, most SQL Alchemy supported databases works, as in this link. If you wish to have a documentation created, could you guide me on where to add the documentation? I would be willing to do the documentation if needed!!

jamescadd commented 6 months ago

@PennlaineChu Thank you, I would love to see the documentation created! Are you asking where in the langchain docs it belongs? You could use https://python.langchain.com/docs/modules/data_connection/sqlrecordmanager if it matches your taxonomy. There are several points that documentation could help clarify, here are some examples - though this is by no means exhaustive:

  1. Which databases are supported? You say most SQL Alchemy supported databases work, but dosu-bot above in this issue says only SQL Lite is supported. Which is correct?
  2. What's the purpose of the namespace? Can there be multiple namespaces in the same SQLRecordManager? Please provide an example, such as use with multiple vector stores that each have different sets of data.
  3. Can I retrieve the Documents from the SQLRecordManager? If so how?
  4. What are the indexes referenced in the above functions?
  5. Can I create my own record manager? Can you provide an example of another record manager?
dosubot[bot] commented 6 months ago

🤖

Thank you for your request. I understand that you're looking for comprehensive documentation for the SQLRecordManager in the LangChain repository. Here's what I found based on the available information:

Supported Databases: Currently, SQLRecordManager is primarily designed to work with SQLite. However, it's built on SQLAlchemy, which means it should theoretically support any database that SQLAlchemy can interface with. This includes, but is not limited to, PostgreSQL, MySQL, and Oracle. Please note that while the underlying support may exist, there may be untested edge cases or specific implementation details that could cause issues with databases other than SQLite.

Methods: Here's a brief overview of the methods in SQLRecordManager:

  1. __init__: Initializes the SQLRecordManager. You should specify either a db_url to create an engine or provide an existing engine.
  2. create_schema and acreate_schema: Create the database schema.
  3. _make_session and _amake_session: Create a session and close it after use.
  4. get_time and aget_time: Get the current server time as a timestamp.
  5. update and aupdate: Upsert records into the SQLite database.
  6. exists and aexists: Check if the given keys exist in the SQLite database.
  7. list_keys: List records in the SQLite database based on the provided date range.
  8. adelete_keys: Delete records from the SQLite database.

Namespaces: The namespace is associated with the record manager and is used to differentiate between different sets of records. It is a required parameter when initializing the SQLRecordManager.

Retrieval of Documents: Documents (or records) can be retrieved using the exists or aexists methods, which check if the given keys exist in the SQLite database. The list_keys method can be used to list records in the SQLite database based on the provided date range.

Indexes: Indexes are used in the UpsertionRecord table for the uuid, key, namespace, and updated_at fields to improve the speed of data retrieval operations.

Creating a Custom Record Manager: While it's theoretically possible to extend the SQLRecordManager class to create a custom record manager, the specifics of how to do this would depend on the requirements of the custom manager. Unfortunately, I don't have any examples or further information on this topic at the moment.

Please note that while I strive to provide accurate and up-to-date information, there may be additional details or nuances that are not covered in this summary. I recommend referring to the source code and any associated comments for the most complete and accurate information.

For the official documentation, the issue is still open and pending a response from the LangChain team or the original poster to confirm if the issue is still relevant. You can follow the progress of this issue here.

I hope this information is helpful. If you have any further questions or need clarification on any points, please don't hesitate to ask.

Sources

#### About Dosu This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
jamescadd commented 6 months ago

@PennlaineChu hi, any updates on the docs you can share?

jamescadd commented 6 months ago

@PennlaineChu hey, checking in there's any update.

mescanne commented 5 months ago

Hello -

I'd also be interested in the evolution of SQL Record Manager. I've been successfully using it with postgres, and I'm interested in bringing it to another database. (Spanner)

As I do that, I'm noting some oddities in the schema. The uuid column does not seem needed, and the indexes seem to be not thought through (in terms of what's needed). Spanner, for example, is easiest if we simply use a primary key of (key, namespace) as a way to have a uniqueness constraint. I'll need to investigate sqlite/postgres.

Thanks

hvassard commented 4 months ago

Hello, any news about this ? I'd love to see this amazing feature in the python docs :) For the moment I do :

from langchain.indexes import SQLRecordManager
help(SQLRecordManager)

By the way, the "SQLRecordManager" and "index" API link of this documentation page (as seen below) is a 404 error.

image

cesto93 commented 4 months ago

I would also like the documentation of SQLRecordManager