run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Feature Request]: Neo4jGraphstore and Neo4jVectorstore should expose the python driver as a parameter to enable users to pass in their own neo4j driver instances #13145

Open Sauti-AI opened 2 months ago

Sauti-AI commented 2 months ago

Feature Description

Currently, the code base does not allow a user of these abstractions to pass in a driver:

    class Neo4jGraphStore(GraphStore):
        def __init__(
            self,
            username: str,
            password: str,
            url: str,
            database: str = "neo4j",
            node_label: str = "Entity",
            **kwargs: Any,
        ) -> None:
            self.node_label = node_label
            self._driver = neo4j.GraphDatabase.driver(url, auth=(username, password))
            self._database = database
            self.schema = ""
            self.structured_schema: Dict[str, Any] = {}

            # Verify connection
            try:
                with self._driver as driver:
                    driver.verify_connectivity()
            except neo4j.exceptions.ServiceUnavailable:
                raise ValueError(
                    "Could not connect to Neo4j database. "
                    "Please ensure that the url is correct"
                )
            except neo4j.exceptions.AuthError:
                raise ValueError(
                    "Could not connect to Neo4j database. "
                    "Please ensure that the username and password are correct"
                )

This could instead be:

    class Neo4jGraphStore(GraphStore):
        def __init__(
            self,
            database: str = "neo4j",
            node_label: str = "Entity",
            username: Optional[str] = None,
            password: Optional[str] = None,
            url: Optional[str] = None,
            driver: Optional[Any] = None,
            **kwargs: Any,
        ) -> None:
            self.node_label = node_label
            # No validations but just to illustrate the concept.
            self._driver = driver or neo4j.GraphDatabase.driver(url, auth=(username, password))
            self._database = database
            self.schema = ""
            self.structured_schema: Dict[str, Any] = {}

            # Verify connection
            try:
                with self._driver as driver:
                    driver.verify_connectivity()
            except neo4j.exceptions.ServiceUnavailable:
                raise ValueError(
                    "Could not connect to Neo4j database. "
                    "Please ensure that the url is correct"
                )
            except neo4j.exceptions.AuthError:
                raise ValueError(
                    "Could not connect to Neo4j database. "
                    "Please ensure that the username and password are correct"
                )
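
The proposal above skips validation on purpose. As a rough sketch of what that validation might look like, the helper below picks between a caller-supplied driver and one built from credentials. `resolve_driver`, its `driver_factory` parameter, and the `owns_driver` flag are hypothetical names introduced for illustration; they are not part of LlamaIndex or the neo4j package. The only real call assumed is `neo4j.GraphDatabase.driver(url, auth=(username, password))`, which is imported lazily so the helper can be exercised without a database.

```python
from typing import Any, Callable, Optional, Tuple


def resolve_driver(
    driver: Optional[Any] = None,
    url: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
    driver_factory: Optional[Callable[..., Any]] = None,
) -> Tuple[Any, bool]:
    """Return (driver, owns_driver). Hypothetical helper, not a LlamaIndex API.

    owns_driver is True only when this function created the driver, so the
    caller knows whether it is responsible for closing it later.
    """
    if driver is not None:
        # Mixing an existing driver with credentials is ambiguous; reject it.
        if any(value is not None for value in (url, username, password)):
            raise ValueError(
                "Pass either `driver` or `url`/`username`/`password`, not both."
            )
        return driver, False

    missing = [
        name
        for name, value in (("url", url), ("username", username), ("password", password))
        if value is None
    ]
    if missing:
        raise ValueError(
            "No driver was given, so these arguments are required: "
            + ", ".join(missing)
        )

    if driver_factory is None:
        # Imported lazily so the helper can be tested without neo4j installed.
        import neo4j

        driver_factory = neo4j.GraphDatabase.driver

    return driver_factory(url, auth=(username, password)), True
```

One design point worth checking if this were adopted: the connectivity check above wraps the driver in a `with` block, and the neo4j driver closes itself when that block exits. A store that accepts an external driver would probably want to call `verify_connectivity()` directly and close only the drivers it created itself, which is what the `owns_driver` flag is meant to track.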

I observed the same issue for the Neo4jVectorStore:

    class Neo4jVectorStore(VectorStore):
        """Neo4j Vector Store.

        Examples:
            `pip install llama-index-vector-stores-neo4jvector`

            ```python
            from llama_index.vector_stores.neo4jvector import Neo4jVectorStore

            username = "neo4j"
            password = "pleaseletmein"
            url = "bolt://localhost:7687"
            embed_dim = 1536

            neo4j_vector = Neo4jVectorStore(username, password, url, embed_dim)
            ```
        """

        stores_text: bool = True
        flat_metadata = True

        def __init__(
            self,
            username: str,
            password: str,
            url: str,
            embedding_dimension: int,
            database: str = "neo4j",
            index_name: str = "vector",
            keyword_index_name: str = "keyword",
            node_label: str = "Chunk",
            embedding_node_property: str = "embedding",
            text_node_property: str = "text",
            distance_strategy: str = "cosine",
            hybrid_search: bool = False,
            retrieval_query: str = "",
            **kwargs: Any,
        ) -> None:
            if distance_strategy not in ["cosine", "euclidean"]:
                raise ValueError("distance_strategy must be either 'euclidean' or 'cosine'")

            self._driver = neo4j.GraphDatabase.driver(url, auth=(username, password))
            self._database = database

            # Verify connection
            try:
                self._driver.verify_connectivity()
            except neo4j.exceptions.ServiceUnavailable:
                raise ValueError(
                    "Could not connect to Neo4j database. "
                    "Please ensure that the url is correct"
                )
            except neo4j.exceptions.AuthError:
                raise ValueError(
                    "Could not connect to Neo4j database. "
                    "Please ensure that the username and password are correct"
                )

Reason

As it stands, the driver instance is tightly coupled to the node_label: each store creates its own driver in its constructor, so I would need a new Neo4jGraphStore object, and therefore a new driver, for every node_label I want to work with. Is this correct?

Value of Feature

Remove coupling between the neo4j driver and the node_label and make the abstraction more adaptable.
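
To make the decoupling concrete, here is a small self-contained sketch of one driver shared across several label-scoped stores. `FakeDriver` and `LabelStore` are hypothetical stand-ins written for this example; they are not neo4j or LlamaIndex classes, and no database is contacted.

```python
from typing import Any, Dict, List


class FakeDriver:
    """Stand-in for a neo4j driver; it only records the queries it receives."""

    def __init__(self) -> None:
        self.queries: List[str] = []

    def execute(self, query: str) -> None:
        self.queries.append(query)


class LabelStore:
    """Hypothetical store scoped to one node label, using an injected driver."""

    def __init__(self, driver: Any, node_label: str = "Entity") -> None:
        self._driver = driver
        self.node_label = node_label

    def count_query(self) -> str:
        # Build a label-specific Cypher query and send it through the shared driver.
        query = f"MATCH (n:`{self.node_label}`) RETURN count(n)"
        self._driver.execute(query)
        return query


# One driver, many stores: the label varies while the connection is reused.
shared = FakeDriver()
stores: Dict[str, LabelStore] = {
    label: LabelStore(shared, label) for label in ("Person", "Company")
}
for store in stores.values():
    store.count_query()
```

With the current constructors, the equivalent setup would open a separate driver for each label, even though every store points at the same database.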

gich2009 commented 2 months ago

I have observed this too. Is there a reason why the driver is not exposed as a parameter?