langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Integrate Neo4j as a Graph Index, Vector Index, and as tools in the ecosystem #4625

Closed quillan86 closed 1 year ago

quillan86 commented 1 year ago

Feature request

Graph databases need to be integrated into LangChain. NetworkX isn't suitable for scalable graph databases that need to be queried, particularly at tens of thousands of nodes and edges or more. Integration is necessary for graph databases to compete with vector databases for information extraction within LangChain.

There is already a Medium article and a GitHub repo describing one way to implement this, but it would be ideal if something like it were integrated into LangChain itself. That implementation also supports storing embeddings in Neo4j as an option, which should be implemented as well.

Motivation

The Graph Index Creator and other small graph utilities within LangChain use NetworkX, which isn't scalable enough for production use with full-blown knowledge graphs on the scale of vector databases. I have a particular need to use a graph database in production along with LangChain for a project at work.

Your contribution

Yes, I am willing to contribute. I haven't contributed to LangChain directly before, but I am familiar with the source code from investigating it. I would love to collaborate on what kind of framework/interface we would need to expand graph indexes to a scope similar to that of vector database indexes.

tomasonjo commented 1 year ago

I would also be willing to contribute; I would just need a bit of help knowing where to put the code. The closest section seems to be vector stores, but Neo4j is not a vector store, so should it be a retriever or a tool, or do we just pretend Neo4j is a vector store?

quillan86 commented 1 year ago

There are already these relevant folders:

So I think it's a matter of reformulating langchain.graphs to have a base.py et al similar to langchain.vectorstores. That's why I said interface - it would be the creation of a new general object like Vectorstore.
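To make the interface idea concrete, here is a minimal sketch of what such a base class could look like. Everything in it is an assumption for illustration: `GraphStore`, `get_schema`, `query`, and the toy `StaticGraphStore` are hypothetical names, not existing LangChain classes, and a real wrapper would talk to an actual database driver.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class GraphStore(ABC):
    """Hypothetical base class for graph database wrappers (illustration only)."""

    @abstractmethod
    def get_schema(self) -> str:
        """Return a text description of labels, properties, and relationships."""

    @abstractmethod
    def query(
        self, query: str, params: Optional[Dict[str, Any]] = None
    ) -> List[Dict[str, Any]]:
        """Run a query string and return the result rows as dictionaries."""


class StaticGraphStore(GraphStore):
    """Toy implementation that returns canned rows, to show the interface shape."""

    def __init__(self, schema: str, rows: List[Dict[str, Any]]) -> None:
        self._schema = schema
        self._rows = rows

    def get_schema(self) -> str:
        return self._schema

    def query(
        self, query: str, params: Optional[Dict[str, Any]] = None
    ) -> List[Dict[str, Any]]:
        # A real backend would send `query` and `params` to the database here.
        return list(self._rows)
```

With a shape like this, chains and tools could depend on the base class only, and each database (Neo4j or otherwise) would supply its own subclass.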

We could possibly store the vector embedding portion of Neo4j within the vectorstore module, though I'd need to look at the code from the Medium article.

I've already forked the repo and created a branch on my end for this although I haven't pushed changes yet.

tomasonjo commented 1 year ago

Yeah, I wouldn't really add vector search in Neo4j for starters. I would try to add Cypher search first, something like schema-based Cypher generation that can be used on any graph:

https://medium.com/neo4j/generating-cypher-queries-with-chatgpt-4-on-any-graph-schema-a57d7082a7e7
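As a rough illustration of schema-based Cypher generation, the sketch below builds a prompt from a schema description and a user question. The template wording and the names `CYPHER_PROMPT_TEMPLATE` and `build_cypher_prompt` are assumptions made for this example, not the prompt from the article or from LangChain; sending the prompt to an LLM is left out.

```python
# Hypothetical prompt template: the schema is injected so the model only
# uses labels, relationship types, and properties that actually exist.
CYPHER_PROMPT_TEMPLATE = """You are an assistant that writes Cypher queries.
Use only the node labels, relationship types, and properties in the schema below.
Return a single Cypher statement and nothing else.

Schema:
{schema}

Question: {question}
Cypher:"""


def build_cypher_prompt(schema: str, question: str) -> str:
    """Fill the template with a graph schema and a natural-language question."""
    return CYPHER_PROMPT_TEMPLATE.format(
        schema=schema.strip(), question=question.strip()
    )


if __name__ == "__main__":
    example_schema = "(:Person {name: STRING})-[:ACTED_IN]->(:Movie {title: STRING})"
    print(build_cypher_prompt(example_schema, "Who acted in The Matrix?"))
```

Because the schema is passed in as plain text, the same function works for any graph whose schema can be described this way.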

quillan86 commented 1 year ago

Yeah, that wouldn't be a priority atm (other than that it was a feature of the agent tools I mentioned earlier). Cypher search would be the priority.

tomasonjo commented 1 year ago

I've started the PR, you can take a look

tomasonjo commented 1 year ago

This was added, so you could probably close this issue:

https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/examples/graph_cypher_qa.ipynb

v-almonacid commented 1 year ago

> I've started the PR, you can take a look

@tomasonjo Quick question: Does GraphCypherQAChain work well for you? If yes, with what version?

I tried the example in the docs with the current latest version (0.0.197), but it throws an error.

tomasonjo commented 1 year ago

What's the error you are getting?

v-almonacid commented 1 year ago

> What's the error you are getting?

I get the issue now. The LLM simply doesn't respond with a plain Cypher statement, so naturally Neo4jGraph.query() fails. Maybe it's because I'm using an Azure LLM instance and it doesn't behave the same (?)

tomasonjo commented 1 year ago

I don't have access to Azure LLMs, so I can't test it. You can ask the LLM to wrap the statement in three backticks, since the code can then extract the statement.
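For illustration, the extraction step could look roughly like the sketch below: take the text between the first pair of triple backticks, and fall back to the raw reply when there are none. This is a simplified stand-in, not necessarily the exact code in LangChain; the optional `cypher` language tag handling is an added assumption.

```python
import re

# Three backticks, built programmatically so the literal fence does not
# appear inside this example.
FENCE = "`" * 3

# Match the first fenced block, optionally tagged "cypher" (assumed tag).
_FENCED_BLOCK = re.compile(
    re.escape(FENCE) + r"(?:cypher)?\s*(.*?)" + re.escape(FENCE),
    re.DOTALL | re.IGNORECASE,
)


def extract_cypher(text: str) -> str:
    """Return the statement inside the first fenced block, else the text itself."""
    match = _FENCED_BLOCK.search(text)
    return match.group(1).strip() if match else text.strip()


if __name__ == "__main__":
    reply = (
        "Here is the query:\n"
        + FENCE + "cypher\nMATCH (n) RETURN n LIMIT 5\n" + FENCE
        + "\nLet me know if you need anything else."
    )
    print(extract_cypher(reply))
```

The fallback matters for models that already answer with a bare statement: their output passes through unchanged, while chattier replies are reduced to the fenced query before it reaches `Neo4jGraph.query()`.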