Steps:
Taking a cue from this paper, the steps would include:
Come up with a taxonomy, or list of categories, of the entities involved in case law (e.g., plaintiff, defendant, rule, issue). Ask an LLM to draft this.
For each chunk of text, extract the entities that fall into the taxonomy above.
Generate triples (subject, relation, object) describing how the extracted entities relate to one another.
Compose the per-chunk subgraphs into a single larger graph (e.g., using networkx.compose, or networkx.compose_all for a whole list of graphs).
Create embeddings for each node in the graph.
Merge nodes whose embeddings are highly similar, using the label of the most highly connected node (highest degree) as the label of the merged node.
Clean up the graph by finding its connected components and removing any component smaller than a chosen size threshold.
For community analysis, partition the graph into communities. The paper uses the Girvan-Newman algorithm, which detects communities by progressively removing the edges with the highest "edge betweenness centrality": the edges that lie on the largest number of shortest paths between pairs of nodes. Removing these bridging edges splits the graph into densely connected groups.
For question answering, extract entities from the question and use the graph to find the relevant cases, then retrieve the text of those cases and use it to answer the question.
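The taxonomy, entity-extraction, and triple-generation steps above could be wired together roughly as follows. This is a minimal sketch, assuming the LLM is asked to return JSON with `entities` and `triples` keys; the prompt wording, the JSON schema, and the names `build_extraction_prompt` and `parse_extraction` are hypothetical, and the model call itself is left out. The taxonomy uses only the example categories from the notes, and the model response is made up for illustration.

```python
import json
from dataclasses import dataclass

# Example categories from the notes; a real taxonomy would be drafted by an LLM
# and reviewed by hand.
TAXONOMY = ["plaintiff", "defendant", "rule", "issue"]


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str


def build_extraction_prompt(chunk: str, taxonomy: list) -> str:
    """Hypothetical prompt asking the model for typed entities and relations as JSON."""
    return (
        "Extract entities from the case-law text below. Allowed types: "
        + ", ".join(taxonomy)
        + '.\nReturn JSON of the form {"entities": [{"name": ..., "type": ...}], '
        + '"triples": [{"subject": ..., "relation": ..., "object": ...}]}.\n\n'
        + chunk
    )


def parse_extraction(raw: str, taxonomy: list):
    """Keep entities whose type is in the taxonomy, and only triples between kept entities."""
    data = json.loads(raw)
    allowed = set(taxonomy)
    # Lowercased, stripped names act as a crude first normalization of labels.
    entities = {
        e["name"].strip().lower(): e["type"]
        for e in data.get("entities", [])
        if e.get("type") in allowed
    }
    triples = []
    for t in data.get("triples", []):
        subj = t["subject"].strip().lower()
        obj = t["object"].strip().lower()
        if subj in entities and obj in entities:
            triples.append(Triple(subj, t["relation"].strip(), obj))
    return entities, triples


# Made-up model response for one chunk; "court_actor" is not in the taxonomy.
raw_response = json.dumps({
    "entities": [
        {"name": "Smith", "type": "plaintiff"},
        {"name": "Acme Corp", "type": "defendant"},
        {"name": "Jury", "type": "court_actor"},
    ],
    "triples": [
        {"subject": "Smith", "relation": "sued", "object": "Acme Corp"},
        {"subject": "Jury", "relation": "heard", "object": "Smith"},
    ],
})
entities, triples = parse_extraction(raw_response, TAXONOMY)
```

Filtering triples against the extracted entities keeps relations from pointing at off-taxonomy nodes; fuzzier duplicates (e.g., "the plaintiff" vs. "plaintiff") are left for the embedding-based merge step.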
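For the composition step, each chunk's triples can become a small networkx graph, and `networkx.compose_all` merges a list of graphs in one call (`networkx.compose` does the same for two graphs). A minimal sketch, assuming triples are plain `(subject, relation, object)` tuples and that edge direction can be dropped; `chunk_graph` is a hypothetical helper and the triples are made up.

```python
import networkx as nx


def chunk_graph(triples):
    """Hypothetical helper: one undirected graph per chunk, relation stored on the edge."""
    g = nx.Graph()
    for subj, rel, obj in triples:
        g.add_edge(subj, obj, relation=rel)
    return g


# Made-up triples from two chunks that share the entities "smith" and "acme corp".
chunk_triples = [
    [("smith", "sued", "acme corp")],
    [("acme corp", "breached", "contract"), ("smith", "relied on", "contract")],
]

# Nodes with the same label in different chunks collapse into a single node.
G = nx.compose_all([chunk_graph(t) for t in chunk_triples])
```

Because this uses `nx.Graph`, a later chunk's `relation` overwrites an earlier one on the same node pair (attributes from later graphs in the list take precedence). An `nx.MultiDiGraph` would keep every relation and its direction, at the cost of more work in the later steps.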
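The embedding and merge steps might look like the sketch below. It assumes embeddings have already been computed by some embedding model and are passed in as a dict from node label to vector; the toy 3-dimensional vectors, cosine similarity as the measure, the 0.95 threshold, and the name `merge_similar_nodes` are all assumptions. Similar pairs are grouped with union-find, so merging is transitive, and each group takes the label of its highest-degree member, following the rule in the notes.

```python
import math

import networkx as nx


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge_similar_nodes(G, embeddings, threshold=0.95):
    """Hypothetical helper: merge nodes whose embeddings have cosine similarity >= threshold."""
    nodes = list(G.nodes)
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Union every sufficiently similar pair (O(n^2) comparisons).
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if cosine(embeddings[u], embeddings[v]) >= threshold:
                parent[find(u)] = find(v)

    # Group nodes by root; the highest-degree member names the group (ties broken by label).
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    label = {}
    for members in groups.values():
        keep = max(members, key=lambda n: (G.degree(n), n))
        for n in members:
            label[n] = keep

    # Rebuild the graph; only the surviving node's attributes are kept.
    merged = nx.Graph()
    merged.add_nodes_from((n, G.nodes[n]) for n in set(label.values()))
    for u, v, data in G.edges(data=True):
        a, b = label[u], label[v]
        if a != b:  # drop self-loops created by merging
            merged.add_edge(a, b, **data)
    return merged


G = nx.Graph([("plaintiff", "defendant"), ("plaintiff", "court"), ("the plaintiff", "court")])
embeddings = {  # toy vectors standing in for real model embeddings
    "plaintiff": [1.0, 0.0, 0.0],
    "the plaintiff": [0.99, 0.1, 0.0],
    "defendant": [0.0, 1.0, 0.0],
    "court": [0.0, 0.0, 1.0],
}
merged = merge_similar_nodes(G, embeddings)
```

The pairwise loop is quadratic in the number of nodes; for a graph built from a whole case-law corpus, an approximate nearest-neighbor index would likely be needed to find candidate pairs.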
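Pruning small components is a short loop over `networkx.connected_components`. A minimal sketch; `prune_small_components` is a hypothetical name, and `min_size=3` is an arbitrary illustrative value rather than the threshold from the paper.

```python
import networkx as nx


def prune_small_components(G, min_size):
    """Hypothetical helper: keep only connected components with at least min_size nodes."""
    keep = set()
    for component in nx.connected_components(G):
        if len(component) >= min_size:
            keep.update(component)
    # subgraph() returns a read-only view; copy() makes it an independent graph.
    return G.subgraph(keep).copy()


G = nx.Graph([(1, 2), (2, 3), (3, 4), (5, 6)])
G.add_node(7)  # an isolated node is a component of size 1
pruned = prune_small_components(G, min_size=3)
```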
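networkx provides Girvan-Newman as `networkx.algorithms.community.girvan_newman`, which yields one partition per level of edge removal but does not pick a level itself. The sketch below assumes the level with the highest modularity is the one wanted; that selection rule, the `max_levels` cap, the name `best_girvan_newman_partition`, and the toy graph (two triangles joined by a single bridge edge) are illustrative choices, not necessarily what the paper does.

```python
from itertools import islice

import networkx as nx
from networkx.algorithms.community import girvan_newman, modularity


def best_girvan_newman_partition(G, max_levels=10):
    """Hypothetical helper: return the Girvan-Newman level with the highest modularity.

    Assumes G has at least one edge (modularity is undefined otherwise).
    """
    levels = islice(girvan_newman(G), max_levels)
    return max(levels, key=lambda communities: modularity(G, communities))


# Two triangles joined by the bridge c-d; the bridge has the highest edge betweenness.
G = nx.Graph([
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("c", "d"),
    ("d", "e"), ("e", "f"), ("d", "f"),
])
communities = best_girvan_newman_partition(G)
```

Girvan-Newman recomputes edge betweenness after every removal, so it gets expensive on large graphs; capping the number of levels keeps the cost bounded.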
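The question-answering step could start from something like the sketch below. It assumes each node carries a hypothetical `cases` attribute listing the IDs of the cases that mention that entity, and it uses naive substring matching against node labels as a stand-in for LLM-based entity extraction from the question. Cases attached to matched nodes and their immediate neighbors (`networkx.ego_graph` with `radius=1`) are ranked by how often they appear. The final LLM call is omitted; `find_relevant_cases`, `build_answer_prompt`, and the case data are hypothetical.

```python
from collections import Counter

import networkx as nx


def find_relevant_cases(G, question, radius=1):
    """Hypothetical helper: rank case IDs attached to entities near those named in the question."""
    q = question.lower()
    # Naive stand-in for LLM entity extraction: node labels that appear verbatim.
    matched = [n for n in G.nodes if str(n).lower() in q]
    counts = Counter()
    for n in matched:
        for m in nx.ego_graph(G, n, radius=radius):
            counts.update(G.nodes[m].get("cases", ()))
    # Most frequently hit cases first; ties broken by case ID.
    return sorted(counts, key=lambda c: (-counts[c], c))


def build_answer_prompt(question, case_ids, case_texts, k=3):
    """Hypothetical helper: pack the top-k case texts into a prompt for the answering LLM."""
    context = "\n\n".join(f"[{cid}]\n{case_texts[cid]}" for cid in case_ids[:k])
    return f"Answer using only the cases below.\n\n{context}\n\nQuestion: {question}"


G = nx.Graph()
G.add_node("negligence", cases={"c1", "c2"})
G.add_node("duty of care", cases={"c2"})
G.add_node("contract", cases={"c3"})
G.add_edge("negligence", "duty of care")

question = "What must a plaintiff show to prove negligence?"
ranked = find_relevant_cases(G, question)
case_texts = {"c1": "Placeholder text of case c1.", "c2": "Placeholder text of case c2."}
prompt = build_answer_prompt(question, ranked, case_texts)
```

Substring matching will over-match short labels (e.g., "rule" inside "rules"), which is one reason the notes suggest extracting entities from the question with the same taxonomy-driven approach used on the case text.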
Implement Graph RAG against the NH Caselaw dataset.
Links: