OK, a few observations on building RAG to answer questions like the following:
"How have workplace discrimination cases been argued?" Individual cases are very specific and rarely state general knowledge, so the caselaw documents have to be processed in sets to answer questions like this. One approach might be to pull up all of the workplace discrimination cases, and then summarize their arguments.
"How have the courts interpreted statute RSA 68:4 II?" A semantic search doesn't do well on this one, since we are talking about a specific statute by name. It helps to pick out the "entity of interest" before querying the database, and full-text search would work better.
"What are the trends in case outcomes for specific types of cases (e.g., criminal, corporate, family law) over time?" These kinds of questions would take multiple steps: (1) find the cases, (2) order by time, (3) isolate outcomes, (4) pick out trends over time.
"Are certain judges more likely to dissent in specific types of cases?" This is more of a statistical analysis, and RAG may not be helpful here. More likely you would pre-process the cases and index by judge, dissent/concur, and case type.
"What are some relevant precedents for the case I'm working on?" This could be served by retrieval and summarization.
Here are a few patterns for legal research I've considered:
Statistical analysis, trend analysis. Use the whole corpus to find trends. LLMs could be used to preprocess the text into structured fields.
Statistical analysis of a subset. Select a subset of cases by type, judge, court, etc., and look for patterns in those. LLMs could preprocess the text.
Thematic analysis of a subset. This can be done with RAG. Retrieve the subset, then analyze the arguments for themes and patterns.
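The statistical patterns above assume each case has already been preprocessed into structured fields. Here is a minimal sketch of what the "subset, then look for patterns" step might look like once that is done. The `Opinion` fields, the helper names, and the sample records are all hypothetical and invented purely for illustration. No real case data is used.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    """One judge's vote in one case (hypothetical preprocessed record)."""
    case_id: str
    year: int
    case_type: str
    judge: str
    vote: str      # "majority", "concur", or "dissent"
    outcome: str   # e.g. "affirmed" or "reversed"


def select(opinions, **criteria):
    """Keep only records whose fields equal every given criterion."""
    return [o for o in opinions
            if all(getattr(o, k) == v for k, v in criteria.items())]


def dissent_rate_by_judge(opinions):
    """Fraction of each judge's votes that were dissents."""
    totals, dissents = Counter(), Counter()
    for o in opinions:
        totals[o.judge] += 1
        if o.vote == "dissent":
            dissents[o.judge] += 1
    return {judge: dissents[judge] / totals[judge] for judge in totals}


def outcomes_by_year(opinions):
    """Count case outcomes per year, counting each case once."""
    seen = set()
    by_year = defaultdict(Counter)
    for o in sorted(opinions, key=lambda o: o.year):
        if o.case_id in seen:
            continue
        seen.add(o.case_id)
        by_year[o.year][o.outcome] += 1
    return {year: dict(counts) for year, counts in by_year.items()}


# Invented sample records, for illustration only.
SAMPLE = [
    Opinion("c1", 2001, "criminal", "Judge A", "majority", "affirmed"),
    Opinion("c1", 2001, "criminal", "Judge B", "dissent", "affirmed"),
    Opinion("c2", 2003, "criminal", "Judge A", "majority", "reversed"),
    Opinion("c2", 2003, "criminal", "Judge B", "majority", "reversed"),
    Opinion("c3", 2003, "family", "Judge B", "dissent", "affirmed"),
]

if __name__ == "__main__":
    criminal = select(SAMPLE, case_type="criminal")
    print(dissent_rate_by_judge(criminal))
    print(outcomes_by_year(criminal))
```

The same `select` call covers subsetting by type, judge, or any other field, so the whole-corpus and subset analyses differ only in what gets passed to the aggregation functions.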
Steps:
Resources: