Lexcaliber is an ongoing project to develop novel algorithms and analysis techniques for legal research.
Our current efforts include:
See Eksombatchai et al. (2017), Huang et al. (2021), and Sun et al. (2016) for literature that informs our current approaches.
Our current work is focused on the federal appellate corpus (all circuit courts as well as the Supreme Court), with the aim of building systems that generalize to other jurisdictions.
This repository contains the bulk of the logic and infrastructure powering this project, as well as command-line and REST interfaces. See lexcaliber/explorer for more information about the prototype web interface we're building to demonstrate the technology.
The main thrust of our efforts so far has been in recommendation and discovery. Given some information from the user (a relevant case or two; key words or phrases; a document in progress), we want to search the federal appellate corpus of roughly 1,000,000 documents and recommend relevant opinions that aid the user's research or argument.
Our initial results have been very promising. Our primary metric is currently recall: the percentage of documents defined as relevant that we successfully recommend. We adopt the measurement approach of Huang et al. (2021).
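As a rough illustration of the recall metric described above, here is a minimal sketch of a top-k recall computation. It assumes a ranked list of recommended opinion identifiers and a set of identifiers treated as relevant. The function name `recall_at_k` and the sample identifiers are hypothetical and are not taken from this repository. The exact evaluation protocol of Huang et al. (2021) is not reproduced here.

```python
from typing import Sequence, Set


def recall_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k recommendations."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    top_k = set(recommended[:k])
    return len(top_k & relevant) / len(relevant)


# Hypothetical opinion identifiers, for illustration only.
ranked = ["op_17", "op_03", "op_42", "op_08", "op_99"]
held_out = {"op_03", "op_99"}

for k in (1, 5, 20):
    print(f"top{k}: {recall_at_k(ranked, held_out, k):.1%}")
```

Averaging this value over many cases and trials would give aggregate top1, top5, and top20 figures of the kind reported in this section.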
Our initial results are as follows:
For 20 cases after 5 trials each:
top1: 10.0%
top5: 21.0%
top20: 35.0%
Majority vote control for 20 cases after 5 trials each:
top1: 0.0%
top5: 0.0%
top20: 0.0%
If we restrict the cases to those with at least five neighbors (reasonable, considering that there are many orders/slip opinions with no or few citations), our results are even better:
For 20 cases after 5 trials each:
top1: 18.0%
top5: 30.0%
top20: 47.0%
Majority vote control for 20 cases after 5 trials each:
top1: 0.0%
top5: 0.0%
top20: 0.0%
These results are comparable to those of Huang et al. (2021), given the much larger federal appellate corpus, and, in our view, suggest significantly more potential to generalize to other jurisdictions. We further expect these results to improve once we consider textual citation context as part of our recommendation computation.
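The restriction to cases with at least five neighbors, used for the second set of results, could be applied with a filter along these lines. This is only a sketch. It assumes the citation network is available as a mapping from each case identifier to the set of its neighbors, and that "neighbors" means opinions linked to the case by citation. The function name `cases_with_min_neighbors` and the sample data are hypothetical.

```python
from typing import Dict, List, Set


def cases_with_min_neighbors(
    citations: Dict[str, Set[str]], min_neighbors: int = 5
) -> List[str]:
    """Return the ids of cases whose neighbor count meets the threshold, sorted."""
    return sorted(
        case for case, neighbors in citations.items()
        if len(neighbors) >= min_neighbors
    )


# Hypothetical citation network, for illustration only.
graph = {
    "op_a": {"op_1", "op_2", "op_3", "op_4", "op_5"},
    "op_b": {"op_1"},  # sparsely cited, e.g. an order or slip opinion
    "op_c": set(),     # no citations at all
}

eligible = cases_with_min_neighbors(graph)
print(eligible)
```

Filtering before sampling test cases keeps opinions with few or no citations out of the evaluation set.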
alembic upgrade head. Make sure you have a username in .env.
pip install --editable .
Run lxc --help for a list of all commands.
lxc data download with your desired jurisdictions.
lxc server run
Bonus: Run git config blame.ignoreRevsFile .git-blame-ignore-revs so your git blame doesn't catch our reformatting commits.
Run alembic upgrade head if your database schema is out of date.