lexeme-dev / core

A project to develop novel algorithms and analysis techniques for legal research

Add opinion text into database #7

Closed ProbablyFaiz closed 3 years ago

ProbablyFaiz commented 3 years ago

To do more analysis (e.g. grabbing descriptive parentheticals) on our case law, we need to introduce opinion text into our database.

One consideration: CourtListener's data includes opinion text in HTML form. Should we scrub it then store it, or leave the formatting and scrub as necessary when pulling from the database?

Storing a scrubbed version would help with both performance and searching, since we would not have to strip markup on every read.

But keeping the HTML version leaves us more flexibility, since the formatting is still there if we need it later.

Storing both is also an option, but the storage cost is pretty significant.
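
For reference, here is a minimal sketch of what the "scrub" step could look like, using only Python's standard-library `html.parser`. The names `scrub_opinion_html` and `_TextExtractor` are hypothetical and not part of this repo, and the list of block-level tags is an assumption about what CourtListener's HTML contains, so treat this as an illustration of the trade-off rather than a proposed implementation.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> contents."""

    # Tags whose contents should never appear in the scrubbed text.
    _SKIP = {"script", "style"}
    # Assumed block-level tags: treat their boundaries as whitespace so
    # that adjacent paragraphs do not run together.
    _BLOCK = {"p", "div", "br", "li", "blockquote", "tr",
              "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1
        elif tag in self._BLOCK:
            self._chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in self._SKIP:
            if self._skip_depth > 0:
                self._skip_depth -= 1
        elif tag in self._BLOCK:
            self._chunks.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse the runs of whitespace left behind by removed markup.
        return " ".join("".join(self._chunks).split())


def scrub_opinion_html(html_text):
    """Return the plain text of an HTML opinion (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.text()
```

If we store only the HTML, something like this would run on every read; if we store only the scrubbed text, it would run once at import time and the markup would be gone for good.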