harvard-lil / h2o

H2O is a web app for creating and reading open educational resources, primarily in the legal field
https://opencasebook.org
GNU Affero General Public License v3.0

add first draft of CL search #2051

Closed: teovin closed this 2 months ago

teovin commented 3 months ago

This is a WIP PR for integrating the CourtListener (CL) case XML -> HTML conversion.

Things that were done:

A few bug fixes were made:

Things to consider:

Sample converted legal doc (chopped):

[Screenshot 2024-06-12 at 10:26:44 AM]

This is how it would look if the elements I mentioned above weren't set to display: none.

[Screenshot 2024-06-12 at 10:43:33 AM]

Here is a case that both CAP and CL return, and how each version looks when imported (both chopped):

CAP (with display: none removed from .case-text .syllabus):

[Screenshot: CAP]

CourtListener (with display: none removed from elements in headmatter):

[Screenshot: CL]

teovin commented 2 months ago

> I just did a quick pass for code style, and LGTM! Left two tiny suggestions 🙂
>
> I also took the liberty of adding Jack as a reviewer, who I expect might be more equipped than me to address your more detailed questions 🙂

Thank you, Becky. I addressed your suggestions in my last commit. I will also work on any changes Jack suggests, especially around the questions of mine that you mentioned.

jcushman commented 2 months ago

> I noticed we are hiding some elements like .parties, .decisiondate and .docketnumber in case-text class. What's the reasoning behind this?

We want to render the top part of the head matter ourselves, rather than use the info printed in the book -- that lets us provide more consistent formatting between cases published in different books. Check out cap_header.html for where that's done. My guess is you have to adapt that business logic to also work with CL.

So some fields are hidden because the custom header makes them redundant. I wasn't part of this, but I'm guessing we're hiding other fields like syllabus and parties simply for user preference. As long as we're rendering the same as cases fetched from the CAP API, let's not revisit that decision for now.
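To illustrate the idea of rendering one custom header for both sources, here is a minimal sketch that maps source-specific metadata onto a shared structure a single header template could consume. The `HeaderInfo` class, both helper functions, and every dictionary key below are hypothetical placeholders, not the actual CAP or CourtListener response schema or the logic in cap_header.html.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HeaderInfo:
    """Source-independent data for a custom case header (hypothetical)."""
    title: str
    court: Optional[str] = None
    decision_date: Optional[str] = None
    docket_number: Optional[str] = None
    citations: List[str] = field(default_factory=list)


def header_from_cap(meta: dict) -> HeaderInfo:
    # Key names are placeholders for whatever the CAP metadata provides.
    return HeaderInfo(
        title=meta.get("name", ""),
        court=meta.get("court"),
        decision_date=meta.get("decision_date"),
        docket_number=meta.get("docket_number"),
        citations=list(meta.get("citations", [])),
    )


def header_from_cl(meta: dict) -> HeaderInfo:
    # Key names are placeholders for whatever the CL metadata provides.
    return HeaderInfo(
        title=meta.get("case_name", ""),
        court=meta.get("court"),
        decision_date=meta.get("date_filed"),
        docket_number=meta.get("docket_number"),
        citations=list(meta.get("citations", [])),
    )
```

With both sources reduced to the same shape, the printed head matter can stay hidden and the header renders identically regardless of where the case came from.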

jcushman commented 2 months ago

I haven't checked whether you're doing this yet, but I think we'll want to record which CourtListener field was used to populate the case. For example, I'm pretty sure that if we do need footnote_regexes, we only need it when xml_harvard was the source.
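A minimal sketch of what recording the source field could look like, assuming the opinion arrives as a dictionary. The priority order, the function names, and the idea of returning the field name alongside the content are assumptions for illustration, not the PR's actual implementation.

```python
from typing import Optional, Tuple

# Candidate content fields on a CourtListener opinion. The priority
# order here is an assumption made for this sketch.
SOURCE_FIELD_PRIORITY = ("xml_harvard", "html_with_citations", "html", "plain_text")


def pick_source_field(opinion: dict) -> Tuple[Optional[str], str]:
    """Return (field_name, content) for the first non-empty candidate field."""
    for field_name in SOURCE_FIELD_PRIORITY:
        content = opinion.get(field_name) or ""
        if content.strip():
            return field_name, content
    return None, ""


def needs_footnote_regexes(source_field: Optional[str]) -> bool:
    """Footnote regexes are only relevant when xml_harvard was the source."""
    return source_field == "xml_harvard"
```

The returned field name could then be stored with the imported case so that later processing steps, such as footnote handling, can branch on it.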

jcushman commented 2 months ago

This looks great -- I think with updates it'll be good to test on stage.

jcushman commented 2 months ago

... but we might want a feature flag, since the XML conversion isn't ready yet.
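One way such a flag could be wired up is sketched below, assuming it is read from an environment variable. The flag name `CL_XML_CONVERSION_ENABLED`, the `render_case` helper, and the stand-in `convert_xml_to_html` are all hypothetical; the project may well use a settings value or another mechanism instead.

```python
import os
from typing import Mapping, Optional

FLAG_NAME = "CL_XML_CONVERSION_ENABLED"  # hypothetical flag name
_TRUTHY = {"1", "true", "yes", "on"}


def xml_conversion_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Read the flag from the given mapping, defaulting to os.environ."""
    env = os.environ if environ is None else environ
    return env.get(FLAG_NAME, "").strip().lower() in _TRUTHY


def convert_xml_to_html(xml: str) -> str:
    # Stand-in for the real conversion, which is outside this sketch.
    return "<!-- converted -->" + xml


def render_case(opinion: dict, environ: Optional[Mapping[str, str]] = None) -> str:
    """Use the XML conversion only when the flag is on and XML is present."""
    if xml_conversion_enabled(environ) and opinion.get("xml_harvard"):
        return convert_xml_to_html(opinion["xml_harvard"])
    return opinion.get("html", "")
```

Keeping the flag off by default means the existing HTML path stays in effect on stage until the conversion is ready to be exercised.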

teovin commented 2 months ago

> We want to render the top part of the head matter ourselves, rather than use the info printed in the book -- that lets us provide more consistent formatting between cases published in different books. Check out cap_header.html for where that's done. My guess is you have to adapt that business logic to also work with CL.
>
> So some fields are hidden because the custom header makes them redundant. I wasn't part of this, but I'm guessing we're hiding other fields like syllabus and parties simply for user preference. As long as we're rendering the same as cases fetched from the CAP API, let's not revisit that decision for now.

I added a template for CourtListener, modeled after cap_header.html. One change I made to both templates was to remove the div containing legal_doc.get_title: the get_title method didn't exist, so the div wasn't rendering anything.