mscarey / legislice

API client for fetching and comparing passages from legislation
https://legislice.readthedocs.io/

Jurism imports #16

Open fbennett opened 3 years ago

fbennett commented 3 years ago

Jurism is a reference manager that supports legal resources, and implements the "Bluebook" rules for human-readable citations. If LegiSlice were to offer citation data in a format digestible by Jurism (CSL-M JSON, or possibly MODS), it would open interesting paths for integration in user-level services that leverage both tools. If that sounds interesting, I can help with the data structure.

mscarey commented 3 years ago

Sorry I missed the notification on this, @fbennett! I would have responded sooner. Yeah, I think it would be great to export a citation that Jurism can read. I have questions about how to format a citation exported from Legislice to Jurism. I'm sure I'll have trouble understanding how to import from Jurism too, but I should try to understand exporting first.

I'm having trouble understanding the explanation of CSL-M JSON in the citeproc-js docs. The "Citations" structure seems most relevant. But I'm unclear what to put for some of the fields. Here's the example I'm looking at:

{
    id:"item1",
    locator: 123,
    label: "page",
    prefix: "See ",
    suffix: " (arguing that X is Y)"
}

The citeproc-js docs say the id field needs to uniquely identify the resource. Does that mean the id should be a URI with a namespace, or a hash, or something else? If the Legislice object being cited to represents a quotation from a dated version of a subdivision of a statute, does the id need to uniquely identify the subdivision, or the specific dated version of the subdivision, or the specific quotation from the dated version?

The citeproc-js docs include a type field that can be set to "legislation", but where does it go? Does a "citation" fit inside an "item"?

locator is supposed to identify "a page number or other pinpoint location or range within the resource". Can it be a path identifier like '/us/usc/t17/s103/a'?

For the label field, can I use label names like "subsection" and "clause"?

The example provides a prefix and suffix for the quoted phrase. Can I provide a start index and end index instead?

For an example, here's the JSON from the Legislice documentation representing three text passages. Each 'content' field contains the full text of the corresponding provision, but then the selection fields narrow down the range of text actually considered "selected". Can you show me an example of what CSL-M JSON should be generated for this example? Are any necessary fields missing to generate CSL-M JSON from this example?

{'start_date': '2013-07-18',
'children': [{'start_date': '2013-07-18',
'children': [],
'end_date': None,
'text_version': {'content': 'The subject matter of copyright as specified by section 102 includes compilations and derivative works, but protection for a work employing preexisting material in which copyright subsists does not extend to any part of the work in which such material has been used unlawfully.'},
'node': '/us/usc/t17/s103/a',
'anchors': [],
'selection': [{'end': 277, 'start': 0}],
'heading': ''},
{'start_date': '2013-07-18',
'children': [],
'end_date': None,
'text_version': {'content': 'The copyright in a compilation or derivative work extends only to the material contributed by the author of such work, as distinguished from the preexisting material employed in the work, and does not imply any exclusive right in the preexisting material. The copyright in such work is independent of, and does not affect or enlarge the scope, duration, ownership, or subsistence of, any copyright protection in the preexisting material.'},
'node': '/us/usc/t17/s103/b',
'anchors': [],
'selection': [{'end': 300, 'start': 256},
    { 'end': 437, 'start': 384}],
'heading': ''}],
'end_date': None,
'text_version': None,
'node': '/us/usc/t17/s103',
'anchors': [],
'selection': [],
'heading': 'Subject matter of copyright: Compilations and derivative works'}
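
To make the role of the selection fields concrete, here is a minimal sketch of how the selected passages could be pulled out of a structure like the one above. It assumes that start and end are character offsets into content with an exclusive end, as in Python slicing; the function name selected_passages and the shortened sample data are hypothetical and are not part of Legislice.

```python
def selected_passages(node):
    """Yield (node_path, passage) pairs for every selected range in a node tree."""
    # A parent node may have text_version set to None, so fall back to empty text.
    content = (node.get("text_version") or {}).get("content", "")
    for span in node.get("selection", []):
        # Assumption: 'end' is exclusive, like a Python slice.
        yield node["node"], content[span["start"]:span["end"]]
    for child in node.get("children", []):
        yield from selected_passages(child)


# Hypothetical, shortened sample in the same shape as the structure above.
sample = {
    "node": "/us/usc/t17/s103",
    "text_version": None,
    "selection": [],
    "children": [
        {
            "node": "/us/usc/t17/s103/b",
            "text_version": {"content": "The copyright in such work is independent."},
            "selection": [{"start": 4, "end": 13}, {"start": 30, "end": 41}],
            "children": [],
        }
    ],
}

passages = list(selected_passages(sample))
```

Each selected range comes back paired with the path of the node it belongs to, so a citation could in principle point at either the provision or the specific quotation.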

fbennett commented 3 years ago

Great! It may take some back-and-forth to sort out how Jurism (or more generally automated citations with the citeproc-js processor) would fit into LegiSlice workflows. I'll start with the data sample, post how it would be represented in CSL-M JSON (CSL-M is the variant of the vanilla CSL style language used by Jurism), and follow with some open-ended questions.

It looks like that's a Python structure, so I took the liberty of converting it to JSON (further comments below):

{
  "start_date": "2013-07-18",
  "children": [
    {
      "start_date": "2013-07-18",
      "children": [],
      "end_date": null,
      "text_version": {
        "content": "The subject matter of copyright as specified by section 102 includes compilations and derivative works, but protection for a work employing preexisting material in which copyright subsists does not extend to any part of the work in which such material has been used unlawfully."
      },
      "node": "/us/usc/t17/s103/a",
      "anchors": [],
      "selection": [
        {
          "end": 277,
          "start": 0
        }
      ],
      "heading": ""
    },
    {
      "start_date": "2013-07-18",
      "children": [],
      "end_date": null,
      "text_version": {
        "content": "The copyright in a compilation or derivative work extends only to the material contributed by the author of such work, as distinguished from the preexisting material employed in the work, and does not imply any exclusive right in the preexisting material. The copyright in such work is independent of, and does not affect or enlarge the scope, duration, ownership, or subsistence of, any copyright protection in the preexisting material."
      },
      "node": "/us/usc/t17/s103/b",
      "anchors": [],
      "selection": [
        {
          "end": 300,
          "start": 256
        },
        {
          "end": 437,
          "start": 384
        }
      ],
      "heading": ""
    }
  ],
  "end_date": null,
  "text_version": null,
  "node": "/us/usc/t17/s103",
  "anchors": [],
  "selection": [],
  "heading": "Subject matter of copyright: Compilations and derivative works"
}

Jurism, like Zotero, harvests items from HTML views, or from structured metadata embedded in a page. The most reliable structured format at the moment is CSL-M JSON. 17 USC § 103 (1974) as revised Jul. 18, 2013 would be expressed like this:

[
    {
        "type": "legislation",
        "multi": {
            "main": {},
            "_keys": {}
        },
        "container-title": "U.S. Code",
        "section": "sec. 103",
        "volume": "17",
        "jurisdiction": "us",
        "issued": {
            "date-parts": [
                [
                    "1974",
                    10,
                    19
                ]
            ]
        },
        "event-date": {
            "date-parts": [
                [
                    "2013",
                    7,
                    18
                ]
            ]
        }
    }
]

In Jurism, that object would import to this:

[Screenshot: the imported item as displayed in Jurism]

I guess the initial question is how the interaction between a LegiSlice-driven application and Jurism would work. If the surface of it is a web page, it's just a matter of encoding the JSON and including it in the page (and setting up a translator in Jurism to decode the object and import it when the user requests it). If LegiSlice appears in an API, and the API supplies the CSL-M JSON as part of the return, it would just be a matter of documenting how to access the object, so that a web application drawing on the API can deliver the object to a Jurism connected to the visiting browser on request.

Some of that ... might not be clear on first reading. Let me know if it needs unpacking.
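
As a rough illustration of the mapping from the LegiSlice sample to the CSL-M JSON above, the fields could be derived from the node path along these lines. This is only a sketch under stated assumptions: the helper names node_to_csl and date_parts are hypothetical, the path is assumed to have the form /us/usc/t<title>/s<section>, the "multi" field from the example is omitted, and the enactment date has to be supplied by the caller because it is not present in the sample data.

```python
import json
from datetime import date


def date_parts(iso):
    """Convert 'YYYY-MM-DD' to the [year, month, day] layout used in the example above."""
    d = date.fromisoformat(iso)
    # The example above writes the year as a string and month/day as numbers.
    return [str(d.year), d.month, d.day]


def node_to_csl(node, issued=None):
    """Build a CSL-M-style dict for a U.S. Code section (hypothetical helper)."""
    # e.g. "/us/usc/t17/s103" -> ["us", "usc", "t17", "s103"]
    jurisdiction, code, title, section = node["node"].strip("/").split("/")[:4]
    if code != "usc":
        raise ValueError("this sketch only handles U.S. Code paths")
    item = {
        "type": "legislation",
        "container-title": "U.S. Code",
        "section": f"sec. {section[1:]}",   # drop the leading "s"
        "volume": title[1:],                # drop the leading "t"
        "jurisdiction": jurisdiction,
        "event-date": {"date-parts": [date_parts(node["start_date"])]},
    }
    if issued is not None:
        # Assumption: the enactment date comes from some other source.
        item["issued"] = {"date-parts": [date_parts(issued)]}
    return item


item = node_to_csl({"node": "/us/usc/t17/s103", "start_date": "2013-07-18"},
                   issued="1974-10-19")
csl_json = json.dumps([item], indent=4)
```

The resulting csl_json string is what a page or an API response would carry for Jurism to pick up.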

mscarey commented 3 years ago

A few things I'm not sure about:

My API's data includes dates of different versions of USC provisions after the first USLM version of the USC in 2013, but it doesn't include earlier enactment or amendment dates.

The enactment date for a bill can be much earlier than the time its USC section came into existence, especially if the provision is transferred and renumbered. (One example: 2 USC 5121 provides the enactment date 1949-01-19 for its source bill in its sourceCredit field in the published USC, but I think 2 USC 5121 only came into existence on 2014-01-16 when Title 2 was renumbered.)

Is it really possible to link every USC section to exactly one original enacting bill? Would I need a separate data model for bills that would exist alongside the data model for code sections? Is there a reliable dataset that provides these dates (or other bill data) for every section, or would I need to get them by parsing the sourceCredit fields in the published USC XML files? So far I haven't parsed those at all because they don't seem consistent enough in their structure.

mscarey commented 3 years ago

Hi @fbennett, I made a first pass on this feature in the master branch. I added an Enactment.csl_json() method and updated the user guide.

While adding the feature I had a few more questions.