Open fbennett opened 4 years ago
Sorry I missed the notification on this, @fbennett! I would have responded sooner. Yeah, I think it would be great to export a citation that Jurism can read. I have questions about how to format a citation exported from Legislice to Jurism. I'm sure I'll have trouble understanding how to import from Jurism too, but I should try to understand exporting first.
I'm having trouble understanding the explanation of CSL-M JSON in the citeproc-js docs. The "Citations" structure seems most relevant. But I'm unclear what to put for some of the fields. Here's the example I'm looking at:
{
id:"item1",
locator: 123,
label: "page",
prefix: "See ",
suffix: " (arguing that X is Y)"
}
The citeproc-js docs say the id field needs to uniquely identify the resource. Does that mean the id should be a URI with a namespace, or a hash, or something else? If the Legislice object being cited to represents a quotation from a dated version of a subdivision of a statute, does the id need to uniquely identify the subdivision, or the specific dated version of the subdivision, or the specific quotation from the dated version?
The citeproc-js docs include a type field that can be set to "legislation", but where does it go? Does a "citation" fit inside an "item"?
The locator field is supposed to identify "a page number or other pinpoint location or range within the resource". Can it be a path identifier like '/us/usc/t17/s103/a'?
For the label field, can I use label names like "subsection" and "clause"?
The example provides a prefix and suffix for the quoted phrase. Can I provide a start index and end index instead?
As an example, here's the JSON from the Legislice documentation representing three text passages. Each 'content' field contains the full text of the corresponding provision, but then the selection fields narrow down the range of text actually considered "selected". Can you show me an example of what CSL-M JSON should be generated for this example? Are any necessary fields missing to generate CSL-M JSON from this example?
{'start_date': '2013-07-18',
'children': [{'start_date': '2013-07-18',
'children': [],
'end_date': None,
'text_version': {'content': 'The subject matter of copyright as specified by section 102 includes compilations and derivative works, but protection for a work employing preexisting material in which copyright subsists does not extend to any part of the work in which such material has been used unlawfully.'},
'node': '/us/usc/t17/s103/a',
'anchors': [],
'selection': [{'end': 277, 'start': 0}],
'heading': ''},
{'start_date': '2013-07-18',
'children': [],
'end_date': None,
'text_version': {'content': 'The copyright in a compilation or derivative work extends only to the material contributed by the author of such work, as distinguished from the preexisting material employed in the work, and does not imply any exclusive right in the preexisting material. The copyright in such work is independent of, and does not affect or enlarge the scope, duration, ownership, or subsistence of, any copyright protection in the preexisting material.'},
'node': '/us/usc/t17/s103/b',
'anchors': [],
'selection': [{'end': 300, 'start': 256},
{ 'end': 437, 'start': 384}],
'heading': ''}],
'end_date': None,
'text_version': None,
'node': '/us/usc/t17/s103',
'anchors': [],
'selection': [],
'heading': 'Subject matter of copyright: Compilations and derivative works'}
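To make the role of the selection fields concrete, here is a minimal sketch of how the start/end pairs could be applied to a 'content' string. It assumes the ranges are zero-based with an exclusive end, like Python slices; the sample above does not say so explicitly. The function name selected_passages and the miniature provision are hypothetical, not part of Legislice.

```python
def selected_passages(content, selection):
    """Return the substring of content covered by each start/end range.

    Assumes zero-based offsets with an exclusive end (Python slice semantics).
    """
    return [content[span["start"]:span["end"]] for span in selection]


# Hypothetical miniature provision, not real statutory text.
provision = {
    "text_version": {"content": "The quick brown fox jumps"},
    "selection": [{"start": 4, "end": 9}, {"start": 16, "end": 19}],
}

passages = selected_passages(
    provision["text_version"]["content"], provision["selection"]
)
print(passages)
```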
Great! It may take some back-and-forth to sort out how Jurism (or more generally automated citations with the citeproc-js processor) would fit into LegiSlice workflows. I'll start with the data sample, post how it would be represented in CSL-M JSON (CSL-M is the variant of the vanilla CSL style language used by Jurism), and follow with some open-ended questions.
It looks like that's a Python structure, so I took the liberty of converting it to JSON (further comments below):
{
"start_date": "2013-07-18",
"children": [
{
"start_date": "2013-07-18",
"children": [],
"end_date": null,
"text_version": {
"content": "The subject matter of copyright as specified by section 102 includes compilations and derivative works, but protection for a work employing preexisting material in which copyright subsists does not extend to any part of the work in which such material has been used unlawfully."
},
"node": "/us/usc/t17/s103/a",
"anchors": [],
"selection": [
{
"end": 277,
"start": 0
}
],
"heading": ""
},
{
"start_date": "2013-07-18",
"children": [],
"end_date": null,
"text_version": {
"content": "The copyright in a compilation or derivative work extends only to the material contributed by the author of such work, as distinguished from the preexisting material employed in the work, and does not imply any exclusive right in the preexisting material. The copyright in such work is independent of, and does not affect or enlarge the scope, duration, ownership, or subsistence of, any copyright protection in the preexisting material."
},
"node": "/us/usc/t17/s103/b",
"anchors": [],
"selection": [
{
"end": 300,
"start": 256
},
{
"end": 437,
"start": 384
}
],
"heading": ""
}
],
"end_date": null,
"text_version": null,
"node": "/us/usc/t17/s103",
"anchors": [],
"selection": [],
"heading": "Subject matter of copyright: Compilations and derivative works"
}
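For reference, this kind of conversion can be done with Python's standard json module, which writes None as null and uses double quotes. The sketch below uses a trimmed-down node with only a few of the keys from the sample; the variable names are illustrative.

```python
import json

# Trimmed-down version of the top-level node from the sample above.
node = {
    "node": "/us/usc/t17/s103",
    "end_date": None,
    "children": [],
    "selection": [],
}

# json.dumps serializes None as null and quotes keys with double quotes.
as_json = json.dumps(node, indent=2)
print(as_json)

# Loading the JSON back gives an equal Python structure.
round_tripped = json.loads(as_json)
```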
Jurism, like Zotero, harvests items from HTML views, or from structured metadata embedded in a page. The most reliable structured format ATM is CSL-M JSON. 17 USC § 103 (1974) as revised Jul. 18, 2013 would be expressed like this:
[
{
"type": "legislation",
"multi": {
"main": {},
"_keys": {}
},
"container-title": "U.S. Code",
"section": "sec. 103",
"volume": "17",
"jurisdiction": "us",
"issued": {
"date-parts": [
[
"1974",
10,
19
]
]
},
"event-date": {
"date-parts": [
[
"2013",
7,
18
]
]
}
}
]
In Jurism, that object would import to this:
[screenshot of the imported item in Jurism not preserved]
I guess the initial question is how the interaction between a LegiSlice-driven application and Jurism would work. If the surface of it is a web page, it's just a matter of encoding the JSON and including it in the page (and setting up a translator in Jurism to decode the object and import it when the user requests it). If LegiSlice is exposed through an API, and the API supplies the CSL-M JSON as part of the return, it would just be a matter of documenting how to access the object, so that a web application drawing on the API can deliver the object to a Jurism connected to the visiting browser on request.
Some of that ... might not be clear on first reading. Let me know if it needs unpacking.
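As a rough illustration of the API case, here is a minimal sketch of how a section-level object like the one above might be assembled from a Legislice node path and version date. The function name csl_item_for_usc_section is hypothetical, and the field values simply mirror the example object. The "issued" date and the "multi" block are left out because the path and version date alone do not supply them.

```python
import datetime


def csl_item_for_usc_section(node, start_date):
    """Build a CSL-M-style dict for a path such as '/us/usc/t17/s103'.

    Hypothetical helper: the field layout copies the example object above.
    """
    parts = node.strip("/").split("/")
    if len(parts) < 4 or parts[:2] != ["us", "usc"]:
        raise ValueError(f"not a USC section path: {node}")
    title, section = parts[2], parts[3]
    if not (title.startswith("t") and section.startswith("s")):
        raise ValueError(f"unexpected title or section segment in: {node}")
    version = datetime.date.fromisoformat(start_date)
    return {
        "type": "legislation",
        "container-title": "U.S. Code",
        "section": f"sec. {section[1:]}",
        "volume": title[1:],
        "jurisdiction": "us",
        # Year as a string, month and day as integers, as in the example.
        "event-date": {
            "date-parts": [[str(version.year), version.month, version.day]]
        },
    }


item = csl_item_for_usc_section("/us/usc/t17/s103", "2013-07-18")
print(item)
```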
A few things I'm not sure about:
My API's data includes dates of different versions of USC provisions after the first USLM version of the USC in 2013, but it doesn't include earlier enactment or amendment dates.
The enactment date for a bill can be much earlier than the time its USC section came into existence, especially if the provision is transferred and renumbered. (One example: 2 USC 5121 provides the enactment date 1949-01-19 for its source bill in its sourceCredit field in the published USC, but I think 2 USC 5121 only came into existence on 2014-01-16 when Title 2 was renumbered.)
Is it really possible to link every USC section to exactly one original enacting bill? Would I need a separate data model for bills that would exist alongside the data model for code sections? Is there a reliable dataset that provides these dates (or other bill data) for every section, or would I need to get them by parsing the sourceCredit fields in the published USC XML files? So far I haven't parsed those at all because they don't seem consistent enough in their structure.
2013-07-18 would not be a "date amended" for 17 USC 103. That's the date of the earliest USLM version of the USC, which contains 17 USC 103, but it doesn't mean the provision didn't exist in earlier versions of the USC that weren't published in USLM. I should change the interface to clarify that. If Legislice provides some CSL-M JSON citations with neither an "enacted" nor an "amended" date, at least in the first iteration, is that still useful at all?
What should be returned when a user asks for the CSL-M JSON citation for a subsection, paragraph, or other structure beneath the level of a section? Should Legislice just return the CSL-M citation for the parent section instead?
Can you point me to any documentation for the "multi", "main", and "_keys" fields? These field names aren't very Googleable and I'm not finding them.
What if I have a human-readable statute citation, and I want to convert it to CSL-M JSON format? Is there a Python or Javascript package for that?
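To make the subsection question concrete, here is a sketch of the parent-section fallback: split a node path into the section-level path and whatever subdivisions follow it. It assumes USC paths always have four leading segments (/us/usc/tN/sN), which may not hold for every provision, and the function name split_at_section is hypothetical.

```python
def split_at_section(node):
    """Split a path into its section-level path and trailing subdivisions.

    Assumes the first four segments are jurisdiction, code, title, section.
    """
    parts = node.strip("/").split("/")
    section_path = "/" + "/".join(parts[:4])
    return section_path, parts[4:]


section_path, subdivisions = split_at_section("/us/usc/t17/s103/b")
print(section_path, subdivisions)
```

Whether the trailing subdivisions belong in a locator, in the item itself, or nowhere is exactly the open question above.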
Hi @fbennett, I made a first pass on this feature in the master branch. I added an Enactment.csl_json() method and updated the user guide.
While adding the feature I had a few more questions.
What should the JSON output look like for constitutional provisions and amendments?
I notice that the example JSON you gave me had a "section" key but also the abbreviation "sec." in the value. Does that mean that other fields would be in a format like {"subsection": "subsec. a", "paragraph": "para. 2"}, etc.? It seems like that would be needless duplication of information, especially because the standard citations don't include terms like "sec." and "para.". I found the locator terms section of the CSL-M documentation, but it still seems unclear.
Is this feature starting to duplicate code that already exists in the CSL ecosystem? Is there any Python library I can import to help serialize the CSL-M JSON or create citations from it?
Jurism is a reference manager that supports legal resources, and implements the "Bluebook" rules for human-readable citations. If LegiSlice were to offer citation data in a format digestible by Jurism (CSL-M JSON, or possibly MODS), it would open interesting paths for integration in user-level services that leverage both tools. If that sounds interesting, I can help with the data structure.