[Request] Additional data model support for court/reporter information in Canadian legal citations

alercah commented 11 months ago

I am trying to write a style supporting Canadian legal citations, which have a variety of arcane formatting requirements that depend on entry-specific metadata. For instance, court decisions were historically published in court reporters such as the Supreme Court Reports (SCR) or the New Brunswick Reports (NBR). The citation style requires that the citation include the court and jurisdiction identifiers if and only if they can't be inferred from the reporter. Thus, a decision reported in NBR must be annotated with "(<courtname>)", but, because SCR reports only decisions from the Supreme Court, a decision cited to SCR doesn't need that additional annotation.

In order to handle these nuances correctly, I want to store metadata about the reporter in the entry for a reported decision. This suggests it should be in either field annotations (e.g. on journaltitle) or entry options. Additionally, since there are many decisions reported through the same reporters and I do not want to have to repeat the configuration for each entry, I am defining xdata entries for each reporter and inheriting from those in individual decision entries.

Unfortunately, as far as I can tell, there is no way to get entry-specific formatting metadata to come through XDATA. Here is an example demonstrating this with the options field.

\documentclass{article}
\usepackage{biblatex}
\usepackage{filecontents}
\usepackage{expl3}

\newtoggle{bigpublisher}
\DeclareEntryOption[boolean]{bigpublisher}[false]{\settoggle{bigpublisher}{#1}}

\begin{filecontents}{test.bib}
@xdata{macmillan,
  publisher={MacMillan},
  options={bigpublisher},
}
@book{book,
  author={Very Long Author Name},
  shortauthor={VLAN},
  title={Very Long Title},
  shorttitle={VLT},
  xdata={macmillan},
  % ALTERNATIVE: options={xdata=macmillan-options},
  % DON'T WANT TO DO THIS: options={bigpublisher},
}
\end{filecontents}
\addbibresource{test.bib}

\DeclareListFormat{publisher}{%
  \usebibmacro{list:delim}{#1}%
  \iftoggle{bigpublisher}
    {\textbf{#1}}
    {#1}\isdot
  \usebibmacro{list:andothers}}

\begin{document}
  Try reading this book: \cite{book}
  \printbibliography
\end{document}

If you run this code as-is, you will see that the options field is not inherited through the xdata reference, and therefore the bigpublisher option is not set: the publisher is not bolded as it should be. (You can verify in the .bbl file that the option is not set.) Uncommenting the ALTERNATIVE line, which is a plausible workaround even though it involves repeating oneself, still doesn't work. This is surprising: even if the options field is skipped by default, that should be overridable when a request is specifically made.

If you uncomment the DON'T WANT TO DO THIS line, you can verify that the option code behaves correctly when it's explicitly specified.

I also tried some other approaches:

Using annotations instead of options. This ran into the same problem: it worked fine when the annotations were specified explicitly on the entry, but not when they were specified on the xdata entry.
Modifying the inheritance rules to avoid suppressing the options. But it appears the inheritance rules apply only to crossrefs and not to xdata inheritance, so this approach didn't work.

This is surprising behaviour. Given that the sole purpose of xdata entries is to be inherited, it seems weird to me that entry options are not inheritable, especially when you explicitly request to inherit them. There could be an issue with overwriting options explicitly specified on the entry or inherited from another xdata, but that same problem exists with other list fields such as keywords, which xdata is happy to copy over.

The lack of support for field annotations in xdata is even more surprising, as the annotations are specifically intended to go alongside the field they are annotating. Therefore I would expect that the annotations should be inherited whenever the field is.

plk commented 11 months ago

The problem with inheriting options and annotations with xdata is that you can have cascading and multiple inheritance: should such fields be merged, overwritten, etc. when they conflict? This will get very messy if all of the possible choices have to be parameterised. When it comes to specialised use cases like this, you really should consider writing a dedicated style to handle the situation, as was done, for example, with the APA legal citation style in the APA 7th edition style.

alercah commented 11 months ago

For annotations I would expect that each separately named annotation is handled separately as if it were an individual field. While this wouldn't handle all use cases, since arbitrary data can be put in the annotations, this would work for any case where there's a single field being used.

For options, I think it is less clear what the correct behaviour should be without some additional syntax for specifying the mode of conflict resolution, but again I note that this is already the behaviour with keywords so the disparity between the two is inconsistent.

Thank you for the pointer to the APA style and its approach. I am indeed writing a dedicated style to handle this, but I'm trying to encode some of the (substantial) variation in how to correctly cite particular case reporters directly in the data model.

Now that I have things working properly, I can give a proper example using my own code. The McGill style guide prescribes that the year of decision, court, and jurisdiction should be included if they are not already present in, or inferable from, the citation. Thus, for instance, Beach v. the King could be cited as either of:

Beach v. the King (1906), 37 SCR 259
Beach v. the King, 1906 CanLII 77 (SCC)

In the first example, we have to include the year because it is not already present in the citation "37 SCR 259". But since the reporter is the Supreme Court Reports, we know the court is the Supreme Court, so we don't need to include it explicitly. In the second example, the CanLII citation includes the year, so we don't repeat it, but we do need to include the court since that is not implied by the citation. In both cases, we omit the jurisdiction ("Can") because it's implied by the Supreme Court.

My entries for these citations look like this:

@jurisdiction{beach:official,
  title={Beach v. the King},
  shorthand={Beach},
  origdate={1906-02-21},
  xdata={j:SCR},
  volume=37,
  pages=259,
  parallel={beach:canlii},
}

@jurisdiction{beach:canlii,
  crossref={beach:official},
  xdata={j:CanLII, i:SCC},
  eid=77,
}

Here, parallel is a new field I've defined for tracking parallel citations of the same case in situations where multiple citations are needed; you can ignore it, but I'm including it for completeness. (I will probably be filing more feature requests soon about how to handle parallel citations, but they are out of scope for this thread.)

You'll also note I've used crossref to inherit some of the data from multiple copies of the same case; I've defined inheritance rules so that citation-specific fields like volume, pages, and eid are not inherited.
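For readers following along, inheritance rules of this kind can be written with biblatex's \DeclareDataInheritance. The following is only a minimal sketch, not my actual rules; it assumes both the parent and the child entry use the @jurisdiction type and that every other field should still be inherited under the default settings.

```latex
% Preamble sketch, placed after \usepackage{biblatex}.
% Assumption: parent and child are both @jurisdiction entries.
% Each \noinherit blocks one citation-specific field from being
% copied from the crossref parent into the child entry.
\DeclareDataInheritance{jurisdiction}{jurisdiction}{%
  \noinherit{volume}%
  \noinherit{pages}%
  \noinherit{eid}%
}
```

With rules like these, beach:canlii would still pick up title, shorthand, and origdate from beach:official, but not the volume and page of the SCR citation.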

The XDATA entries hold information about the "journals" and institutions involved:

@xdata{i:SCC,
  institution={Supreme Court of Canada},
  shortinstitution={SCC},
  jurisdiction={Canada},
  shortjurisdiction={Can},
}

@xdata{j:SCR,
  journaltitle={Supreme Court Reports},
  entrysubtype={official},
  shortjournal={SCR},
  xdata={i:SCC},
  keywords={hasinstitution, annual},
}

@xdata{j:CanLII,
  journaltitle={CanLII},
  shortjournal={CanLII},
  keywords={citeyear},
}

The keywords are intended to provide information about how to format the citation: citeyear says that the year of decision is already part of the citation itself, annual says that the journal had annually numbered volumes, restarting from 1 every year, and hasinstitution says that the institution is already implied by the reporter and therefore doesn't need to be repeated in the citation.
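To make the keyword approach concrete, here is a rough sketch of how a style could test one of these keywords. The macro name decision:year is hypothetical and this is not my actual style code; it assumes the keyword reaches the entry through the xdata reference and that the year of decision comes from origdate.

```latex
% Hypothetical bibmacro, for illustration only.
% Prints the year of decision in parentheses unless the reporter's
% XDATA entry carries the "citeyear" keyword.
\newbibmacro*{decision:year}{%
  \ifkeyword{citeyear}
    {}% the year is already part of the citation, e.g. "1906 CanLII 77"
    {\iffieldundef{origyear}
       {}% no year of decision recorded, print nothing
       {\printtext[parens]{\printfield{origyear}}}}}
```

A driver would then call \usebibmacro{decision:year} at the point where "(1906)" belongs.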

I would much prefer to use annotations for this; I think they are a better fit for the data model and the least prone to problems. keywords very much feels like an abuse of the data model.
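As a sketch of what I have in mind, the same reporter metadata could be attached to journaltitle as field annotations. This is illustrative only: the file name and the macro name decision:court are made up, the entry is simplified, and it only helps if annotations on an xdata entry are carried over to the entries that reference it.

```latex
% Sketch only: reporter metadata as field annotations instead of keywords.
\begin{filecontents}{reporters-sketch.bib}
@xdata{j:SCR,
  journaltitle    = {Supreme Court Reports},
  journaltitle+an = {=hasinstitution, annual},
  shortjournal    = {SCR},
}
\end{filecontents}

% Hypothetical bibmacro: print the court only when the reporter
% does not already imply it.
\newbibmacro*{decision:court}{%
  \iffieldannotation[journaltitle]{hasinstitution}
    {}% court is implied by the reporter, print nothing
    {\printtext[parens]{\printlist{institution}}}}
```

The advantage over keywords is that the flag lives next to the field it describes, so it cannot be confused with keywords used for filtering bibliographies.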

But ultimately my goal is simply that I want to be able to keep the institution- and journal-specific formatting information from having to be repeated every single time. An approach like APA's would require that kind of repetition. I'm very open to any kind of solution that would let me avoid repeating myself. If you want to discuss over a messaging service perhaps that would be better to hash things out?

alercah commented 11 months ago

One way to handle collisions could be to allow for manual resolution of XDATA and similar in the driver's source mappings. Something like:

Add an iterate option for fieldsource causing the field to be parsed as a list and the remainder of the map to be processed once for each value of the field. (Careful: the parsing should be locked in before any modifications are made, to avoid infinite loops)
Allow fieldsource to accept an entitytarget to access data in an XDATA entry. (other relations would be cool too, but see the next point)
Ensure that entries are processed in topologically sorted order so that the result is consistent and there's no need for every sourcemap rule that handles XDATA to handle nested XDATA.

I believe this would still leave sourcemaps doing a bounded amount of work, so they would not become Turing complete.

alercah commented 11 months ago

Another alternative I can think of would be to list the courts and reporters as distinct entries with their own entrytypes. This is how I would model this in a general database, but it would result in very style-specific data (even more so than the XDATA approach) and would require some additional support to ensure that all of the court/journal entries could be loaded into the .bbl without being specifically referenced (similar to entryset members).

moewew commented 11 months ago

I think the issue with options pointed out in the first post is pretty tricky, since it would in essence require merging the fields (which does not make sense for most fields, but kind of does for options). Since the order of options can be important, we'd have to have a clear and documented idea of how that happens. I'm not sure how useful this is.

But I think that the point about field annotations is relevant. They should be inherited.

I'd expect a bold publisher in both cases here.

\documentclass{article}
\usepackage{biblatex}

\begin{filecontents}{\jobname.bib}
@xdata{macmillan,
  publisher    = {MacMillan},
  publisher+an = {1=bigpublisher},
}
@book{bookA,
  author      = {Very Long Author Name},
  title       = {Very Long Title with XDATA},
  xdata       = {macmillan},
}
@book{bookB,
  author       = {Very Long Author Name},
  title        = {Very Long Title with manual data},
  publisher    = {MacMillan},
  publisher+an = {1=bigpublisher},
}
\end{filecontents}
\addbibresource{\jobname.bib}

\DeclareListFormat{publisher}{%
  \usebibmacro{list:delim}{#1}%
  \ifitemannotation{bigpublisher}
    {\textbf{#1}}
    {#1}\isdot
  \usebibmacro{list:andothers}}

\begin{document}
  Try reading this book: \cite{bookA,bookB}
  \printbibliography
\end{document}

plk commented 11 months ago

Annotations should now be inherited via XDATA in the biber 2.20 DEV version, which also takes care of remapping xdata granular indices across entries. You can test with the DEV version from SourceForge.

alercah commented 10 months ago

Just to confirm receipt of this; I'll give it a try later this week hopefully.