plk / biblatex

biblatex is a sophisticated bibliography system for LaTeX users. It has considerably more features than traditional bibtex and supports UTF-8

[Request] Additional data model support for court/reporter information in Canadian legal citations #1327

Closed alercah closed 2 months ago

alercah commented 11 months ago

I am trying to write a style supporting Canadian legal citations, which have a variety of arcane formatting requirements that depend on entry-specific metadata. For instance, court decisions were historically published in court reporters such as Supreme Court Reporter (SCR) or New Brunswick Reporter (NBR). The citation style requires that the citation include the court and jurisdiction identifiers if and only if they can't be inferred from the reporter. Thus, a decision reported in NBR must be annotated with "(<courtname>)", but, because SCR reports only decisions from the Supreme Court, a decision cited in SCR doesn't need that additional annotation.

In order to handle these nuances correctly, I want to store metadata about the reporter in the entry for a reported decision. This suggests it should be in either field annotations (e.g. on journaltitle) or entry options. Additionally, since there are many decisions reported through the same reporters and I do not want to have to repeat the configuration for each entry, I am defining xdata entries for each reporter and inheriting from those in individual decision entries.

Unfortunately, as far as I can tell, there is no way to get entry-specific formatting metadata to come through xdata. Here is an example demonstrating this with the options field.

\documentclass{article}
\usepackage{biblatex}
\usepackage{filecontents}
\usepackage{expl3}

\newtoggle{bigpublisher}
\DeclareEntryOption[boolean]{bigpublisher}[false]{\settoggle{bigpublisher}{#1}}

\begin{filecontents}{test.bib}
@xdata{macmillan,
  publisher={MacMillan},
  options={bigpublisher},
}
@book{book,
  author={Very Long Author Name},
  shortauthor={VLAN},
  title={Very Long Title},
  shorttitle={VLT},
  xdata={macmillan},
  % ALTERNATIVE: options={xdata=macmillan-options},
  % DON'T WANT TO DO THIS: options={bigpublisher},
}
\end{filecontents}
\addbibresource{test.bib}

\DeclareListFormat{publisher}{%
  \usebibmacro{list:delim}{#1}%
  \iftoggle{bigpublisher}
    {\textbf{#1}}
    {#1}\isdot
  \usebibmacro{list:andothers}}

\begin{document}
  Try reading this book: \cite{book}
  \printbibliography
\end{document}

If you run this code as-is, you will see that the options field is not inherited through the xdata reference, and therefore the bigpublisher option is not set: the publisher is not bolded as it should be. (You can verify in the .bbl file that it is not set.) Uncommenting the ALTERNATIVE line, which is a plausible workaround even though it involves repeating oneself, still doesn't work. This is surprising: even if the options field is skipped by default, that should be overridable when a request is specifically made.

If you uncomment the DON'T WANT TO DO THIS line, you can verify that the option code behaves correctly when it's explicitly specified.

I also tried some other approaches:

This is surprising behaviour. Given that the sole purpose of xdata entries is to be inherited, it seems weird to me that entry options are not inheritable, especially when you explicitly request to inherit them. There could be an issue with overwriting options explicitly specified on the entry or inherited from another xdata, but that same problem exists with other list fields such as keywords, which xdata is happy to copy over.

The lack of support for field annotations in xdata is even more surprising, as the annotations are specifically intended to go alongside the field they are annotating. Therefore I would expect that the annotations should be inherited whenever the field is.

plk commented 11 months ago

The problem with inheriting options and annotations with xdata is that you can have cascading and multiple inheritance: should such fields be merged, overwritten, etc. when they conflict? This will get very messy if all of the possible choices have to be parameterised. When it comes to specialised use cases like this, you really should consider writing a dedicated style to handle the situation, as was done, for example, with the APA legal citation style in the APA 7th edition style.

alercah commented 11 months ago

For annotations I would expect that each separately named annotation is handled separately as if it were an individual field. While this wouldn't handle all use cases, since arbitrary data can be put in the annotations, this would work for any case where there's a single field being used.
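A minimal sketch of that expectation, using biblatex's named-annotation syntax (`field+an:name`). The annotation names `court` and `volumes`, the entry key `beach`, and the placeholder text are hypothetical. As noted later in this thread, inheriting annotations through xdata requires biber 2.20 or later.

```latex
\documentclass{article}
\usepackage{biblatex}

% The reporter carries two independently named field-level annotations;
% the decision entry only points at the reporter via xdata.
\begin{filecontents}{\jobname.bib}
@xdata{j:SCR,
  journaltitle            = {Supreme Court Reports},
  journaltitle+an:court   = {=hasinstitution},
  journaltitle+an:volumes = {=annual},
}
@article{beach,
  title = {Beach v. The King},
  date  = {1906},
  xdata = {j:SCR},
}
\end{filecontents}
\addbibresource{\jobname.bib}

% Test one named annotation without touching the other.
\DeclareFieldFormat{journaltitle}{%
  \mkbibemph{#1}%
  \iffieldannotation[journaltitle][court]{hasinstitution}
    {}% court is implied by the reporter, so print nothing extra
    {\addspace\mkbibparens{court needed}}}

\begin{document}
  \cite{beach}
  \printbibliography
\end{document}
```

Because each named annotation is looked up by its own name, a conflict could only arise when two sources define the same name on the same field, which is the same situation that already exists for ordinary fields.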

For options, I think it is less clear what the correct behaviour should be without some additional syntax for specifying the mode of conflict resolution. Again, though, I note that keywords are already copied over despite having the same problem, so the two are treated inconsistently.

Thank you for the pointer to the APA style and its approach. I am indeed writing a dedicated style to handle this, but I'm trying to encode some of the (substantial) variation in how to correctly cite particular case reporters directly in the data model.

Now that I have things working properly, I can give a proper example using my own code. The McGill style guide prescribes that the year of decision, court, and jurisdiction should be included if they are not already present or inferred from the citation. Thus, for instance, in the case of Beach v. The King, it could be cited as either of:

In the first example, we have to include the year because it is not already present in the citation "37 SCR 259". But since the reporter is the Supreme Court Reporter, we know it is the Supreme Court so we don't need to explicitly include that. In the second example, the CanLII citation includes the year so we don't repeat ourselves by including it again, but we do need to include the court since that is not implied by the citation. In both cases, we omit the jurisdiction ("Can") because it's implied by the Supreme Court.

My entries for these citations look like this:

@jurisdiction{beach:official,
  title={Beach v. the King},
  shorthand={Beach},
  origdate={1906-02-21},
  xdata={j:SCR},
  volume=37,
  pages=259,
  parallel={beach:canlii},
}

@jurisdiction{beach:canlii,
  crossref={beach:official},
  xdata={j:CanLII, i:SCC},
  eid=77,
}

Here, parallel is a new field I've defined for tracking parallel citations of the same case in situations where multiple citations are needed; you can ignore it, but I'm including it for completeness. (I will probably be filing more feature requests soon about how to handle parallel citations, but they are out of scope for this thread.)

You'll also note I've used crossref to inherit some of the data from multiple copies of the same case; I've defined inheritance rules so that citation-specific fields like volume, pages, and eid are not inherited.
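Rules of that kind can be written with biblatex's `\DeclareDataInheritance` and `\noinherit`. The following preamble fragment is a sketch only, assuming both parent and child use the `jurisdiction` entry type as in the entries above; the actual rules in the style may differ.

```latex
% When one jurisdiction entry crossrefs another, keep the locator
% fields local to each citation instead of inheriting them.
\DeclareDataInheritance{jurisdiction}{jurisdiction}{%
  \noinherit{volume}%
  \noinherit{pages}%
  \noinherit{eid}%
}
```

All other fields, such as title and origdate, continue to follow the default inheritance setup.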

The XDATA entries hold information about the "journals" and institutions involved:

@xdata{i:SCC,
  institution={Supreme Court of Canada},
  shortinstitution={SCC},
  jurisdiction={Canada},
  shortjurisdiction={Can},
}

@xdata{j:SCR,
  journaltitle={Supreme Court Reports},
  entrysubtype={official},
  shortjournal={SCR},
  xdata={i:SCC},
  keywords={hasinstitution, annual},
}

@xdata{j:CanLII,
  journaltitle={CanLII},
  shortjournal={CanLII},
  keywords={citeyear},
}

The keywords are intended to provide information about how to format the citation: citeyear says that the year of decision is to be included in the citation, and annual says that the journal had annually numbered volumes, restarting from 1 every year. And hasinstitution says that the institution is already present and therefore doesn't need to be repeated in the citation.
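In a style file, such keywords can be tested with biblatex's `\ifkeyword`. The fragment below is a sketch under assumptions: the macro names `case:year` and `case:court` are hypothetical, and it prints the standard `institution` list rather than the custom `shortinstitution` field, which would first need to be declared in the data model.

```latex
% Print the year of decision only if the citation does not already carry it.
\newbibmacro*{case:year}{%
  \ifkeyword{citeyear}
    {}% the year is already part of the citation itself
    {\printtext[parens]{\printfield{origyear}}}}

% Print the court only if the reporter does not already imply it.
\newbibmacro*{case:court}{%
  \ifkeyword{hasinstitution}
    {}% court is implied by the reporter
    {\printtext[parens]{\printlist{institution}}}}
```

These macros would be called from the bibliography driver for the `jurisdiction` entry type.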

I would much prefer to use annotations for this; I think they are a better fit for the data model and the least prone to problems. keywords very much feels like an abuse of the data model.

But ultimately my goal is simply that I want to be able to keep the institution- and journal-specific formatting information from having to be repeated every single time. An approach like APA's would require that kind of repetition. I'm very open to any kind of solution that would let me avoid repeating myself. If you want to discuss over a messaging service perhaps that would be better to hash things out?

alercah commented 11 months ago

One way to handle collisions could be to allow for manual resolution of XDATA and similar in the driver's source mappings. Something like:

  1. Add an iterate option for fieldsource causing the field to be parsed as a list and the remainder of the map to be processed once for each value of the field. (Careful: the parsing should be locked in before any modifications are made, to avoid infinite loops)
  2. Allow fieldsource to accept an entitytarget to access data in an XDATA entry. (other relations would be cool too, but see the next point)
  3. Ensure that entries are processed in topologically sorted order so that the result is consistent and there's no need for every sourcemap rule that handles XDATA to handle nested XDATA.

I believe this should still result in sourcemaps doing bounded work and therefore not being Turing complete.

alercah commented 11 months ago

Another alternative I can think of would be to list the courts and reporters as distinct entries with their own entrytypes. This is how I would format this in a general database, but it would result in very style-specific data (even more so than the XDATA approach) and would require some additional support to ensure that all of the court/journal entries could be loaded into the .bbl without being specifically referenced (similar to entryset members).

moewew commented 11 months ago

I think the issue originally pointed out in the first post with options is pretty tricky, since it would in essence require merging the fields (which does not make sense for most fields, but kind of does for options). Since the order of options can be important, we'd have to have a clear and documented idea of how that happens. I'm not sure how useful this is.


But I think that the point about field annotations is relevant. They should be inherited.

I'd expect a bold publisher in both cases here.

\documentclass{article}
\usepackage{biblatex}

\begin{filecontents}{\jobname.bib}
@xdata{macmillan,
  publisher    = {MacMillan},
  publisher+an = {1=bigpublisher},
}
@book{bookA,
  author      = {Very Long Author Name},
  title       = {Very Long Title with XDATA},
  xdata       = {macmillan},
}
@book{bookB,
  author       = {Very Long Author Name},
  title        = {Very Long Title with manual data},
  publisher    = {MacMillan},
  publisher+an = {1=bigpublisher},
}
\end{filecontents}
\addbibresource{\jobname.bib}

\DeclareListFormat{publisher}{%
  \usebibmacro{list:delim}{#1}%
  \ifitemannotation{bigpublisher}
    {\textbf{#1}}
    {#1}\isdot
  \usebibmacro{list:andothers}}

\begin{document}
  Try reading this book: \cite{bookA,bookB}
  \printbibliography
\end{document}

plk commented 11 months ago

Annotations should now be inherited via XDATA in the biber 2.20 DEV version, which also takes care of remapping xdata granular indices across entries. You can test with the DEV version from SourceForge.

alercah commented 10 months ago

Just to confirm receipt of this; I'll give it a try later this week hopefully.

spectria-limina commented 9 months ago

I tried https://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/development/biblatex-biber.tar.gz/download and it does not work on my code or on this example.

plk commented 9 months ago

Did you also update to the biblatex 3.20 DEV version? It is required as well.

spectria-limina commented 9 months ago

I used the biber binary from that package: $ biblatex-biber-2.20/bin/biber test

I didn't use an updated version of biblatex.

plk commented 9 months ago

Hmm, the above example works for me - can you post the .bbl file generated by biber for the above example?

spectria-limina commented 9 months ago

I tried setting PERL5LIB to pick up Biber.pm from the downloaded package. Then it complained about using the wrong version of biblatex, so I downloaded biblatex 3.20 as well and used that.

But it still isn't working, see attached [Uploading test.bbl.txt…]() (as a .txt file to satisfy the attachment extension police).

spectria-limina commented 9 months ago

Oops, the upload didn't go through. Trying again... test.bbl.txt

plk commented 9 months ago

The .bbl looks like you aren't using biber 2.20. You can check with biber -v. You shouldn't have to set any Perl environment variables, and doing so will likely cause issues if you aren't using the raw Perl biber source tree.

spectria-limina commented 9 months ago

It says biber version: 2.20 (beta). I'm going to guess it's more problems with the system Perl modules and the local perl modules... If I unset PERL5LIB then it says it is Biber 2.19.

plk commented 9 months ago

If you are using the binary version of biber from SF, you don't need any perl installed at all. You only need perl if you are checking out the github source code and running it from source.

spectria-limina commented 9 months ago

I used the source package because I did not want to try to figure out the correct version of libraries for the binary package. I will see if I can get the binary package to work.

alercah commented 9 months ago

Sorry for the delay responding. "later this week" turned out not to be...

But it looks good on my end, does exactly what I'm looking for, thanks! Please let me know when you release 2.20!

alercah commented 2 months ago

2.20 was released, closing.