plk / biblatex

biblatex is a sophisticated bibliography system for LaTeX users. It has considerably more features than traditional bibtex and supports UTF-8

Update Entry pages Based on Citation Command Postnote Page Range #1138

Open dollodart opened 3 years ago

dollodart commented 3 years ago

Often the same entry is cited repeatedly, with different pages cited each time. The backref option adds back-references to every citation of an entry, including the page number; with a suitable backrefstyle setting, citations of the same entry on two or more consecutive pages are collapsed into a page range. The ibidpage option determines whether a citation counts as unique depending on the pages given in the postnote of the citation command, and the citepages option determines whether a citation prints the pages field when a page range is given in the postnote.

This feature request is to update the pages field of an entry with every page-range postnote found in citations of that entry. Since the pages field may already hold an overall range, or the starting page of a contribution in, e.g., a multivolume work, a dedicated pinpages field could be updated instead. The feature could be built from macros derived from the existing backref and pageref machinery, and it fits the same theme as the features described in the previous paragraph. Here is an MWE:

\begin{filecontents}{mwe.bib}
@article{mwe,
    title={minimum working example},
    journal={github},
    pages={1},
    author={dollodart},
    year={2020}}
\end{filecontents}

\documentclass{article}
\usepackage{biblatex}
\addbibresource{mwe.bib}

\begin{document}

In \cite[1--3]{mwe}, there is lorem ipsum \cite[4--5]{mwe}.

\printbibliography
\end{document}

If this feature were enabled, the pages typeset in the entry would change to 1--3, 4--5 (in LaTeX notation), or perhaps be merged to 1--5, depending on a package option analogous to backrefstyle. That is,

Current \printbibliography output: [1] dollodart. “minimum working example”. In: github (2020), p. 1.
Desired \printbibliography output: [1] dollodart. “minimum working example”. In: github (2020), pp. 1--3, 4--5.

backref already provides hyperlinking, but someone may wish to omit backref and create hyperlinks based only on the cited page ranges, so that might be worth implementing, too.

A further requirement comes from the limited parsing of the postnote argument: biblatex only recognizes it as a page range when the input is numeric (lists and ranges with arbitrary spacing are handled). Because the postnote is currently used only for typesetting, the package provides macros such as \pno and \ppno to print the localized page or pages prefix when a page range cannot be parsed. For this feature to be useful in the common case of page ranges containing non-numeric characters, there must at least be a way of stating in the postnote argument what the page or page range is, like \cite[*\pagerange{10}{20}]{...}. Many citations use alphabetical as well as numerical indexes with arbitrary delimiters, so better parsing may be required, e.g. for \cite[1a2]{...} and \cite[1(a)(2)]{...}.
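To illustrate the current behavior described above, here is a small document that reuses mwe.bib, so it has to be compiled after the MWE has written that file. The comments describe the behavior expected with the default numeric style; \pno and \ppno are the existing biblatex commands mentioned above, nothing here is new API.

```latex
\documentclass{article}
\usepackage{biblatex}
\addbibresource{mwe.bib} % written by the filecontents environment of the MWE
\begin{document}
% Numeric range: recognized, so the localized "pp." prefix is added
\cite[4--5]{mwe}

% Alphanumeric pinpoints: not recognized as page numbers, so no prefix
\cite[5a]{mwe} and \cite[1(a)(2)]{mwe}

% Current workaround: force the localized prefix by hand
\cite[\pno~5a]{mwe} and \cite[\ppno~1(a)(2)--3(b)]{mwe}

\printbibliography
\end{document}
```

None of these postnotes reaches the bibliography entry, which is the point of the request.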

dollodart commented 3 years ago

I have the following suggestions based on some code review. They build on the previously suggested analogy to backref/pageref and add a range list analogous to the existing name list:

  1. In analogy to backref, an \abx@aux@citeref command could be written to and read from the .aux file. Like the backref data, it would carry an instance counter, the entry key, and the reference section, but it needs neither the page nor the page reference. Alternatively, \abx@aux@cite could simply take the postnote as an additional argument, with the segment information kept in the separate command \abx@aux@segm. The .aux writing would take place in the <postcode>, the last argument of \DeclareCiteCommand. (If the source code is changed, this happens below the level of the API: in biblatex.sty the .aux writing for all citation commands is spread across \blx@citeadd, \blx@citation, and \blx@citation@entry, and can be understood as a hook into the latter part of the cite command.)

  2. A citedpages list field could be created and updated in analogy to pageref, using a macro analogous to \blx@addpageref (which simply defines the pageref field from what has been accumulated in \blx@pref@\the\c@refsection@<entry key> through \blx@aux@backref; analogous macros would be needed for these as well). Page ranges may have to be literal, since they can hold non-numeric identifiers like (a) or (b). The existing list types are name, literal, and key lists, but a range list might be worth implementing if ranges were parsed when they are added to the list (see the optional suggestion below). This would allow arbitrary range-list formatting through constants analogous to nameparts, e.g. \DeclareDatamodelConstant[type=list]{rangeparts}{lower, upper, sep}, giving \rangepartlower, \rangepartupper, and \rangepartsep.

  3. The pageref:init, pageref:comp, and pageref:dump bibmacros could be adapted to format citedpages (cf. \DeclareListFormat{pageref} in biblatex.def). Unlike backref page numbers, the citedpages ranges do not generally come in ascending order. The bibmacros could keep the same program flow, with tests that compare range bounds rather than integers, provided the list is sorted before it is formatted with \DeclareListFormat. Those tests are easier if the ranges are parsed as suggested in (2). Since field formatting happens when the bibliography entry is typeset, sorting need only be done before a bibliography is printed.
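Items 1 and 2 could be prototyped at the user level roughly as follows. This is a minimal sketch, not biblatex code: \abx@aux@citeref is the command proposed in item 1, defined here by hand; the list name blx@citedpages@<refsection>@<entry key> and the bibmacro name citeref:write are made up for illustration; and the postnote is stored exactly as written, without any parsing.

```latex
\makeatletter
% Hypothetical .aux command in the spirit of item 1 (not part of biblatex):
%   \abx@aux@citeref{<refsection>}{<entry key>}{<postnote>}
% When LaTeX reads the .aux file at \begin{document}, each call appends the
% postnote, unparsed, to the global etoolbox list
% blx@citedpages@<refsection>@<entry key> (a made-up name).
\newcommand*{\abx@aux@citeref}[3]{%
  \listcsgadd{blx@citedpages@#1@#2}{#3}}
% LaTeX reads the .aux file a second time at \end{document}; turn the
% command into a no-op then so that no list is filled twice.
\AtEndDocument{\let\abx@aux@citeref\@gobblethree}
% Hypothetical writer bibmacro, meant for the <postcode> of
% \DeclareCiteCommand: records the current entry's postnote in the .aux.
\newbibmacro*{citeref:write}{%
  \iffieldundef{postnote}{}{%
    \protected@write\@auxout{}{%
      \string\abx@aux@citeref
      {\the\c@refsection}{\thefield{entrykey}}{\thefield{postnote}}}}}
\makeatother
```

Because the data round-trips through the .aux file, each list is complete from the second LaTeX run on, regardless of where \printbibliography is placed. A driver could then read it with etoolbox's \dolistcsloop, building the list name from \the\c@refsection and \thefield{entrykey}.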

Optionally, the page-range parsing macros used by \mkpageprefix in biblatex.sty could be extended to accept explicitly marked page ranges. The argument could be a list of ranges separated by \bibrangessep, containing one or more kinds of macro (\page, \pages, or \pagerange), or no macro at all if the parser handles mixed numbers and non-numbers joined by \bibrangedash. The program flow would have to change so that \mkpageprefix, or some internal command in the blx namespace, updates the entry of the citation currently being processed. Because the bibliography data would be updated with data that lives inline in the LaTeX file, an alternative is to write the unparsed citedpages field to the .bcf file and let Biber do the parsing, though that limits the feature to the Biber backend. This basically extends the existing name handling to ranges; unlike names, however, BibTeX has no parsing rules for ranges, so any parsing not done in the LaTeX layer would necessarily be limited to Biber. Note that the datamodel constant would also need something like \rangepartprefix and \rangepartpostfix for everything in the cite command's postnote that is not part of a numeric range.
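For typesetting alone, the proposed markup can already be approximated in a preamble. The \pagerange macro below is hypothetical (not a biblatex command) and only sketches the syntax: it prints the localized "pp." via \ppno and joins the bounds with \bibrangedash, so biblatex should not add a prefix of its own. It does not record the bounds anywhere, which is precisely the missing piece the parsing proposal would supply. As before, mwe.bib from the MWE is reused.

```latex
\documentclass{article}
\usepackage{biblatex}
\addbibresource{mwe.bib} % written by the filecontents environment of the MWE
% Hypothetical markup for explicit pinpoints (not part of biblatex).
% \ppno prints the localized "pp."; since the postnote is then no longer
% purely numeric, biblatex's automatic prefix is not applied.
\newcommand*{\pagerange}[2]{\ppno~#1\bibrangedash#2}
\begin{document}
\cite[\pagerange{10}{20}]{mwe} and \cite[\pagerange{1(a)}{2(b)}]{mwe}
\printbibliography
\end{document}
```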

Any thoughts on these suggestions?

dollodart commented 3 years ago

In an attempt to gain interest in this issue, here is another use case: web pages and PDF documents support URL fragments and query parameters. For example, a PDF link can be appended with "#page=5" to open page 5, or with "#nameddest=named_destination" to jump to a named destination (in some PDF viewers). A person will often want a web page or PDF document in their bibliography but cite different parts of it throughout their document, and link to those parts in each citation. If the argument of the citation command were parsed and put into a field accessible from the loop code, links could be generated automatically as part of the citation's formatting. When the parameters are HTML fragment identifiers or PDF named destinations, there is not necessarily any value in accumulating them; but when citing pages of an external PDF document that also has a URL, the use case is the same as for a document without one.
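A minimal sketch of the #page= case, under several assumptions: hyperref is loaded, entry fields such as url are still accessible in the <postcode> of \DeclareCiteCommand (as stated elsewhere in this thread), and the PDF viewer honours #page=. The field format postnote:pdfpage, the command \pdfpagecite, the entry pdfdoc and its example.com URL are all made up for illustration. Only integer postnotes are linked; anything else is printed as usual.

```latex
\begin{filecontents}{pdf.bib}
@online{pdfdoc,
  title  = {Some PDF document},
  author = {Doe, Jane},
  year   = {2020},
  url    = {https://example.com/doc.pdf}}
\end{filecontents}
\documentclass{article}
\usepackage{biblatex}
\usepackage{hyperref}
\addbibresource{pdf.bib}
% Hypothetical format: link an integer postnote to <url>#page=<postnote>;
% otherwise behave like the standard postnote format.
\DeclareFieldFormat{postnote:pdfpage}{%
  \iffieldundef{url}
    {\mkpageprefix[pagination]{#1}}
    {\iffieldint{postnote}
       {\href{\thefield{url}\#page=\thefield{postnote}}%
             {\mkpageprefix[pagination]{#1}}}
       {\mkpageprefix[pagination]{#1}}}}
% Like \cite in the default numeric style, but using the format above.
\DeclareCiteCommand{\pdfpagecite}[\mkbibbrackets]
  {\usebibmacro{prenote}}
  {\usebibmacro{citeindex}%
   \usebibmacro{cite}}
  {\multicitedelim}
  {\iffieldundef{postnote}
     {}
     {\setunit{\postnotedelim}%
      \printfield[postnote:pdfpage]{postnote}}}
\begin{document}
See \pdfpagecite[5]{pdfdoc} and \pdfpagecite[7--9]{pdfdoc}.
\printbibliography
\end{document}
```

Using a field format rather than wrapping \printfield in \href keeps the postnote delimiter outside the link, since biblatex prints pending punctuation before applying the format.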

dollodart commented 3 years ago

This is easy to do as long as you don't need the .bcf file updated for source mapping (there are reasons for wanting that, though, e.g. if you process the .bcf or .bbl files for analytics, or want the backend to process the data before typesetting). Define the following bibmacro in a .cbx file; it simply accumulates the postnote arguments in a newly defined per-entry list:

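\newbibmacro*{savepostnote}{%
  % Name of the global per-entry list, e.g. abx@mwe@postnotes
  \edef\temp{abx@\thefield{entrykey}@postnotes}%
  % Create the list on first use (\ifciteseen would need citetracker,
  % and is always false while the tracker is disabled)
  \ifcsdef{\temp}{}{\csxdef{\temp}{}}%
  % Append the postnote, if any, to the list (globally)
  \iffieldundef{postnote}{}{\listcsxadd{\temp}{\csfield{postnote}}}%
}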

As an aside, the postcode could also build a URL for the postnote from the entry URL, since it has access to the URL through \thefield{url} and to the postnote through \thefield{postnote}, as the previous comment suggested for making hyperlinks based on the postnote.

Place this bibmacro in the postcode argument of \DeclareCiteCommand. Then define a bibmacro that uses the accumulated list. One possibility is

\newbibmacro*{allpinpages}{%
  % Same list name as in savepostnote
  \edef\temp{abx@\thefield{entrykey}@postnotes}%
  % Print nothing if the entry was never cited with a postnote
  \ifcsvoid{\temp}{}{%
    % ##1 because \do is defined inside another macro definition
    \renewcommand*{\do}[1]{\printtext{##1}\setunit{\addcomma\addspace}}%
    \printtext{\bibstring{pages}}%
    \setunit{\addspace}%
    \dolistcsloop{\temp}%
  }%
}

Place this somewhere in \DeclareBibliographyDriver. The questions of parsing the postnote argument (which would be done in the first bibmacro) and sorting the list (which would be done in the second) still remain. However, parsing and sorting are implemented in the Biber backend (with the small exception of \blx@ifnum, which doesn't parse but only determines whether its argument is a numeral). These features would best be implemented by changing how the .bcf file is written, so that source mapping happens at the end of the document rather than in the preamble and the collected data is passed to the backend, though TeX or LaTeX solutions exist.
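A TeX-side sketch of the sorting step and of the wiring, assuming the default numeric style and that the savepostnote and allpinpages bibmacros above are defined in the same preamble. \sortpostnotes and everything in the expl3 block are hypothetical names. Items are ordered by their leading integer only (found with l3regex), so "4--5" follows "1--3"; no parsing of alphanumeric pinpoints or merging of overlapping ranges is attempted. Rather than editing a driver, the list is appended through the finentry bibmacro.

```latex
% Preamble sketch; biblatex loads etoolbox, which provides the list macros.
\ExplSyntaxOn
% Hypothetical helper (not part of biblatex): globally re-sort the etoolbox
% list whose control sequence name is #1 by the leading integer of each
% item; items without a leading integer count as 0 and sort first.
\seq_new:N \l__pinpages_items_seq
\seq_new:N \l__pinpages_match_seq
\int_new:N \l__pinpages_a_int
\int_new:N \l__pinpages_b_int
\cs_new_protected:Npn \__pinpages_lead:nN #1#2
  {
    \regex_extract_once:nnNTF { \A \d+ } {#1} \l__pinpages_match_seq
      { \int_set:Nn #2 { \seq_item:Nn \l__pinpages_match_seq { 1 } } }
      { \int_set:Nn #2 { 0 } }
  }
\cs_new_protected:Npn \pinpages_sort:n #1
  {
    \group_begin:
      % copy the etoolbox list into a sequence (\do is restored at group end)
      \seq_clear:N \l__pinpages_items_seq
      \cs_set:Npn \do ##1 { \seq_put_right:Nn \l__pinpages_items_seq {##1} }
      \dolistcsloop {#1}
      % ascending by leading integer
      \seq_sort:Nn \l__pinpages_items_seq
        {
          \__pinpages_lead:nN {##1} \l__pinpages_a_int
          \__pinpages_lead:nN {##2} \l__pinpages_b_int
          \int_compare:nNnTF { \l__pinpages_a_int } > { \l__pinpages_b_int }
            { \sort_return_swapped: }
            { \sort_return_same: }
        }
      % write the sorted items back into the global etoolbox list
      \cs_gset_eq:cN {#1} \c_empty_tl
      \seq_map_inline:Nn \l__pinpages_items_seq { \listcsgadd {#1} {##1} }
    \group_end:
  }
\NewDocumentCommand \sortpostnotes { m }
  { \ifcsvoid {#1} { } { \pinpages_sort:n {#1} } }
\ExplSyntaxOff
% Default numeric \cite, with savepostnote added to the <postcode>.
% (Load biblatex with citetracker=true if savepostnote uses \ifciteseen.)
\DeclareCiteCommand{\cite}[\mkbibbrackets]
  {\usebibmacro{prenote}}
  {\usebibmacro{citeindex}%
   \usebibmacro{cite}}
  {\multicitedelim}
  {\usebibmacro{savepostnote}%
   \usebibmacro{postnote}}
% Append the sorted list at the end of every entry instead of editing a
% driver; nothing is printed for entries without saved postnotes.
\renewbibmacro*{finentry}{%
  \sortpostnotes{abx@\thefield{entrykey}@postnotes}%
  \setunit{\addsemicolon\space}%
  \usebibmacro{allpinpages}%
  \finentry}
```

With the MWE, this should append the cited ranges after the existing pages field, separated by a semicolon. Since the list is filled while the document is typeset, \printbibliography has to come after the citations for this in-document approach to see them.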