Closed moewew closed 6 years ago
Looks like a bug in the sorting key extractor code - seems to be sorting year as a string. Looking into it.
I see you have added a commit to fix this. Unfortunately, the change leads to unexpected results in some edge cases. sortyear should be given precedence over year; it should not live in a \sort section of its own.
\documentclass[british]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{babel}
\usepackage{csquotes}
\usepackage[style=authoryear, backend=biber]{biblatex}
\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
@book{appleby,
author = {Humphrey Appleby},
title = {A Title},
sortyear = {1980},
date = {1990},
}
@book{appleby:b,
author = {Humphrey Appleby},
title = {B Title},
sortyear = {1980},
date = {1989},
}
\end{filecontents}
\addbibresource{\jobname.bib}
\iffalse
\DeclareSortingTemplate{nyt}{
\sort{
\field{presort}
}
\sort[final]{
\field{sortkey}
}
\sort{
\field{sortname}
\field{author}
\field{editor}
\field{translator}
\field{sorttitle}
\field{title}
}
\sort{
\field{sortyear}
}
\sort{
\field{year}
}
\sort{
\field{sorttitle}
\field{title}
}
\sort{
\field{volume}
\literal{0}
}
}
\fi
\begin{document}
\cite{appleby:b,appleby}
\printbibliography
\end{document}
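For comparison with the template disabled by \iffalse in the example above, here is a sketch of the arrangement the comment asks for, with sortyear and year sharing one \sort so that sortyear takes precedence when it is present. It should correspond to the stock nyt definition, but that is an assumption worth checking against the installed biblatex version.

```latex
\DeclareSortingTemplate{nyt}{
  \sort{
    \field{presort}
  }
  \sort[final]{
    \field{sortkey}
  }
  \sort{
    \field{sortname}
    \field{author}
    \field{editor}
    \field{translator}
    \field{sorttitle}
    \field{title}
  }
  % sortyear and year share one \sort: the first field that is
  % defined for an entry is used, so sortyear overrides year
  \sort{
    \field{sortyear}
    \field{year}
  }
  \sort{
    \field{sorttitle}
    \field{title}
  }
  \sort{
    \field{volume}
    \literal{0}
  }
}
```

With this arrangement both example entries contribute the key 1980 at the year step, so the tie should be broken by title and appleby should be listed before appleby:b. With sortyear and year in separate \sort sections, the tie on sortyear is instead broken by year (1989 before 1990), which reverses the order.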
As far as I can see the problem is that sortyear is a literal and year an integer, so we can't meaningfully compare the two if they are both around... Is that correct?
It's not quite that. It's that the sorting data schema, which is needed to generate the internal data structures used for sortkey extraction and generation, is currently not generated per-key. This is very complicated to fix in biber, as the particular field selected for a sort would need to be tracked per-entry to generate the correct sorting data schema. Thinking about it.
It's surprising I didn't notice this before. However, it's really difficult to solve this. Sorting needs to know the datatypes of what it is comparing, and this assumes that everything in a \sort is the same datatype (well, it really assumes that everything is either an integer or isn't). sortyear isn't necessarily an integer, by design. The construction of the sorting dataschema uses this assumption as a shortcut: it detects the data type of a \sort set by just looking at the first element, and since sortyear is a literal, it sorts year as a literal too. This is more than a biber issue, it's a sorting algorithm problem. If sortyear were an integer datatype in the datamodel, it would be fine - what do you think about that solution?
People (and in fact we - as in biblatex-examples.bib) use things like sortyear = {1984-0}, all the time. That would have to continue to work, so it seems tricky to make sortyear an integer...
True. We have to change something though, as year is more important than sortyear. This is rather intractable - we have to compare either numerically or alphabetically for each \sort, and currently neither option works and never really can unless we can guarantee the data type by the data model or the sorting spec ...
We could just pad the year with zeros automatically and hope for the best. This should work for positive years, not sure about negative years...
Really don't want to do that - that's what we used to do, and I switched to a better sort algorithm because padding and string sort is awful with the expanded ISO date stuff we now support. It seems to me that the whole existence of sortyear as a literal is strange anyway. It probably should be an integer. I suspect that the majority of cases for this can be solved by simply putting another \sort in after the sortyear/year one to discriminate further. This would be a somewhat breaking change but I think in the long term, it's better.
Conceptually that would be better, I agree. But I fear it would be too big a change to render sortyear unusable. With integers you simply can't get fine sorting like sortyear = {1984-1}, vs sortyear = {1984-2},
True, but that's really an abuse of the field anyway. It's essentially a way of turning the correct semantic solution of having a following \sort macro into a hacky syntactic solution. If there is an ordering within a year (which is exactly what this syntax is designed to express), then there should be a further month or season or something like that.
Yeah, theoretically I agree. But practically it can happen that one needs to control the year sorting and does not have other semantic options available. Think of two @inbooks by the same author in the same book, where you want to sort the first chapter before the second, but sorting by title would give the opposite result. Sure, I could add pages to the sorting scheme, but that would be ludicrous.
I'm not sure adding pages would be ludicrous in those circumstances if semantically you want the paper earlier in the collection listed first as it's the pages that determine that ...
Mhh, I really hoped I could win you over with pages (that's why I did not go for a volume example as in knuth:ct:a etc.) ;-).
Again, in principle I agree. But sortyear is a really well established hack (even biblatex-examples.bib has 10 instances of it) and I'm really wary of getting rid of it.
There is no way to make this work perfectly with hacked sortyears and the current situation is the worst, I think. In general, sortX fields are the same datatype as the X field - sortyear is the exception and I think we need to fix that. I propose:
- sortyear to be int or str, but with a note that if it doesn't parse as an int, sorting will not be guaranteed
- sortyear to be coerced to an int in some reasonable way
I can't think of any way that coerces sortyear to int while keeping the sorting of common sortyear idioms such as 1986-00, 1986-01 as expected.
There isn't really a generalisable way, but this current sortyear hacking is horrible and exactly the sort of thing that biblatex was designed to avoid. Since it is used mostly to sort collections before collection items etc., isn't volume really for this?
For example, take the Nietzsche texts in the examples.bib. If you remove the sortyear from them, you get the same results because we already sort by volume after the year. So it's not clear that the hack is even needed in there?
For biblatex-examples.bib's examples volume sorting should indeed do the right thing. I assume that in general people should be able to define a proper sort algorithm and write down a \DeclareSortingTemplate that sorts their bibliography as expected without resorting to sortyear.
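A minimal sketch of what such a template could look like, assuming the goal is to order works by name, then year, then volume, then title. The template name nyvsketch is hypothetical, and the field order is essentially that of the stock nyvt template; it is an illustration, not something proposed in the thread.

```latex
% Hypothetical template name; it would be selected with sorting=nyvsketch.
\DeclareSortingTemplate{nyvsketch}{
  \sort{
    \field{presort}
  }
  \sort[final]{
    \field{sortkey}
  }
  \sort{
    \field{sortname}
    \field{author}
    \field{editor}
    \field{translator}
    \field{sorttitle}
    \field{title}
  }
  \sort{
    \field{year}
  }
  % volume breaks ties within a year instead of a hand-written sortyear
  \sort{
    \field{volume}
    \literal{0}
  }
  \sort{
    \field{sorttitle}
    \field{title}
  }
}
```

Because every \sort here holds fields of one kind only, the year step no longer mixes a literal with an integer.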
I still believe that sortyear hacking is a viable way to deal with some situations. The question really is how many users would be affected and how many things we are going to break badly with this. I have no idea how many people use sortyear. I'd have thought its use is not entirely unusual, but I may well suffer from sample bias.
I honestly can't imagine that much would break, as people using sortyear would naturally want it to compare stringwise with year. I'd rather break it and advise people on a per-case basis to use a proper sorting template. The current situation is much worse to my mind - year sorting is completely broken, it's just an accident of string sorting that it works for current millennium years.
Maybe we should at least start a short survey on comp.text.tex to inquire how widely used sortyear is.
Ok - do you want to do that? I can prepare the changes in one commit in DEV so it can be tested and reverted.
OK, will do.
edit posted to c.t.t: https://groups.google.com/d/msg/comp.text.tex/CVSosV6gEiw/_C3sjunmAgAJ
Can't sortyear be a float? And 1984-01 interpreted as 1984.01?
(I personally never used sortyear, so simply changing it to int is fine for me too).
Yes, there are some hacks like this that could be done but it won't help much as that's only one example of the possible formats. Also, if it was a float, year would need to be a float too and that slows down comparisons etc.
But it would give people a workaround to salvage their hacks. If floats are too slow, we could go with a fixed number of decimal places...
For the benefit of future me: The examples in biblatex-examples.bib still sort as expected without sortyear because of the sorttitle field. But at least the knuth:ct:... examples would still work as expected with nyvt and without sorttitle. In particular some things can be fixed with sorttitle if sortyear is not available any more.
An issue raised by the Knuth works is that volume is a string but its default datatype is an int. I think I may parse int fields as we do ranges to convert them to numbers for sorting.
According to the docs that already happens...
The volume of a multi-volume book or a periodical. It is expected to be an integer, not necessarily in arabic numerals since biber will automatically from roman numerals or arabic letter to integers internally for sorting purposes.
Ah, yes, I see I already did this ...
By coincidence I just got a question about this with a real example. The user wanted to manually sort a number of reports and had used year={2011a} and year={2011b}, which didn't work. Remembering this discussion, I did not suggest sortyear={2011-a} or something like this, but thought about it a bit and now think that the suggestion of @moewew to use sorttitle is actually one of the logical solutions (inserting an extra field in the sort order would be another).
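A sketch of the sorttitle approach for a case like this, assuming the default nyt sorting. The entry keys, titles, institution and sorttitle values are all hypothetical; only the idea of steering the order through sorttitle comes from the discussion.

```latex
% Hypothetical entries: same author and same year, so the order is
% decided at the title step, where sorttitle takes the place of title.
\begin{filecontents}{\jobname.bib}
@report{example:first,
  author      = {Humphrey Appleby},
  title       = {Zeta Report},
  sorttitle   = {Report 1},
  type        = {techreport},
  institution = {Example Institute},
  date        = {2011},
}
@report{example:second,
  author      = {Humphrey Appleby},
  title       = {Alpha Report},
  sorttitle   = {Report 2},
  type        = {techreport},
  institution = {Example Institute},
  date        = {2011},
}
\end{filecontents}
```

Here example:first should be listed before example:second even though its printed title sorts later, and year stays a plain 2011 in both entries.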
In the following MWE it seems that years are padded from the right for sorting and not from the left
results in
I know that I could use
but somehow it feels weird that I would have to enable proper integer sorting for the year...