plk / biblatex

biblatex is a sophisticated bibliography system for LaTeX users. It has considerably more features than traditional bibtex and supports UTF-8.

Problem with \rangelen #257

Closed gvdgdo closed 9 years ago

gvdgdo commented 10 years ago

It seems that \rangelen does not work as advertised in the example on page 194 of the manual.

It seems to work if the input is an explicit range (e.g., "10-15") but not if the range is passed implicitly (e.g., \thefield{pages}).

Here is an MWE:

    \documentclass{article}
    \usepackage{filecontents}
    \usepackage{biblatex}
    \begin{filecontents}{\jobname.bib}
    @book{a,
      title={Title},
      author={Author, A},
      year={2014},
      pages={10-15}
    }
    @book{b,
      title={Title},
      author={Buthor, A},
      year={2014},
      pages={10-}
    }
    @book{c,
      title={Title},
      author={Cuthor, A},
      year={2014},
      pages={10}
    }
    \end{filecontents}
    \addbibresource{\jobname.bib}

    \DeclareBibliographyDriver{book}{
        (10-15)\rangelen{10-15}\\
        (10-)\rangelen{10-}\\ 
        (-10)\rangelen{-10}\\
        (10)\rangelen{10}\\
        \printfield{pages}: Number of pages reported by rangelen \rangelen{pages}\par
        Testing the example in the manual:
     \ifnumcomp{\rangelen{\thefield{pages}}}
       {=}
       {1}
       {add f}
       {do nothing}.
    }

    \begin{document}
    \nocite{*}
    \printbibliography
    \end{document}

[screenshot, 2014-07-19]

plk commented 10 years ago

@josephwright - perhaps you could look at \rangelen? I can't work out why the argument parsing is ok with \rangelen{10\bibrangedash 15} but breaks with \rangelen{\thefield{pages}} when the pages field contains 10\bibrangedash 15.

josephwright commented 10 years ago

I can easily set up to get \thefield{pages} to expand, but there is an issue I need to follow further. As noted in the code, \rangelen has to be expandable, but the current definition uses \blx@imc@ifinteger which is not. I'll check the logs to see if this has always been true or if not when/why it was introduced.

josephwright commented 10 years ago

To be fully expandable, everything used by \rangelen must be expandable. That means using an expandable integer test and avoiding any other non-expandable code. At the same time, you need to ensure that the argument to \rangelen is fully expanded. That leads to an implementation something like:

    \newcommand*{\rangelen}[1]{%
      \expandafter\blx@rangelen@range\romannumeral-`\q%
        #1\bibrangedash\bibrangedash&}

    \def\blx@rangelen@range#1\bibrangedash#2\bibrangedash#3&{%
      \ifblank{#3}
        {\blx@rangelen@hyphen#1--&}
        {\blx@rangelen@check{#1}{#2}}%
    }
    \def\blx@rangelen@hyphen#1-#2-#3&{%
      \ifblank{#3}
        {1}% No range at all: assume one page
        {\blx@rangelen@check{#1}{#2}}%
    }
    \def\blx@rangelen@check#1#2{%
      \expandafter\blx@rangelen@check@aux
        \number\numexpr
          \blx@rangelen@check@int{#2}
          -
          \blx@rangelen@check@int{#1}
        \relax
        &\stop
    }
    \def\blx@rangelen@check@aux#1&#2\stop{%
      \ifblank{#2}
        {#1}
        {0}%
    }
    \def\blx@rangelen@check@int#1{%
      \ifblank{#1}
        {0&}
        {%
          \if\number\numexpr0#1-0#1\relax0
            #1
          \else
            0&
          \fi
        }%
    }

The docs don't seem to define what \rangelen{a} or \rangelen{a-b} should give: I've gone with the easiest approach that anything without a range is one page, anything else must have two integer page numbers.

(I've left the basic integer test alone: without some proper unit tests I'm not sure if the behaviour differs. Perhaps it's one I'll look at in an expl3 context, where I can easily do proper unit testing.)

josephwright commented 10 years ago

@plk The reason that with the current definition \rangelen{10\bibrangedash 15} works but \rangelen{\thefield{pages}} fails is that TeX is not a functional language. Thus \rangelen sees exactly the input as written here: there is no \bibrangedash in \thefield{pages}. Thus the solution is to expand the input (f-type expansion in expl3 parlance), which is what I've done in the above. The fact that the internals are then also defective in an expansion context is a separate issue!
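The expansion point can be illustrated with a minimal sketch outside biblatex. The names `\storedpages` and `\splitrange` are hypothetical and exist only for this illustration; `\storedpages` stands in for \thefield{pages}, and a plain hyphen stands in for \bibrangedash.

```latex
\documentclass{article}
\begin{document}
% Hypothetical stand-in for \thefield{pages}.
\def\storedpages{10-15}

% Delimited macro: splits its input at the first hyphen token it sees.
\def\splitrange#1-#2\stop{first=#1, last=#2}

% The hyphen is literally present in the input, so the pattern matches.
\splitrange 10-15\stop

% Without expansion, \splitrange\storedpages\stop would find no hyphen
% token before \stop, because the hyphen is hidden inside \storedpages.

% Forcing expansion first: \romannumeral-`\q expands the tokens that
% follow it until it meets an unexpandable one, then contributes nothing.
\expandafter\splitrange\romannumeral-`\q\storedpages\stop
\end{document}
```

Both calls typeset "first=10, last=15". The appended delimiters in the proposed \rangelen (`#1\bibrangedash\bibrangedash&`) additionally guarantee that the pattern always finds something to match, even when the input contains no range at all.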

josephwright commented 10 years ago

@plk I've taken advantage of the fact that \numexpr keeps going until it finds something that is not an expandable macro/primitive, number or operator (+, -, etc.): that lets me terminate the parsing cleanly and then use a bit of 'clean-up' code to find the appropriate result.
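A simplified sketch of that termination trick, under the assumption that an invalid operand has already been replaced by `0&` (as \blx@rangelen@check@int does in the code above). The names `\diff` and `\finish` are hypothetical, and `&` keeps its normal document catcode here rather than the one biblatex uses internally.

```latex
\documentclass{article}
\begin{document}
% Clean-up step: #2 is empty only when \numexpr read all the way to its
% terminating \relax; anything left over signals an invalid operand.
\def\finish#1&#2\stop{%
  \if\relax\detokenize{#2}\relax
    #1%
  \else
    invalid%
  \fi}

% \number forces \numexpr to be evaluated before \finish grabs its arguments.
\def\diff#1#2{%
  \expandafter\finish\number\numexpr #2-#1\relax&\stop}

\diff{10}{15}% both operands are integers: typesets 5

\diff{10}{0&}% \numexpr stops at the first &: typesets "invalid"
\end{document}
```

In the second call \numexpr ends at the `&` that follows the `0`, so the unread remainder of the expression lands in `#2` of the clean-up macro and the fallback branch is taken.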

plk commented 10 years ago

I realise about the functional thing, but I was confused because, when tracing, those two examples ended up being the same thing going into the second level of macros. That is, \thefield{pages} was expanded, but the result wasn't matched by the macro argument pattern and everything was going into #1.

plk commented 10 years ago

Do you think the code you posted above is useable? Would be nice to fix this problem definitively ...

gvdgdo commented 10 years ago

The computation in the given code is off by one (e.g., \rangelen{10-15} gives 5 but it should give 6). Maybe this is a matter of how it is going to be used: if it is used with pages to count the number of pages, it is definitely off by 1.

josephwright commented 10 years ago

@gvdgdo This can easily be altered: the only question is what is the expected behaviour.

moewew commented 10 years ago

Sorry to intrude on this issue.

I noticed that with the new definitions \rangelen{\bibrangedash10} and \rangelen{-10} differ: the latter gives 0 (as one would expect after reading the manual), but the former returns 1.

Maybe it would not be a bad idea to be able to distinguish an open-ended range (10-) from a range without a start element (-10), because the former might give rise to adding a sequentes marker after the starting page number.

Anyway, the problem with how pages are counted right now is that \rangelen{10} and \rangelen{10-11} both yield 1, somewhat defeating the purpose of the test given in the manual as an example. Seeing that this test could in theory also be used to decide whether to use singular p. or plural pp. for the page ranges, it seems natural to output 1 for a lone, single page and more for real ranges. So \rangelen{10-15} should give 6, just as \rangelen{10-11} should give 2.
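Written as a formula, the inclusive count proposed here would be the following; the name len is only notation for this illustration, and a lone page is treated as the range from that page to itself.

```latex
\[
  \mathrm{len}(a, b) = b - a + 1, \qquad
  \mathrm{len}(10, 15) = 6, \quad
  \mathrm{len}(10, 11) = 2, \quad
  \mathrm{len}(10, 10) = 1
\]
```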

plk commented 10 years ago

@josephwright - do you think we can now fix this? It looks like we just have to agree on what the correct counts are for various cases?

josephwright commented 10 years ago

@plk I think this is a case where unit testing would really help :-) In the absence of that, we need at least a tight spec on what the result of different input should be. As @moewew points out, it's odd that both \rangelen{10} and \rangelen{10-11} give 1. I guess I'd favour logic:

I'm not sure about trying to distinguish the two open ended cases as different lengths, although I could I guess do 0 and -1 or something like that.

plk commented 10 years ago

I think 0 and -1 are fine as markers for the two open range types. We need unit testing for biblatex in general. The ideal would be a non-PDF output, such as UTF-8 text only, so that we can "diff" for tests. At the moment, kerning and spacing change regularly with the engine, which makes accurate PDF comparison of the output really hard to maintain.

josephwright commented 10 years ago

Unit testing: http://www.texdev.net/2014/05/27/testing-tex-lua-and-tex-and-not-just-for-luatex/ and an upcoming TUGboat article by Frank (source at https://github.com/latex3/svn-mirror/blob/master/articles/lua-test-suite.tex). I'll look to write myself a proper set of inputs/outputs and alter \rangelen over the coming days.

plk commented 10 years ago

Any update Mr W? I'd like to close this if I can and get it into the dev branch.

josephwright commented 9 years ago

Updated version with output

    \newcommand*\rangelen[1]{%
      \ifblank{#1}
        {0}%
        {%
          \expandafter\blx@rangelen@range\romannumeral-`\q%
          #1\bibrangedash\bibrangedash&%
        }%
    }

    \def\blx@rangelen@range#1\bibrangedash#2\bibrangedash#3&{%
      \ifblank{#3}
        {\blx@rangelen@hyphen#1--&}
        {\blx@rangelen@check{#1}{#2}}%
    }
    \def\blx@rangelen@hyphen#1-#2-#3&{%
      \ifblank{#3}
        {1}% No range at all: assume one page
        {\blx@rangelen@check{#1}{#2}}%
    }
    \def\blx@rangelen@check#1#2{%
      \expandafter\blx@rangelen@check@aux
        \number\numexpr
          \blx@rangelen@check@int{#2}
          -
          \blx@rangelen@check@int{#1}
          + 1
        \relax
        &\stop
    }
    \def\blx@rangelen@check@aux#1&#2\stop{%
      \ifblank{#2}
        {#1}
        {-1}%
    }
    \def\blx@rangelen@check@int#1{%
      \ifblank{#1}
        {0&}
        {%
          \if\number\numexpr0#1-0#1\relax0
            #1
          \else
            0&
          \fi
        }%
    }

josephwright commented 9 years ago

The only thing this leaves awkward is the input \rangelen{-}, which gives -1 but I'm not really sure what to do with!

josephwright commented 9 years ago

Note that non-numerical pages also give the -1 value if there is an apparent range, so \rangelen{i-ii} gives -1. Again, I'm not sure what is wanted here (converting different representations of page numbers is doable but the auto-detection will not be much fun!).

josephwright commented 9 years ago

We could make either 0 or -1 a more general value 'A page range cannot be determined: this includes the case of open ranges, non-numeric page numbers and so on.'

It's mainly a case of deciding what is wanted.

plk commented 9 years ago

@josephwright - You know, I just realised that this is probably much easier for biber to do. Perhaps biber could return a value for every page range, giving its length. Non-numerical stuff is also much easier there, using Unicode equivalence classes. Something in the bbl like:

    \field{pages}{10\bibrangedash 15}
    \range{pages}{6}

for all fields of datatype "range"?

In fact it's harder than we thought because a range field can contain multiple ranges (10-12, 20-30, etc.) and the length should probably be the sum of all of them.
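As a worked instance of that summing rule, assuming the inclusive count (last minus first plus one) argued for earlier in the thread, a field containing 10-12, 20-30 would have length:

```latex
\[
  \underbrace{(12 - 10 + 1)}_{3} + \underbrace{(30 - 20 + 1)}_{11} = 14
\]
```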

plk commented 9 years ago

@josephwright - I did a quick test with perl and it's quite nice - I can generate rangelen for roman numerals too, irrespective of whether they are using the special U+216x and U+217x ranges or ASCII representations etc. This will save a lot of trouble in biblatex.

josephwright commented 9 years ago

Well you make the calls, but a TeX-based solution works for 'everyone' not just for Biber people :-) I'd also be very cautious about taking on anything non-numeric, otherwise you get into the tricky cases such as 'i-x,1-10' or even worse 'i-10'!

plk commented 9 years ago

Actually, they are already in my test cases and work fine ... any combination of roman and decimals will work, even in really strange Unicode cases. I would not remove the current implementation for bibtex users ...

plk commented 9 years ago

@gvdgdo - please try biblatex 3.0 dev and biber 2.0 dev from Sourceforge. Joseph Wright has created a better implementation of \rangelen and there is a new macro \frangelen which, when used with biber, takes the name of a range field like 'pages' and returns the length of the range in the field. This macro can handle multiple ranges in the same field, roman numerals, Unicode roman numerals, implicit ranges etc. and is generally more robust than \rangelen. See the PDF manual.

u-fischer commented 9 years ago

@JosephWright There is a problem with the new definition of \rangelen in biblatex. Calls with open ranges, e.g. \rangelen{1-}, fail with "Missing $ inserted". As far as I can see, the reason is that & has catcode 3 (math shift) at the moment \rangelen and its internal commands are defined in biblatex2.sty (\catcode`\&=3 is set in line 50).

    \documentclass{article}

    \usepackage[]{biblatex}

    \begin{document}
    \rangelen{-1}
    \end{document}

josephwright commented 9 years ago

The issue is with \ifblank: fix in hand.

jspitz commented 5 years ago

> Joseph Wright has created a better implementation of \rangelen and there is a new macro \frangelen

Is the documentation of \rangelen correct? Contrary to what is written here, it says it takes a field as parameter. Also, \frangelen does not seem to be documented at all.