plk / biber

Backend processor for BibLaTeX
Artistic License 2.0

biber still doesn't like CR line endings #273

Closed: thelittleO closed this issue 5 years ago

thelittleO commented 5 years ago

When I run biber on a bibliography that contains carriage returns (CR), biber only evaluates the entries after the first CR. When I remove the CRs, biber works fine for every entry.

The bibliography starts with a comment line: % Encoding: UTF8

Maybe there is a built-in switch that changes the relevant line-break type to CR, which leads biber to ignore other types of line breaks?

Version: biber 2.12
Similar issue: #193

moewew commented 5 years ago

@thelittleO Can you show us an example (a .bib file and a .tex file that uses it), please? You can paste the text here, but the line ends will probably be mangled, so in addition to posting them here it might be a good idea to upload them to a site that preserves line ends or to email them to PLK.

Yesterday I thought I could reproduce the problem, but when I tried again just now, my example file with CR line ends worked just fine.

moewew commented 5 years ago

I did some more testing and found that the behaviour seems to be a bit variable.

In the following examples, all line breaks are bare CRs.

% Comment
@article{sigfridsson,
  author       = {Sigfridsson, Emma and Ryde, Ulf},
  title        = {Comparison of methods for deriving atomic charges from the
                  electrostatic potential and moments},
  journaltitle = {Journal of Computational Chemistry},
  date         = 1998,
  volume       = 19,
  number       = 4,
  pages        = {377-395},
  doi          = {10.1002/(SICI)1096-987X(199803)19:4<377::AID-JCC1>3.0.CO;2-P},
}

compiles fine.


So does

% Comment

@article{sigfridsson,
  author       = {Sigfridsson, Emma and Ryde, Ulf},
  title        = {Comparison of methods for deriving atomic charges from the
                  electrostatic potential and moments},
  journaltitle = {Journal of Computational Chemistry},
  date         = 1998,
  volume       = 19,
  number       = 4,
  pages        = {377-395},
  doi          = {10.1002/(SICI)1096-987X(199803)19:4<377::AID-JCC1>3.0.CO;2-P},
}

But

% Comment
% another comment
@article{sigfridsson,
  author       = {Sigfridsson, Emma and Ryde, Ulf},
  title        = {Comparison of methods for deriving atomic charges from the
                  electrostatic potential and moments},
  journaltitle = {Journal of Computational Chemistry},
  date         = 1998,
  volume       = 19,
  number       = 4,
  pages        = {377-395},
  doi          = {10.1002/(SICI)1096-987X(199803)19:4<377::AID-JCC1>3.0.CO;2-P},
}

fails with:

ERROR - BibTeX subsystem: C:\Users\Moritz\AppData\Local\Temp\AEjLKXt1Kw\cr-line.bib_1164.utf8, line 1, syntax error: at end of input, expected "@"

% Comment
% a
% b
% c
% d
@article{sigfridsson,
  author       = {Sigfridsson, Emma and Ryde, Ulf},
  title        = {Comparison of methods for deriving atomic charges from the
                  electrostatic potential and moments},
  journaltitle = {Journal of Computational Chemistry},
  date         = 1998,
  volume       = 19,
  number       = 4,
  pages        = {377-395},
  doi          = {10.1002/(SICI)1096-987X(199803)19:4<377::AID-JCC1>3.0.CO;2-P},
}

fails similarly.


@comment{hello}
% comment
@article{sigfridsson,
  author       = {Sigfridsson, Emma and Ryde, Ulf},
  title        = {Comparison of methods for deriving atomic charges from the
                  electrostatic potential and moments},
  journaltitle = {Journal of Computational Chemistry},
  date         = 1998,
  volume       = 19,
  number       = 4,
  pages        = {377-395},
  doi          = {10.1002/(SICI)1096-987X(199803)19:4<377::AID-JCC1>3.0.CO;2-P},
}

compiles without error, but fails to find sigfridsson:

WARN - I didn't find a database entry for 'sigfridsson' (section 0)

All files were tested as cr-line.bib with the following document:

\documentclass[british]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{babel}
\usepackage{csquotes}

\usepackage[style=authoryear, backend=biber]{biblatex}

\addbibresource{cr-line.bib}

\begin{document}
\cite{sigfridsson}
\printbibliography
\end{document}
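For anyone who wants to reproduce these tests without hand-editing line ends, here is a small Python sketch that writes a test entry with bare CR line ends. It is a hypothetical helper, not part of biber or biblatex, and the entry is a shortened version of the failing example above. It relies on Python's `newline` argument to `open`, which translates every `\n` written in text mode into the given terminator.

```python
from pathlib import Path

# Shortened version of the failing example above (two comment lines first).
ENTRY = """% Comment
% another comment
@article{sigfridsson,
  author       = {Sigfridsson, Emma and Ryde, Ulf},
  journaltitle = {Journal of Computational Chemistry},
  date         = 1998,
}
"""


def write_with_line_ending(path: Path, text: str, ending: str) -> None:
    """Write `text`, translating each newline into `ending` (e.g. CR or CR+LF)."""
    # newline=ending makes the text layer do the translation on write.
    with open(path, "w", encoding="utf-8", newline=ending) as fh:
        fh.write(text)


if __name__ == "__main__":
    target = Path("cr-line.bib")
    write_with_line_ending(target, ENTRY, "\r")
    raw = target.read_bytes()
    cr = raw.count(b"\r")
    lf = raw.count(b"\n")
    print(f"CR: {cr}, LF: {lf}")  # prints: CR: 7, LF: 0
```

Passing `"\r\n"` instead of `"\r"` produces a CR+LF copy of the same entry for comparison.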

This behaviour seems consistent with the fix in https://github.com/plk/biber/commit/867ad9877fda832479a030a966da9660c76bbd7c, which apparently addresses only comments in the first lines.

I guess what we really need is a CR -> CR+LF conversion or something similar.
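A minimal Python sketch of that CR -> CR+LF conversion, assuming the .bib file is handled as raw bytes; the name `cr_to_crlf` is hypothetical and nothing like it is claimed to exist in biber. A CR counts as bare when no LF follows it, so existing CR+LF pairs and lone LFs pass through unchanged.

```python
import re

# A CR that is not already followed by LF (negative lookahead).
_BARE_CR = re.compile(rb"\r(?!\n)")


def cr_to_crlf(data: bytes) -> bytes:
    """Turn every bare CR into CR+LF; existing CR+LF pairs are left untouched."""
    return _BARE_CR.sub(b"\r\n", data)


if __name__ == "__main__":
    sample = b"% Comment\r% another comment\r@article{x,\r\n}\r\n"
    print(cr_to_crlf(sample))
```

Because lone LFs are left alone, this only fixes CR-only files; a file that mixes LF and CR+LF would stay mixed.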

plk commented 5 years ago

I have put in a general fix that normalises all Unicode line-break sequences. It seems to fix these examples. Can you try the DEV version?
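plk's actual change is in biber's Perl code, which this thread does not show. As an illustration only: "all Unicode linebreak sequences" is the set that Perl's `\R` regex escape matches (CR+LF, LF, VT, FF, CR, NEL, U+2028, U+2029), and the Python sketch below maps each of them to a single LF before any parsing. Whether biber's fix uses `\R` internally is an assumption, and `normalise_linebreaks` is a hypothetical name.

```python
import re

# The same set of sequences that Perl's \R matches: CR+LF (tried first so it
# counts as one break), then any single LF, VT, FF, CR, NEL, LS or PS.
_LINEBREAK = re.compile("\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]")


def normalise_linebreaks(text: str) -> str:
    """Replace every line-break sequence with a single LF."""
    return _LINEBREAK.sub("\n", text)


if __name__ == "__main__":
    sample = "% Comment\r% another comment\r\n@article{sigfridsson,\u2028}\n"
    print(normalise_linebreaks(sample).splitlines())
```

Mapping everything to one terminator up front means the parser never has to cope with CR-only or mixed line ends at all.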

moewew commented 5 years ago

Works very well with all the test files from above. Thank you very much.

thelittleO commented 5 years ago

@moewew It looks like you found an issue slightly different from mine.

Here is an MWE of the problem I encountered:

biber_273.bib (the ^M symbols mark carriage returns):

% Encoding: UTF-8

@Book{Kobek.2016,
 title     = {{I hate the Internet}},
 publisher = {{We Heard You Like Books}},
 year      = {2016},
 author    = {Kobek, Jarett},
 address   = {Los Angeles CA},
 edition   = {1. Ed.},
^M
@Book{Hardy.1990,^M
 author = {Hardy, G. H.},^M
 year = {1990},^M
 title = {{A mathematician's apology}},^M
 address = {Cambridge},^M
 publisher = {{Univ. Press}},^M
}^M

@Book{Schneier.1996,
  title     = {Applied cryptography},
  publisher = {J. Wiley \& Sons},
  year      = {1996},
  author    = {Bruce Schneier},
  address   = {New York},
  edition   = {2. Ed.},
  subtitle  = {Protocols, algorithms, and source code in C},
}

biber_273.tex:

\documentclass{scrbook}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[backend=biber, bibencoding=utf8]{biblatex}

\addbibresource{biber_273.bib}

\begin{document}

\cite{Kobek.2016}
\cite{Hardy.1990}
\cite{Schneier.1996}
\printbibliography

\end{document}

For Kobek.2016, biber gives the following warning:

Biber warning: [682] Utils.pm:193> WARN - I didn't find a database entry for 'Kobek.2016' (section 0)

The other entries compile just fine. Maybe biber has a problem with mixed line breaks.

moewew commented 5 years ago

@thelittleO So this file mixes different line endings? What are the other line ends? Is there any chance you could upload the file somewhere that keeps the line ends unchanged, or send it to me by email (you can find my address in the biblatex-ext documentation: http://mirrors.ctan.org/macros/latex/contrib/biblatex-contrib/biblatex-ext/biblatex-ext.pdf)? In any case, the issue should be resolved in the Biber dev version, since it normalises line ends.
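To answer "what are the other line ends?" without relying on an editor, here is a small Python sketch; the script and its function names are hypothetical, not part of biber. It reads the file in binary mode so nothing is translated, counts CR+LF, bare CR and bare LF line ends, and, if they are mixed, lists the line numbers that deviate from the majority (the ^M lines in the example above would show up as CR+LF).

```python
import re
import sys
from collections import Counter

# Try CR+LF before the single-character endings so it counts as one line end.
_ENDING = re.compile(rb"\r\n|\r|\n")
_NAMES = {b"\r\n": "CR+LF", b"\r": "CR", b"\n": "LF"}


def line_endings(data: bytes) -> list[str]:
    """Return the kind of every line end in `data`, in file order."""
    return [_NAMES[m.group()] for m in _ENDING.finditer(data)]


def report(data: bytes) -> str:
    """Summarise line-end kinds; for mixed files, list lines off the majority."""
    kinds = line_endings(data)
    counts = Counter(kinds)
    lines = [f"{kind}: {n}" for kind, n in sorted(counts.items())]
    if len(counts) > 1:
        majority = counts.most_common(1)[0][0]
        odd = [i for i, k in enumerate(kinds, start=1) if k != majority]
        lines.append(f"mixed; lines not ending in {majority}: {odd}")
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:  # binary mode: no newline translation
        print(report(fh.read()))
```

Usage would be along the lines of `python line_ends.py biber_273.bib`, where the script name is likewise only an example.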

I notice that in the example Kobek.2016 is missing its closing curly brace. I'd also like to point out that I'm not too fond of double braces around titles: only words that must always remain capitalised should be protected, and nothing else should be wrapped in additional braces. See also https://tex.stackexchange.com/q/10772/35864, https://tex.stackexchange.com/q/474658/35864, https://tex.stackexchange.com/q/439440/35864.

moewew commented 5 years ago

@thelittleO Thank you very much for the test file. The situation is indeed quite different from what I initially thought was happening. The file in question mainly uses LF to end lines, but one entry uses CR+LF to end lines (the ^Ms in the example above). I can reproduce the issue with Biber 2.12 and am happy to tell you that the problem is resolved with the fix PLK pushed to the dev version of Biber 2.13. I'm afraid I can't offer a better workaround in the meantime than to either make sure all your line ends use the same terminator (easily possible with Notepad++) or to remove the comment in the first line.
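The first workaround (one line terminator throughout the file) can also be scripted instead of done in Notepad++. This is a hedged Python sketch, not something shipped with biber or biblatex: it assumes LF is the desired terminator and that the .bib file fits in memory, converts CR+LF and bare CR to LF, and keeps a `.bak` copy before rewriting. The names `to_lf` and `convert_file` are hypothetical.

```python
import re
import shutil
import sys
from pathlib import Path

# CR+LF or a bare CR; both become a single LF.
_CR_ENDING = re.compile(rb"\r\n?")


def to_lf(data: bytes) -> bytes:
    """Convert CR+LF and bare CR line ends to LF, leaving existing LFs alone."""
    return _CR_ENDING.sub(b"\n", data)


def convert_file(path: Path) -> bool:
    """Rewrite `path` with LF line ends after saving a .bak copy.

    Returns True if the file was changed, False if it was already LF-only.
    """
    data = path.read_bytes()
    fixed = to_lf(data)
    if fixed == data:
        return False
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_bytes(fixed)
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    for name in sys.argv[1:]:
        changed = convert_file(Path(name))
        print(f"{name}: {'converted' if changed else 'already LF only'}")
```

Working on bytes rather than decoded text avoids any question of the file's encoding, since CR and LF are the same single bytes in UTF-8.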