Closed tom-a-horrocks closed 6 years ago
After a bit more reading I've discovered this is more of a bibtex issue. I've tried including an @string definition for a hyphen, but unfortunately that is also converted to an en dash. I have found one solution is to include a command in the .bib's preamble:
\documentclass{article}
\begin{filecontents}{test.bib}
@preamble{{\providecommand*\hyphen{-}}}
@article{test,
author = "Other, A. N.",
journal = "J. Irrep. Res.",
title = "Some things I did",
pages = "081401\hyphen 1--081401\hyphen4",
year = "2011"
}
\end{filecontents}
\begin{document}
\nocite{*}
\bibliography{test}
\bibliographystyle{ieeetr}
\end{document}
https://tex.stackexchange.com/questions/21773/hyphenating-a-number-in-the-bibtex-pages-field
Is it at all possible to do this within zotero/better-bibtex? I'd like to avoid editing the .bib directly if possible.
:robot: this is your friendly neighborhood build bot announcing test build 5.0.116.6221.issue-943 ("adjust test cases for #943").
OK so the hyphen issue is partly my fault, as BBT was a little zealous in changing anything dash-y into en-dashes. 6221 changes that. That should make what you want to do easier. Not trivial though.
There are two ways to get \hyphens into that field:
1. Using BBT's "raw inserts", which would look like 081401<pre>\hyphen</pre> 1--081401<pre>\hyphen</pre>4. Mind that the <pre> bits will show up as-is if you use Zotero for non-BibTeX (i.e. Word) citations. This is strictly a BBT thing; Zotero doesn't know about it, so in Word it will treat the <pre> as text that you want to show up in the bibliography.
2. Entering the pages as 081401-1--081401-4 (which is kept as-is in 6221) and adding the \hyphens using a postscript.
For the preamble you'll have to use a postscript in any case as it stands. I am considering adding a preamble field, but I think I'd have to add two (what works for BibTeX will not necessarily work for BibLaTeX). The postscript would look like
if (Translator.BetterBibLaTeX) {
if (!Translator.preambleWritten) {
Zotero.write('@preamble{{\\providecommand*\\hyphen{-}}}\n');
Translator.preambleWritten = true;
}
if (this.has.pages) this.has.pages.bibtex = this.has.pages.bibtex.replace(/([0-9])-([0-9])/g, '$1\\hyphen$2');
}
which means: write the preamble once per export and, wherever the pages field contains <number>-<number>, replace that hyphen with \hyphen. I really need that feedback.
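To see what the replace() call in the postscript does on its own, here is a minimal sketch that can be run outside Zotero. The function name protectInnerHyphens is made up for illustration; only the regular expression is taken from the postscript.

```javascript
// Hypothetical stand-alone helper; only the regex comes from the postscript.
function protectInnerHyphens(pages) {
  // A single hyphen directly between two digits becomes \hyphen.
  // A double hyphen (the range separator) does not match, because the
  // character after the first hyphen is another hyphen, not a digit.
  return pages.replace(/([0-9])-([0-9])/g, '$1\\hyphen$2');
}

console.log(protectInnerHyphens('081401-1--081401-4'));
// prints 081401\hyphen1--081401\hyphen4
```

The range separator survives untouched, so the .bst file still sees a normal "--" between the two protected page identifiers.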
As far as I recall, a page range in a bib file should always be given as "1-3", i.e., with a single hyphen. Depending on the .bst file, the single hyphen for page range in the .bib file will be expanded to an em-dash or, in some rare cases, to an en-dash.
I think that's mostly what it does now, right? Have you tested the new behavior?
I have not tested it, but I believe you. My comment was meant as just that. Another comment is that the page range "16-1 -- 16-4" is in many journals written as "16(4)".
Thanks for your work on this. Note that in the meantime I've simply used 16:1-4, which should be fine for me.
The page numbers '16-1',...'16-4' are what are printed on the conference abstract itself. What's happening is that '16' is an electronic article identifier (separate to DOI). What complicated matters is that there's no field for this identifier except perhaps for issue, which isn't available for conference abstracts (@inproceedings) -- and sometimes journal articles have an issue number AND an electronic identifier anyway. I guess writing 16(1-4) in the page field may be a realistic compromise?
Note that these identifiers can change significantly. For example, I have another which is We MIN 06, and I'm yet to settle on a principled way to get these into the bibliography.
@njbart, is it correct that I should use a single hyphen for page ranges? This is mostly related to import, because I'm going to pass on what's in the pages field as-is on output, only translating a Unicode en-dash to -- and Unicode em-dashes to ---.
Many BibTeX style (.bst) files will do a search and replace, so that "-" is replaced by "--" in the output (.bbl file). This is certainly true for all the Physics journals that I have published in.
However, some journals use an en-dash in the page range (I seem to recall that I have seen this in some French journals, but don't quote me on that). So always using "--" will call for extra corrective work in the .bbl file for these journals.
I'm not always using --; I'm just translating U+2013 to -- and U+2014 to ---. Hyphens (regardless of how many you have) will be left untouched.
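As a rough illustration of that export rule (not BBT's actual code; the function name exportPagesDashes is invented here), the translation could be sketched like this:

```javascript
// Hypothetical sketch of the export rule described above:
// Unicode dashes become TeX ligatures, ASCII hyphens are never touched.
function exportPagesDashes(pages) {
  return pages
    .replace(/\u2014/g, '---') // U+2014 em-dash -> ---
    .replace(/\u2013/g, '--'); // U+2013 en-dash -> --
}

console.log(exportPagesDashes('16-1\u201316-4')); // prints 16-1--16-4
console.log(exportPagesDashes('12-17'));          // prints 12-17
```

Because ASCII hyphens pass through unchanged, whatever the user typed with plain hyphens reaches the .bib file exactly as entered.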
I was referring to the average user who uses "12--17" instead of the more preferable "12-17" in his/her .bib file.
If I can be sure that a user never wants a double hyphen in the pages field (@njbart?) then perhaps I could replace them, but it seems iffy.
In some cases, I need some work to be left for cleanup by the user; can't algorithmically catch them all ¯\_(ツ)_/¯. A postscript is always an option.
Please visit https://verbosus.com/bibtex-style-examples.html to find examples of how .bib entries are best entered. Notice that page ranges shall be separated by a single "-". This hyphen is not just a character, but rather a page number separator that is to be replaced by a proper dash of the correct type (or something else, depending on the requirements of the actual publisher). Notice also that pure, single numbers, such as in "year", "number", "month", "volume", "series" and so on, are not to be enclosed by brackets or inverted commas.
I'd really rather hear from @njbart (or @plk); the biblatex processors are insanely lenient, so what works is not always how it's supposed to be. But in the meantime I can change the import back to single hyphen.
But if there's a U+2014 or U+2013 there, by assumption the user who entered this wants an em- or en-dash, so I'd rather stick to that.
From the biblatex 3.11 release notes: “Hyphens and dashes in page ranges will be transformed to \bibrangedash, commas and semi-colons to \bibrangesep.” (https://github.com/plk/biblatex/wiki)
So my understanding is that any number of consecutive hyphens or dashes, including U+2014 or U+2013, will all be transformed to \bibrangedash.
Protecting hyphens and dashes can be achieved by wrapping them in curly braces. So my guess is (untested though) that the OP could get the desired result by using, e.g., pages = {16{-}1--16{-}4} – though pages = {16{-}1-16{-}4} should be expected to work just as well.
As to a suitable heuristic for BBT distinguishing hyphen/dash chars that should not be protected (i.e. those intended to be ultimately mapped to \bibrangedash) from those which should, I guess something like “protect all strings consisting of consecutive hyphens or dashes, except for the longest such string” could do the trick:
BBT would map 16-1--16-4 to pages = {16{-}1--16{-}4}, 16--1---16--4 to pages = {16{--}1---16{--}4}, etc. (The second example would then be rendered as 16–1–16–4, where any visual distinction is lost again, and there’s nothing BBT would be protecting in a string such as 16-1-16-4, but this is the best I can think of.)
Would a single em-dash (U+2014, usually translated to a triple dash in LaTeX) count as longer or shorter than a double hyphen?
In good typography (a definition that varies from language/country to language/country), four different "dashes" are used:
Hyphenation: "Andy Fairweather-Lowe", breaking a multisyllable word at the end of a line. In LaTeX: "-" (single "-").
Range: "The years 1939-1945". In LaTeX: "--" (double "-").
Separation: "Typesetting - a difficult skill". In LaTeX: " --- " or (e.g., in American typography) "---" (triple "-").
Negation: "The temperature is -3 degrees C". In LaTeX: "$-$" (math mode, single "-").
Except if @njbart's interpretation of the biblatex wiki is correct, any number of non-braced consecutive dash symbols of various kinds would constitute a \bibrangedash. biblatex has its own parsing and interpretation rules, and will output TeX code as a result, but the input isn't necessarily interpreted as (La)TeX itself would.
@njbart offered a heuristic to determine what dash-like things to brace and which not, but "longest" to me is ambiguous on whether it means pre-processing length (in which case double-hyphen would be longer than em-dash) or post-processing length (in which case em-dash is longer than double-hyphen).
Not at all sure I'm going to do this yet, as I'd have to do further parsing of the pages field for multiple ranges, and parsing of Zotero input is brittle. But I'm considering doing it.
What I had in mind was post-processing length, i.e., en-dash=double-hyphen longer than single-hyphen (and em-dash=triple-hyphen longer than double-hyphen – though I’m not sure the latter situation ever occurs in the wild).
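For what it's worth, the heuristic with "post-processing length" could be sketched roughly as below. This is only an illustration of the proposal, not anything BBT does; the names and the weights (hyphen = 1, en-dash = 2, em-dash = 3) are assumptions that follow the clarification just above. Runs that tie for longest are all left unprotected, which matches the earlier remark that nothing would be protected in 16-1-16-4.

```javascript
// Hypothetical sketch: brace every run of dash-like characters that is
// shorter (after conversion to TeX ligatures) than the longest run.
const DASH_RUN = /[-\u2013\u2014]+/g;
const WEIGHT = { '-': 1, '\u2013': 2, '\u2014': 3 }; // assumed weights

function runWeight(run) {
  let total = 0;
  for (const ch of run) total += WEIGHT[ch];
  return total;
}

function protectShorterDashRuns(pages) {
  const runs = pages.match(DASH_RUN) || [];
  const longest = Math.max(0, ...runs.map(runWeight));
  // Only runs strictly shorter than the longest one get braces.
  return pages.replace(DASH_RUN, run =>
    (runWeight(run) < longest ? `{${run}}` : run));
}

console.log(protectShorterDashRuns('16-1--16-4'));    // prints 16{-}1--16{-}4
console.log(protectShorterDashRuns('16--1---16--4')); // prints 16{--}1---16{--}4
console.log(protectShorterDashRuns('16-1-16-4'));     // prints 16-1-16-4
```

The sketch treats the whole field as one range; a field holding several comma-separated ranges would need to be split first.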
Neither have I, but nothing surprises me at this point. The state of references ready-to-import for Zotero is not stellar, and all kinds of stuff ends up in the database.
Hope you don't mind me butting in here. I can only say things with confidence for the biblatex side. BibTeX as you know is an inhomogeneous realm of .bst files that do not always follow the same line.
@bothide is right when they say that the dash can be considered a kind of meta character in the pages field. For the standard BibTeX styles as far as I can see what happens is simply that single "-"s are doubled up to become "--" (this is done using the function substring$ that treats braces and macro constructs simply as ASCII chars, so no amount of brace protection can help here).
However, the BibTeX documentation states (http://mirrors.ctan.org/biblio/bibtex/base/btxdoc.pdf, p. 11):
pages
One or more page numbers or range of numbers, such as 42--111 or 7,41,73--97 or 43+ (the ‘+’ in this last example indicates pages following that don’t form a simple range). To make it easier to maintain Scribe-compatible databases, the standard styles convert a single dash (as in 7-33) to the double dash used in TeX to denote number ranges (as in 7--33).
So it seems that back when BibTeX was devised the preferred way was actually a double dash and the single dash was only used for backwards compatibility reasons. I don't know if there are any more authoritative sources nowadays that recommend "-" over "--", but popular use may simply have made "-" the more prevalent and the de-facto standard: It's simpler to type, after all.
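The doubling behaviour of the standard styles can be mimicked in a few lines. The sketch below is not a transcription of any .bst file; the function name dashify is chosen here for illustration, and leaving runs of two or more hyphens unchanged is an assumption beyond what the quoted documentation states. It does show why brace protection is of no use at this stage: the braces are just characters to step over.

```javascript
// Hypothetical model of the standard-style conversion: a lone "-" is
// doubled, longer runs are assumed to be copied through unchanged.
// Braces get no special treatment, mirroring the character-by-character
// scan described above.
function dashify(pages) {
  return pages.replace(/-+/g, run => (run.length === 1 ? '--' : run));
}

console.log(dashify('7-33'));           // prints 7--33
console.log(dashify('7--33'));          // prints 7--33
console.log(dashify('16{-}1--16{-}4')); // prints 16{--}1--16{--}4
```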
For biblatex
the 'meta' capacity of -
is made clearer by the fact that pages
is not a literal field that is largely left as is, but a range field that is parsed by Biber.
I do, however, not agree with the sentiment that numeric fields should always be written without braces. It is a feature of the .bib
file syntax that "numerical values" do not need braces (or quotes):
For numerical values, curly braces and double quotes can be omitted.
(Nicolas Markey: Tame the BeaST, p. 20, http://mirrors.ctan.org/info/bibtex/tamethebeast/ttb_en.pdf)
But this is clearly worded as optional here and I haven't seen anyone else endorsing leaving out the braces. In fact pages = 1-45, will fail, so pages = 1, is risky if you want to add something later on. The risk is lower for export tools such as yours here, but I still think it is better to go with the braces. Still the only advice I have seen with regards to number fields and braces is to always write the braces even if they are not required.
biblatex actually has two levels at which it can deal with page ranges: Biber parses page ranges in the pages field, but pages as given in the optional postnote argument to \cite and friends are not passed on to Biber and are parsed by biblatex with (La)TeX code.
Biber parses the pages field as a range field and tries to make sense of it from that perspective using Perl RegEx.
Roughly, Biber splits the field at "," and ";" and then treats each bit separately. At first a RegExp that matches "(non-dash chars)(dash chars)(non-dash chars)" tries to read off the start and end of a page range. If that does not match, a fallback pattern "(any char)(at least two dash chars)(any char)" tries to find the start and end of the range. The range is then written to the .bbl as <start>\bibrangedash <end>.
Note that brace protection does not do anything for Biber. Furthermore, any number and all kinds of dashes are treated equally as long as RegExp recognises the character as dash-like, the only exception being the fallback pattern that specifically needs at least two dash-like characters to match (so pages = {16-1--16-4}, with double ASCII dash works, but pages = {16-1–16-4}, with U+2013 does not; adding braces in the obvious position changes nothing for Biber).
If all else fails, the field is read as literal and just dumped to the .bbl file without digestion. A warning is issued in that case.
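The two-stage matching described above can be imitated with a short sketch. These are not Biber's actual patterns (Biber is written in Perl); the regexes and the name parsePages are assumptions that only follow the prose description, using the Unicode "dash punctuation" class for "dash-like". As a simplification, anything that matches neither pattern, including a single page number, is reported as a literal chunk rather than making the whole field literal.

```javascript
// Hypothetical imitation of the described two-stage range matching.
const SIMPLE_RANGE = /^([^\p{Pd}]+)\p{Pd}+([^\p{Pd}]+)$/u; // non-dash, dashes, non-dash
const FALLBACK_RANGE = /^(.+?)\p{Pd}{2,}(.+)$/u;           // anything, 2+ dashes, anything

function parsePages(field) {
  return field.split(/[,;]/).map(part => {
    const chunk = part.trim();
    const m = SIMPLE_RANGE.exec(chunk) || FALLBACK_RANGE.exec(chunk);
    if (m) return { start: m[1].trim(), end: m[2].trim() };
    return { literal: chunk }; // neither pattern matched
  });
}

console.log(parsePages('16-1--16-4'));      // one range: start 16-1, end 16-4
console.log(parsePages('16-1\u201316-4'));  // no range found, kept as literal
```

In the first call the simple pattern fails because each side contains a hyphen, and the fallback then splits at the double hyphen. In the second call no two dash-like characters are adjacent, so the fallback fails as well.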
biblatex also parses pages and other fields potentially containing page ranges on a LaTeX level.
The passage of the biblatex Wiki @njbart quotes is referring specifically not to the pages field, but rather to postnote and friends that do not get pre-chewed, normalised input from Biber. Ideally the pages field would still be formatted in a way that it can also be parsed by the LaTeX range parser since custom styles may well apply the range parser also for pages. This will prove difficult due to an unforeseen interference in biblatex's macros, so need not be your primary aim at the moment.
The LaTeX range parser builds on low-level LaTeX and can only deal with Unicode characters if a Unicode engine is used (XeTeX, LuaTeX). With pdfTeX only ASCII chars are gracefully handled. So it is a good idea to only export ASCII chars to the pages field if possible (I believe you are already doing that).
The range parsing then works similarly to Biber's routine. It splits at ";", "," and \bibrangessep. Each chunk is then split up at the first occurrence of \bibrangedash, "--" or "-" ("--" is never matched only as "-"). The command then prints the start and end of the range with \bibrangedash in between.
Certain characters can be hidden in this step by wrapping them in curly braces. Unfortunately this only works theoretically at the moment, because the \ifpages test can't deal with these hidden characters and the braces surrounding them. This means that presently a hyphen needs to be hidden with a command \newcommand*{\pagehyphen}{-} that can be made invisible itself with \NumCheckSetup{\let\pagehyphen\@empty}: then \cite[16\pagehyphen 1-16\pagehyphen 14]{sigfridsson} gives the expected output. I'll have a look if \cite[6{-}1-6{-}14]{sigfridsson} can be salvaged, but that looks really tough.
What does that mean for you?
You don't need to add braces. They don't have a benefit for the cases you have considered so far.
Converting U+2013/U+2014 makes sense and should be unproblematic.
There is no value in 'over-normalising' "--" back to "-" for the average biblatex user. The same holds for the BibTeX standard styles.
Converting "--" to "-" might be a good or bad idea (depending on how you look at it) for BibTeX styles that do not convert "-" to "--" internally (I seem to remember the French prefer a "-" in page ranges and not "--"): It is good if people are somehow used to typing "--" in Zotero and actually want their BibTeX style to determine the dash regardless of their input. It is bad if people explicitly want an en-dash in styles that (intentionally or not) do not convert "-" to "--". My money is on not converting "--" back to "-"; this makes the next step easier.
I would do nothing about 6-1-6-14 if I were you; even for a human it is almost incomprehensible what that ought to mean. With biblatex and Biber 6-1--6-14 will give the expected output and is easier on the eye for humans as well: A user can be expected to input this instead of 6-1-6-14, no BBT intervention needed. For BibTeX one would have to resort to \hyphen from above. I would find it a bit too intrusive, though, if BBT were to do the @preamble stuff by default.
Let me just repeat my comment that a convenient (and, seemingly, nearly a de facto standard) way of writing a page range of the type 6-1 through 6-14 is 6(14). This is used by, e.g., the American Physical Society publications such as the Physical Review journals.
IOW the current behavior in the regular release is OK as-is?
I don't use BBT (or Zotero for that matter), so verification would have to come from someone who does. But from what I can read here things should be fine if BBT does not change "-" to "--" any more (I think you mentioned that build 6221 does not do this any more, is that part of the regular release now?).
I had a look at normalizeDashes and
https://github.com/retorquere/zotero-better-bibtex/blob/753f0cc27750f532cf560f76c5cd2991f3d9f8b1/translators/bibtex/reference.ts#L401
still seems to convert some "-" to "--".
normalizeDashes also seems to replace U+2012 (figure dash) with an em-dash:
https://github.com/retorquere/zotero-better-bibtex/blob/753f0cc27750f532cf560f76c5cd2991f3d9f8b1/translators/bibtex/reference.ts#L399
I'd probably go for an en-dash or even a hyphen instead.
I'll get those changed later today.
:robot: this is your friendly neighborhood build bot announcing test build 5.0.129.6395.issue-943 ("adjust tests for #943").
I still get the following warning in my BBT exported BibLaTeX file: @% ? hyphen found in pages field, did you mean to use an en-dash?
I thought "-" will now be kept as is and it is not required to put "--" between pages. What did I miss?
Fixed, will be in the next release.
:robot: this is your friendly neighborhood build bot announcing test build 5.0.137.6668.master ("re-fixes #943").
Hi all,
I'm trying to export to bibtex the citation for a journal article with page numbers "16-1", "16-2", "16-3", and "16-4". I'd like the page range to appear in bibtex as '16-1--16-4'. Unfortunately, if the Pages field in Zotero is '16-1-16-4', then all hyphens are converted to en dashes and the corresponding bibtex field is '16--1--16--4'. Is there any way to escape hyphens here, or alternatively force '16-1' and '16-4' to be interpreted as strings?