Closed JohnLukeBentley closed 6 years ago
The other orig...
fields with the exception of origlanguage
(which is the only field used by the standard styles) are not mentioned either.
I'd have thought that the orig...
fields belong to the 'generic fields' (of which abstract
, annotation
, label
, and shorthand
are only four examples). But others might think differently.
There are many more exotic fields that are missing. As soon as author
is supported, authortype
is as well, but it is never explicitly mentioned in the section. The same holds for editortype
. bookpagination
gets no mention either. entrysubtype
is probably one of the 'generic' fields and also not mentioned. file
which is not supported by any of the standard styles is also not mentioned. And probably a few more.
Ultimately, however, some of this can be explained with a snippet just a few lines further up from the one you quote (emphasis mine):
Note that the mapping of fields to an entry type is ultimately at the discretion of the bibliography style. The lists below therefore serve two purposes. They indicate the fields supported by the standard styles which ship with this package and they also serve as a model for custom styles.
Given that origdate
and friends are not supported by the standard styles, the lists stay true to their objective.
If you are looking for technical advice
See the default data model specification in the file
blx-dm.def
which comes with biblatex
for a complete specification.
Note that since biblatex
3.8 is very close to being released, you should probably validate against the new blx-dm.def
and not the one from 3.7 (there were quite a few changes, even involving origdate
).
Thanks moewew.
Indeed you've correctly anticipated the motive. I've been helping @retorquere out over at https://github.com/retorquere/zotero-better-bibtex/issues/751. When using Zotero-Better-Bibtex to export from Zotero into a Biblatex (or bibtex) *.bib file he has a lint parser that he calls a "Quality Report". That is, any fields that aren't Biblatex (or bibtex) conformant get flagged.
For biblatex he's constructed rules in https://github.com/retorquere/zotero-better-bibtex/blob/751/resource/bibtex/biblatex.entry-types.json that are based off the documentation in biblatex.pdf.
So indeed it was exacting "technical advice" that I was thinking of: the idea is to help @retorquere, and tool makers like him, have ready access to the rules about what biblatex fields are supported generally and for particular entry types.
So I suppose a "biblatex conformant field [my language]" is a bit ambiguous. That is, if I have it right, biblatex supports fields at three levels:
1. fields supported by the standard styles (e.g. author, title);
2. fields in the data model that the standard styles do not support (e.g. origdate): a standard style would simply ignore this kind of field.
3. fields outside the data model (e.g. squigglefactor): a standard style would also ignore this kind of field.
And I further suppose a lint tool maker like @retorquere will really be after validation rules against the first two levels.
From https://github.com/plk/biblatex/blob/dev/tex/latex/biblatex/blx-dm.def it looks like @retorquere could find the entry independent fields (whether "Generic", a "Special Field", or a third category not included in the previous two) from \DeclareDatamodelEntryfields{
(from line 605). In that list I'm seeing the fields you mention: orig...
(except origlanguage
), abstract
, annotation
, label
, and shorthand
; entrysubtype
; file
.
There's also, on line 559, some of the exotically contingent (e.g. "As soon as author
is supported, authortype
is as well") fields ...
\DeclareDatamodelFields[type=field, datatype=key]{
authortype,
editoratype,
editorbtype,
editorctype,
editortype,
bookpagination,
nameatype,
namebtype,
namectype,
pagination,
pubstate,
type}
What would be the best way to help lint tool makers like @retorquere to identify the biblatex conformant fields (at "level" 1 and 2)?
blx-dm.def
; or
Maybe this need is rare enough, or ad hoc enough, that it is enough to help @retorquere out specifically (rather than rework the biblatex code and docs to anticipate future toolmakers).
Since the datamodel commands in blx-dm.def
are written in an easy to parse XML format to the .bcf
on first latex run - if automation is required, just compile a minimal document with the latest biblatex and parse the parts you need of the .bcf
. I have a RelaxNG schema for the .bcf
. This is what biber
does every run so it can use the datamodel internally to determine many things dynamically.
Can we close this?
Thanks @plk.
Am I reading the .bcf correctly? ...
- The <bcf:entryfields> node that has no <bcf:entrytype> child node (with the text value of the first child "abstract" and the last two children "xref", "year") is THE node from which to extract the full list of entry independent fields (whether "Generic", a "Special Field", or a third category not included in the previous two);
- <bcf:entryfields> nodes that have at least one <bcf:entrytype> child node;
- <bcf:constraints> node with an <bcf:entrytype> child node.
... and that should be all @retorquere needs for his https://github.com/retorquere/zotero-better-bibtex/blob/751/resource/bibtex/biblatex.qr.json?
OK, I can follow that. What do the bcf:inheritance
and bcf:sorting
nodes do?
Is something similar to bcf
files available for bibtex BTW?
@JohnLukeBentley - that's correct. @retorquere - the only thing you need to know about is the <bcf:datamodel>
node - the others are for actually processing data for a particular document. I can give you the bcf.rnc
or bcf.rng
format schema which is commented if you need it?
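The reading confirmed above can be sketched in Python. This is only a minimal sketch: the inline sample is a heavily trimmed, hypothetical stand-in for the <bcf:datamodel> part of a real .bcf, and the namespace URI `urn:example:bcf` is a placeholder that should be replaced with the one declared in your own file. The helper name `allowed_fields` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for the datamodel section of a .bcf.
# The namespace URI is a placeholder: copy the real one from your file.
SAMPLE = """
<bcf:controlfile xmlns:bcf="urn:example:bcf">
  <bcf:datamodel>
    <bcf:entryfields>
      <bcf:field>abstract</bcf:field>
      <bcf:field>xref</bcf:field>
      <bcf:field>year</bcf:field>
    </bcf:entryfields>
    <bcf:entryfields>
      <bcf:entrytype>article</bcf:entrytype>
      <bcf:entrytype>periodical</bcf:entrytype>
      <bcf:field>issn</bcf:field>
    </bcf:entryfields>
  </bcf:datamodel>
</bcf:controlfile>
"""

NS = {"bcf": "urn:example:bcf"}


def allowed_fields(root):
    """Return (general, per_type): the fields allowed for every entry type,
    and a dict mapping each entry type to its type-specific fields."""
    general = set()
    per_type = {}
    for block in root.findall("./bcf:datamodel/bcf:entryfields", NS):
        types = [t.text for t in block.findall("bcf:entrytype", NS)]
        fields = {f.text for f in block.findall("bcf:field", NS)}
        if not types:
            # No <bcf:entrytype> child: the block applies to all types.
            general |= fields
        for entrytype in types:
            per_type.setdefault(entrytype, set()).update(fields)
    return general, per_type


general, per_type = allowed_fields(ET.fromstring(SAMPLE))
print(sorted(general))
print(sorted(general | per_type["article"]))
```

The fields a linter accepts for one entry type would then be the union of the general set and that type's own set.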
bibtex
reads just the .aux
file which doesn't have that much in it and is just a file full of TeX macros. biber
does a lot more than bibtex
, hence the amount of information in the .bcf
.
Since each .bst
can decide for itself which fields it supports, there is no such thing as the .bcf
for BibTeX. You'd have to go to the .bst
file directly. Of course with Biber the model is customizable as well, so any hard-coded list could be lacking.
If Zotero could read a particular .bcf
, I suppose this could be sensitive to a particular document. The .bcf
is the entirety of the data (apart from the .bib
files themselves) that biber
uses, by design.
@plk the schema with comments would be very helpful -- thank you!
@moewew: I have no intention to get this to pick out all possible errors, which, as you've rightfully pointed out, is a fool's errand. But in my case it'd be a case of the perfect getting in the way of the good -- giving hints about how things relate to the standard styles is pretty useful for me and perhaps others. I was docked points on my thesis for easily preventable errors that would have been picked out easily even by the simple linter I'm building now.
So, also no list of commonly used styles + fields then? Shame, but then the lint config for bibtex will just grow one bug report to BBT at a time.
The BibTeX consensus is described in http://mirrors.ctan.org/biblio/bibtex/base/btxdoc.pdf. But you can't expect every style to follow what is described there to the letter. And some styles define other fields - many newer styles define url
, ...
@retorquere - you can get these any time from the biber
github code repo in the data/schemata
folder.
@retorquere - You should get the version from the dev branch currently as this hasn’t been merged for release yet.
@moewew I agree, but a) these lint results are advisory only, and b) people should be using biblatex.
How should I interpret <bcf:maps datatype="bibtex" level="driver">
?
Good point. These coerce common legacy bibtex entry types and fields into their biblatex default datamodel equivalents. They are there to handle older .bib files that exist. When creating modern biblatex-only .bib files, the targets should be used. If you are aiming to create .bib files that bibtex users can use too, then use the source names as biber will map them dynamically to the biblatex names.
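For a linter that targets biblatex-only .bib files, the source names in these maps could be reported as hints rather than errors. The sketch below assumes a small, illustrative subset of aliases taken from the biblatex manual. A complete list would have to be read from the <bcf:maps> section itself, and the function name `legacy_hints` is hypothetical.

```python
# Illustrative subset only; the full set of mappings lives in the .bcf.
LEGACY_FIELDS = {
    "address": "location",
    "journal": "journaltitle",
    "school": "institution",
    "annote": "annotation",
}
LEGACY_TYPES = {
    "conference": "inproceedings",
    "techreport": "report",
    "www": "online",
}


def legacy_hints(entrytype, fields):
    """Return advisory messages for legacy BibTeX names in one entry."""
    hints = []
    if entrytype in LEGACY_TYPES:
        target = LEGACY_TYPES[entrytype]
        hints.append(f"entry type '{entrytype}' is a legacy alias of '{target}'")
    for name in fields:
        if name in LEGACY_FIELDS:
            target = LEGACY_FIELDS[name]
            hints.append(f"field '{name}' is a legacy alias of '{target}'")
    return hints


for hint in legacy_hints("conference", ["author", "address", "title"]):
    print(hint)
```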
Still having some trouble figuring out how the bcf
fits together.
It looks like the bcf:entryfields
records specify, per type(s), which fields are allowed but optional; if there is no bcf:entrytype
inside, it applies to all reference types.
Then there's the bcf:constraints
, which list the constraints placed on references of the types contained in their bcf:entrytype
records. Fields which are mandatory are of course by implication allowed. fieldxor
fields mean only one can occur in the reference; if there's no xor, all of the contained fields are required.
Constraints that mark a field as being of a certain type (the "data"/"type" marks) get type checking but only if they're present; they must first occur in the list of allowed fields before data checking becomes relevant.
Correct?
Yes, correct. There are a few comments in the .rnc
file. I am happy to add some more detailed comments in there if you need them.
I think I have it now except for the maps
stuff.
BBT itself will only export the standard fields per the docs, but has a facility for custom fields and postprocessing where the user can do pretty much as they please; it is the full outcome of BBT processing + custom fields + postprocessing that I'd like to check.
But if I'm reading you right, the maps
section is really only to accommodate legacy bibliographies and I should not be encouraging people to generate legacy files by "endorsing" them as valid biblatex bibliographies. Yes?
Right - if your plugin is aimed at biblatex, then you can ignore the maps section anyway.
Is bcf:entrytypes
supposed to list all possible entrytypes? I don't see mvproceedings
in mine.
How should I interpret fieldor
? I take it as "at least one of these", but I'd like to make sure.
Strange - it is in blx-dm.def
and should be in all .bcfs. Are you generating a
.bcf using an older version?
fieldor is as you say. See the comments in bcf.rnc
.
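One way a linter might evaluate a mandatory constraint, following the reading discussed above: plain fields are all required, fieldor means at least one of the group, and fieldxor is treated here as exactly one of the group (one interpretation of "only one can occur" inside a mandatory constraint). The constraint block, the placeholder namespace URI `urn:example:bcf`, and the helper names are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET

NS = {"bcf": "urn:example:bcf"}  # placeholder URI, use the one from your .bcf

# Hypothetical constraint block, trimmed for illustration.
SAMPLE = """
<bcf:constraints xmlns:bcf="urn:example:bcf">
  <bcf:entrytype>book</bcf:entrytype>
  <bcf:constraint type="mandatory">
    <bcf:field>title</bcf:field>
    <bcf:fieldor>
      <bcf:field>author</bcf:field>
      <bcf:field>editor</bcf:field>
    </bcf:fieldor>
    <bcf:fieldxor>
      <bcf:field>date</bcf:field>
      <bcf:field>year</bcf:field>
    </bcf:fieldxor>
  </bcf:constraint>
</bcf:constraints>
"""


def names(element):
    """Field names that are direct children of the given element."""
    return [f.text for f in element.findall("bcf:field", NS)]


def check_mandatory(constraints, entry_fields):
    """Return a list of problems for one entry's set of field names."""
    present = set(entry_fields)
    problems = []
    for constraint in constraints.findall("bcf:constraint[@type='mandatory']", NS):
        for name in names(constraint):
            if name not in present:
                problems.append(f"missing required field: {name}")
        for group in constraint.findall("bcf:fieldor", NS):
            if not present & set(names(group)):
                problems.append("need at least one of: " + ", ".join(names(group)))
        for group in constraint.findall("bcf:fieldxor", NS):
            if len(present & set(names(group))) != 1:
                problems.append("need exactly one of: " + ", ".join(names(group)))
    return problems


block = ET.fromstring(SAMPLE)
print(check_mandatory(block, {"title", "editor", "year"}))
print(check_mandatory(block, {"author", "date", "year"}))
```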
@retorquere I'm seeing mvproceedings
in my bcf:entrytypes
.
The latest dev version of biblatex is found at sourceforge:
(Incidentally, to all and as a general matter, I'm familiar with XSLT but not RelaxNG. On a quick glance RelaxNG, especially the compact form, looks far easier to work with ... https://en.wikipedia.org/wiki/RELAX_NG).
I'm getting my bcf from a sharelatex compilation. If there's a better place to get the most recent one (a biber test case perhaps?) I can just use that.
My understanding is that you can generate a .bcf
from any biblatex supporting .tex
file.
See Gist-Biblatex-Mwe.tex and Gist-Biblatex-Mwe.bcf ... files I've created.
You could use Gist-Biblatex-Mwe.bcf directly and alone (if that's all you need and don't have a current latex install).
There is a current test .bcf
on SourceForge in here:
https://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/testfiles/
My crappy macbook won't tolerate a latex install.
In that test.bcf
I find
<bcf:constraints>
...
<bcf:entrytype>suppperiodical</bcf:entrytype>
...
</bcf:constraints>
but <bcf:entrytype>suppperiodical</bcf:entrytype>
is not in <bcf:entrytypes>
.
I could just infer that any <bcf:entrytype>
I find is a supported entrytype, but then I don't understand what the purpose is of <bcf:entrytypes>
.
Corrected in biblatex and the test file.
@retorquere wrote:
.... Constraints that mark a field as being of a certain type (the "data"/"type" marks) get type checking but only if they're present; they must first occur in the list of allowed fields before data checking becomes relevant. ... Correct?
@plk wrote:
Yes, correct.
On the point @retorquere quoted above, this seems not quite correct. In test.bcf
(and also the .bcf
I generate) ...
There is (so it seems) a general (entry independent) <bcf:constraints>
node (the one without a <bcf:entrytype>
child). And under that node a date
datatype constraint node that lists origdate
(and other date types).
However, under the general (entry independent) <bcf:entryfields>
there is no origdate
field (although there are other orig*
fields).
Am I reading matters correctly that @retorquere would need, for the purpose of identifying what fields biblatex supports at a basic level (whether or not biblatex standard styles also support the field), to include all the fields under the general <bcf:constraints>
node isbn
, issn
, .... origdate
, urldate
as they aren't necessarily going to be listed under the general or entry specific <bcf:entryfields>
nodes?
(Although I see that isbn
, for example, is listed under entry specific <bcf:entryfields>
).
My bad. New build underway.
You'll have to correct me if I'm wrong, but date
fields are special. They should always appear only as date
(origdate
, urldate
, ...) in the .bib
file (year
and month
are legacy exceptions), but they are decomposed into their date parts by Biber; the datamodel lists the date parts in \DeclareDatamodelEntryfields
/<bcf:entryfields>
.
The fields listed under <bcf:entryfields>
without an <bcf:entrytype>
(\DeclareDatamodelEntryfields
without an optional argument) are supported by all types, those listed under <bcf:entryfields>
with <bcf:entrytype>
(\DeclareDatamodelEntryfields
with optional argument) are specific to the specified type.
I think the <bcf:constraints>
just give additional info about certain constraints that fields need to satisfy. In particular the type
, datatype
constraints of ISBN and date are just for validating the content of the fields, not whether or not they can be used in general. (ISBN is listed normally in <bcf:entryfields>
, date is special as explained above.)
So does that mean that the <bcf:constraints>
block which has no <bcf:entrytype>
means
if you find any of these fields, test them according to these rules
not
these fields are allowed but not required for all types, and if you find any of these fields, test them according to these rules
?
This is different from other <bcf:constraints>
blocks which do imply that fields they are testing are allowed by the types they list. Yes?
I don't understand what is being said in the discussion on dates being special. The BBT linter currently only tests for field presence, so maybe that's where I'm getting lost; the discussion of dates being special seems to relate to the content of the fields, not the presence.
Again, @plk may have to correct me here, but my understanding is that <bcf:constraints>
/\DeclareDatamodelConstraints
always has the first interpretation of
if you find any of these fields, test them according to these rules (which may well imply that a field is required, not required, ... etc.)
regardless of whether or not a <bcf:entrytype>
/optional argument is present. Whether or not a field is allowed is decided by <bcf:entryfields>
/\DeclareDatamodelEntryfields
alone, <bcf:constraints>
/\DeclareDatamodelConstraints
has no say in that matter.
You could of course argue that being present in <bcf:constraints>
somewhat implies that a field is allowed, but that is not the case, strictly speaking. I just did a test with the default constraints and a minimal set of \DeclareDatamodelEntryfields
; the existence of the constraints did not allow the fields to be seen as valid.
<bcf:constraints>
/\DeclareDatamodelConstraints
has two purposes: (1) To check the contents of fields (date
fields need to be given in ISO/EDTF format, gender
has only certain values, ISBNs can be validated, ...) and (2) to check certain relations between fields (a field can be mandatory, it can be allowed only in combination with another field or to the contrary only if another field is not present).
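The separation described here (whether a field is allowed comes from the entryfields alone; data constraints only check the content of fields that are present) could look roughly like this in a linter. Everything in the sketch is an assumption made for illustration: the allowed set, the two deliberately simplified regexes, and the function name `lint` are not taken from biblatex or Biber.

```python
import re

# Hypothetical allowed fields for one entry type, as derived from
# <bcf:entryfields>. Note that isbn is NOT allowed for this made-up type.
ALLOWED = {"author", "title", "date"}

# Deliberately simplified content checks standing in for data constraints.
SIMPLE_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")  # YYYY, YYYY-MM, YYYY-MM-DD
SIMPLE_ISBN = re.compile(r"^[0-9Xx -]{10,17}$")
CHECKS = {"date": SIMPLE_DATE, "isbn": SIMPLE_ISBN}


def lint(entry):
    """Step 1: is the field allowed? Step 2: if so, is its content plausible?"""
    messages = []
    for name, value in entry.items():
        if name not in ALLOWED:
            # Having a content check does not make a field allowed.
            messages.append(f"{name}: not allowed for this entry type")
        elif name in CHECKS and not CHECKS[name].match(value):
            messages.append(f"{name}: content fails the {name} check")
    return messages


print(lint({"title": "Example", "date": "2017-11"}))
print(lint({"title": "Example", "date": "Nov 2017", "isbn": "978-3-16-148410-0"}))
```

In the second call the isbn value would pass its content check, yet the field is still reported because it is missing from the allowed set.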
Dates are special in that you won't find date
or origdate
, ... in <bcf:entryfields>
/ \DeclareDatamodelEntryfields
even though date fields are allowed in many places. What you'll instead find in <bcf:entryfields>
/\DeclareDatamodelEntryfields
are the date parts. So if date
is a valid field you will find year
, month
, day
, etc. instead (if origdate
is valid, you'll see origyear
, origmonth
, ...). date
is only found in \DeclareDatamodelConstraints
. That is something your validator has to catch. Instead of the date parts listed in the .bcf
actually only the full date
field is valid in the .bib
. I think this has to do with how Biber splits the date
field and when the validation takes place with Biber.
Whether or not a field is allowed is decided by
@moewew but wouldn't that mean that no entry type can have a date
field? date
is only listed in constraints, not in <bcf:entryfields>
.
How did you cause the collapsible MWE in the comment you posted BTW?
This is exactly what I mean with 'date is special'. It is allowed, but instead of date
we find all the date parts in <bcf:entryfields>
. See also the second part of my comment above.
Wait... so day
is in <bcf:entryfields>
but it's not actually a valid field, while date
is not in <bcf:entryfields>
but it is a valid field.
is this exclusive to fields ending in date
, year
, month
and day
?
Yup, that is the upshot. I think that date
-like fields (date
, urldate
, origdate
, ...) and their date parts are the only ones that suffer from this particular inconsistency.
Date parts are fields ending in
month,day,hour,minute,second,timezone,season,endmonth,endday,endhour,endminute,endsecond,endtimezone,endseason
So
urlday,
urlendday,
urlendhour,
urlendminute,
urlendmonth,
urlendsecond,
urlendtimezone,
urlendyear,
urlhour,
urlminute,
urlmonth,
urlsecond,
urltimezone,
urlyear,
are all listed in <bcf:entryfields>
, but only urldate
is valid (which is indeed not listed).
There is one exception: month
and year
are valid fields whenever date
is valid for legacy reasons (so these are doubly special). This does not hold for other dates such as urldate
, origdate
, eventdate
, there only the date field is valid, not its date parts.
This relates to all bcf:fields
that have datatype="datepart"
, yes?
Yes, the 'rule' is that only the date
field with the appropriate prefix is supported in the .bib
file. (Except for month
and year
.)
My script currently extracts this regex from the BCF to detect datepart fields:
/^(event|orig|url)?(endyear|year|month|day|hour|minute|second|timezone|season|endmonth|endday|endhour|endminute|endsecond|endtimezone|endseason)$/
and yields these as valid date fields:
That should be correct.
I wouldn't use regexps though - it's probably better to parse the XML properly.
I do parse the XML using an XML parser/xpath -- it's just that I peek into the contents of the <bcf:entryfield>
and I use that regex to decide whether I should replace it with <first capture group>date
; so if I find eventendyear
I know that corresponds with eventdate
.
I'm not parsing XML with regexen 😆 that way madness lies.
(when I say "extracts the regex from the BCF" I mean that regex was not written by hand, it is generated by inspecting bcf:fields
that have datatype="datepart"
/datatype="date"
)
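The date-part rule from this exchange can be sketched as a small mapping function. The regex is the one quoted above, hard-coded here rather than generated from the .bcf; the function name `bib_field_for` and the `LEGACY_OK` set are hypothetical names for this illustration.

```python
import re

# The regex quoted in the thread, split across lines for readability.
DATEPART = re.compile(
    r"^(event|orig|url)?"
    r"(endyear|year|month|day|hour|minute|second|timezone|season|"
    r"endmonth|endday|endhour|endminute|endsecond|endtimezone|endseason)$"
)

# Unprefixed year and month stay valid in the .bib for legacy reasons.
LEGACY_OK = {"year", "month"}


def bib_field_for(datamodel_field):
    """Map a name listed in <bcf:entryfields> to the name valid in a .bib file."""
    if datamodel_field in LEGACY_OK:
        return datamodel_field
    match = DATEPART.match(datamodel_field)
    if match:
        # Date parts collapse onto the date field with the same prefix.
        return (match.group(1) or "") + "date"
    return datamodel_field


listed = ["title", "year", "month", "day", "urlyear", "eventendyear"]
print(sorted({bib_field_for(name) for name in listed}))
```

Applying the function to every listed field and collecting the results in a set gives the names a linter should accept, with each family of date parts replaced by its single date field.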
Right, it does - I've done it in the distant past (when XML didn't even exist and everything was SGML ...) when there were no parsers ...
From biblatex.pdf, page 7, Section 2 "Database Guide", there's a list of particular Entry Types and the fields they support. There's also mention of the fields supported by every entry ...
The origdate field is not mentioned: in any particular entry; nor in the special fields list; nor as one of the four "generic" fields. I think it is true that all entry types support origdate. If that is right then perhaps it should be added to the special fields list. In any case it seems that it should be expressed somehow that origdate is supported by all (or most?) entries.
Does this issue throw up the same consideration for another field?