plk / biblatex

biblatex is a sophisticated bibliography system for LaTeX users. It has considerably more features than traditional bibtex and supports UTF-8

Biblatex.pdf. Express that `origdate` field (and others?) are supported by entries. #638

JohnLukeBentley closed this issue 6 years ago

JohnLukeBentley commented 6 years ago

From biblatex.pdf, page 7, Section 2 "Database Guide", there's a list of particular entry types and the fields they support. There's also mention of the fields supported by every entry ...

Generic fields like abstract and annotation or label and shorthand are not included in the lists below because they are independent of the entry type. The special fields discussed in § 2.2.3, which are also independent of the entry type, are not included in the lists either.

The origdate field is not mentioned in any particular entry, nor in the special fields list, nor as one of the four "generic" fields. I think it is true that all entry types support origdate. If that is right, then perhaps it should be added to the special fields list. In any case, it seems that it should be expressed somehow that origdate is supported by all (or most?) entries.

Does this issue throw up the same consideration for another field?

moewew commented 6 years ago

With the exception of origlanguage (the only one of them used by the standard styles), the other orig... fields are not mentioned either.

I'd have thought that the orig... fields belong to the 'generic fields' (of which abstract, annotation, label, and shorthand are only four examples). But others might think differently.

There are many more exotic fields that are missing. As soon as author is supported, authortype is as well, but it is never explicitly mentioned in the section. The same holds for editortype. bookpagination gets no mention either. entrysubtype is probably one of the 'generic' fields and is also not mentioned. file, which is not supported by any of the standard styles, is also not mentioned. And probably a few more.

Ultimately, however, some of this can be explained with a snippet just a few lines further up from the one you quote (emphasis mine):

Note that the mapping of fields to an entry type is ultimately at the discretion of the bibliography style. The lists below therefore serve two purposes. They indicate the fields supported by the standard styles which ship with this package and they also serve as a model for custom styles.

Given that origdate and friends are not supported by the standard styles, the lists stay true to their objective.

If you are looking for technical advice

See the default data model specification in the file blx-dm.def which comes with biblatex for a complete specification.

Note that since biblatex 3.8 is very close to being released, you should probably validate against the new blx-dm.def and not the one from 3.7 (there were quite some changes, even involving origdate).

JohnLukeBentley commented 6 years ago

Thanks moewew.

Indeed you've correctly anticipated the motive. I've been helping @retorquere out over at https://github.com/retorquere/zotero-better-bibtex/issues/751. When using Zotero-Better-Bibtex to export from Zotero into a Biblatex (or bibtex) *.bib file he has a lint parser that he calls a "Quality Report". That is, any fields that aren't Biblatex (or bibtex) conformant get flagged.

For biblatex he's constructed rules in https://github.com/retorquere/zotero-better-bibtex/blob/751/resource/bibtex/biblatex.entry-types.json that are based off the documentation in biblatex.pdf.

So indeed it was exacting "technical advice" that I was thinking of: the idea is to help @retorquere, and tool makers like him, have ready access to the rules about what biblatex fields are supported generally and for particular entry types.

So I suppose a "biblatex conformant field [my language]" is a bit ambiguous. That is, if I have it right, biblatex supports fields at three levels:

  1. According to the standard styles (e.g. author, title);
  2. Explicitly available for non-standard styles (e.g. origdate): a standard style would simply ignore this kind of field.
  3. Any other field for non-standard styles (e.g. squigglefactor): a standard style would also ignore this kind of field.

And I further suppose a lint tool maker like @retorquere will be really after validation rules against the first two levels.

From https://github.com/plk/biblatex/blob/dev/tex/latex/biblatex/blx-dm.def it looks like @retorquere could find the entry independent fields (whether "Generic", a "Special Field", or a third category not included in the previous two) from \DeclareDatamodelEntryfields{ (from line 605). In that list I'm seeing the fields you mention: orig... (except origlanguage), abstract, annotation, label, and shorthand; entrysubtype; file.

There's also, on line 559, a list of some of the exotically contingent fields (e.g. "As soon as author is supported, authortype is as well") ...

\DeclareDatamodelFields[type=field, datatype=key]{
  authortype,
  editoratype,
  editorbtype,
  editorctype,
  editortype,
  bookpagination,
  nameatype,
  namebtype,
  namectype,
  pagination,
  pubstate,
  type}
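As a rough illustration of reading such lists straight out of blx-dm.def, the Python sketch below collects the comma-separated field names from every `\DeclareDatamodelFields` and `\DeclareDatamodelEntryfields` block. It assumes that neither the optional `[...]` argument nor the field list contains nested brackets or braces, which holds for blocks like the one quoted above but not for `\DeclareDatamodelConstraints`. The function name `read_field_blocks` is hypothetical, and scraping TeX source like this is fragile, so treat it as an exploration aid rather than a validator.

```python
import re

# Matches \DeclareDatamodelFields[...]{...} and \DeclareDatamodelEntryfields[...]{...}.
# Assumption: the option list and the field list contain no nested brackets/braces.
BLOCK_RE = re.compile(
    r"\\(DeclareDatamodel(?:Fields|Entryfields))"  # command name
    r"\s*(?:\[([^\]]*)\])?"                         # optional [options]
    r"\s*\{([^}]*)\}"                               # {comma-separated field list}
)


def read_field_blocks(source):
    """Return (command, options, fields) for every matching block in the TeX source."""
    blocks = []
    for m in BLOCK_RE.finditer(source):
        command, options, body = m.group(1), m.group(2) or "", m.group(3)
        # Split on commas and drop whitespace and empty entries (e.g. a trailing comma).
        fields = [f.strip() for f in body.split(",") if f.strip()]
        blocks.append((command, options.strip(), fields))
    return blocks


sample = r"""
\DeclareDatamodelFields[type=field, datatype=key]{
  authortype,
  editortype,
  bookpagination,
  pubstate,
  type}
\DeclareDatamodelEntryfields[article]{ author, journaltitle, title }
"""

for command, options, fields in read_field_blocks(sample):
    print(command, options, fields)
```

A block without an optional argument comes back with an empty options string, which for `\DeclareDatamodelEntryfields` corresponds to the fields that apply to every entry type.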

What would be the best way to help lint tool makers like @retorquere to identify the biblatex conformant fields (at "level" 1 and 2)?

Maybe this need is rare enough, or ad hoc enough, just to want to help @retorquere out specifically (rather than rework the biblatex code and docs to anticipate future toolmakers).

plk commented 6 years ago

Since the datamodel commands in blx-dm.def are written to the .bcf in an easy-to-parse XML format on the first LaTeX run, if automation is required, just compile a minimal document with the latest biblatex and parse the parts of the .bcf you need. I have a RelaxNG schema for the .bcf. This is what biber does on every run, so that it can use the datamodel internally to determine many things dynamically.
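A minimal sketch of that approach in Python, assuming the element names that come up later in this thread (`<bcf:datamodel>`, `<bcf:entrytypes>`/`<bcf:entrytype>`, `<bcf:fields>`/`<bcf:field datatype="...">`). The XML below is an abridged, hypothetical fragment rather than a real .bcf (the namespace URI in it is illustrative), and `read_datamodel` is an illustrative name. The `{*}` wildcard (Python 3.8+) matches any namespace, so the code does not depend on the exact `bcf` namespace URI.

```python
import xml.etree.ElementTree as ET

# Abridged, hypothetical .bcf fragment; a real one is produced by compiling any biblatex document.
BCF = """<bcf:controlfile xmlns:bcf="https://sourceforge.net/projects/biblatex">
  <bcf:datamodel>
    <bcf:entrytypes>
      <bcf:entrytype>article</bcf:entrytype>
      <bcf:entrytype>book</bcf:entrytype>
    </bcf:entrytypes>
    <bcf:fields>
      <bcf:field fieldtype="field" datatype="literal">title</bcf:field>
      <bcf:field fieldtype="field" datatype="date">origdate</bcf:field>
      <bcf:field fieldtype="field" datatype="datepart">origyear</bcf:field>
    </bcf:fields>
  </bcf:datamodel>
</bcf:controlfile>"""


def read_datamodel(xml_text):
    """Return (declared entry types, {field name: datatype}) from the datamodel node."""
    root = ET.fromstring(xml_text)
    # "{*}" matches any namespace, so the bcf: prefix/URI does not have to be spelled out.
    dm = root.find(".//{*}datamodel")
    entrytypes = [e.text for e in dm.findall("{*}entrytypes/{*}entrytype")]
    fields = {f.text: f.get("datatype") for f in dm.findall("{*}fields/{*}field")}
    return entrytypes, fields


entrytypes, fields = read_datamodel(BCF)
print(entrytypes)
print(fields)
```

Only the `<bcf:datamodel>` node is read here; the other parts of the .bcf describe how a particular document is processed.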

plk commented 6 years ago

Can we close this?

JohnLukeBentley commented 6 years ago

Thanks @plk.

Am I reading the .bcf correctly? ...

... and that should be all @retorquere needs for his https://github.com/retorquere/zotero-better-bibtex/blob/751/resource/bibtex/biblatex.qr.json

?

retorquere commented 6 years ago

OK, I can follow that. What do the bcf:inheritance and bcf:sorting nodes do?

retorquere commented 6 years ago

Is something similar to bcf files available for bibtex BTW?

plk commented 6 years ago

@JohnLukeBentley - that's correct. @retorquere - the only thing you need to know about is the <bcf:datamodel> node - the others are for actually processing data for a particular document. I can give you the bcf.rnc or bcf.rng format schema which is commented if you need it?

bibtex reads just the .aux file, which doesn't have that much in it and is just a file full of TeX macros. biber does a lot more than bibtex, hence the amount of information in the .bcf.

moewew commented 6 years ago

Since each .bst can decide for itself which fields it supports, there is no such thing as the .bcf for BibTeX. You'd have to go to the .bst file directly. Of course with Biber the model is customizable as well, so any hard-coded list could be lacking.

plk commented 6 years ago

If Zotero could read a particular .bcf, I suppose this could be sensitive to a particular document. The .bcf is the entirety of the data (apart from the .bib files themselves) that biber uses, by design.

retorquere commented 6 years ago

@plk the schema with comments would be very helpful -- thank you!

@moewew: I have no intention of getting this to pick out all possible errors, which, as you've rightfully pointed out, is a fool's errand. But in my case that would be letting the perfect get in the way of the good: giving hints about how things relate to the standard styles is pretty useful for me and perhaps others. I was docked points on my thesis for easily preventable errors that would have been picked out even by the simple linter I'm building now.

So, there's also no list of commonly used styles + fields then? Shame, but then the lint config for bibtex will just grow one bug report to BBT at a time.

moewew commented 6 years ago

The BibTeX consensus is described in http://mirrors.ctan.org/biblio/bibtex/base/btxdoc.pdf. But you can't expect every style to follow what is described there to the letter. And some styles define other fields - many newer styles define url, ...

plk commented 6 years ago

@retorquere - you can get these any time from the biber github code repo in the data/schemata folder.

plk commented 6 years ago

@retorquere - You should get the version from the dev branch currently as this hasn’t been merged for release yet.

retorquere commented 6 years ago

@moewew I agree, but a) these lint results are advisory only, and b) people should be using biblatex.

retorquere commented 6 years ago

How should I interpret <bcf:maps datatype="bibtex" level="driver">?

plk commented 6 years ago

Good point. These coerce common legacy bibtex entry types and fields into their biblatex default datamodel equivalents. They are there to handle older .bib files that exist. When creating modern biblatex-only .bib files, the targets should be used. If you are aiming to create .bib files that bibtex users can use too, then use the source names as biber will map them dynamically to the biblatex names.
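plk's description suggests one possible use for the maps even in a biblatex-only linter: instead of accepting a legacy name, suggest its modern target. Below is a sketch in Python, assuming each `<bcf:map_step>` carries its source and target in attributes named `map_type_source`/`map_type_target` (entry types) and `map_field_source`/`map_field_target` (fields). Those attribute names are an assumption to check against a real .bcf; the fragment is abridged and hypothetical, and `legacy_renames` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abridged <bcf:maps> fragment. Attribute names are an assumption
# about how biblatex writes \step[typesource=..., typetarget=...] into the .bcf.
MAPS = """<bcf:maps xmlns:bcf="https://sourceforge.net/projects/biblatex" datatype="bibtex" level="driver">
  <bcf:map>
    <bcf:map_step map_type_source="conference" map_type_target="inproceedings"/>
    <bcf:map_step map_type_source="www" map_type_target="online"/>
  </bcf:map>
  <bcf:map>
    <bcf:map_step map_field_source="address" map_field_target="location"/>
    <bcf:map_step map_field_source="school" map_field_target="institution"/>
  </bcf:map>
</bcf:maps>"""


def legacy_renames(xml_text):
    """Collect legacy -> biblatex name tables for entry types and for fields."""
    root = ET.fromstring(xml_text)
    types, fields = {}, {}
    for step in root.findall(".//{*}map_step"):
        src, dst = step.get("map_type_source"), step.get("map_type_target")
        if src and dst:
            types[src] = dst
        src, dst = step.get("map_field_source"), step.get("map_field_target")
        if src and dst:
            fields[src] = dst
    return types, fields


types, fields = legacy_renames(MAPS)
print(types)
print(fields)
```

A linter could then report, say, `address` as "legacy name, biblatex uses location" rather than treating it as simply valid or invalid.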

retorquere commented 6 years ago

Still having some trouble figuring out how the bcf fits together.

It looks like the bcf:entryfields records specify, per type(s), which fields are allowed but optional; if there is no bcf:entrytype inside, it applies to all reference types.

Then there's the bcf:constraints, which list the constraints placed on references of the types contained in their bcf:entrytype records. Fields which are mandatory are of course by implication allowed. fieldxor fields mean only one can occur in the reference; if there's no xor, all of the contained fields are required.

Constraints that mark a field as being of a certain type (the "data"/"type" marks) get type checking but only if they're present; they must first occur in the list of allowed fields before data checking becomes relevant.

Correct?

plk commented 6 years ago

Yes, correct. There are a few comments in the .rnc file. I am happy to add some more detailed comments in there if you need them.

retorquere commented 6 years ago

I think I have it now except for the maps stuff.

BBT itself will only export the standard fields per the docs, but has a facility for custom fields and postprocessing where the user can do pretty much as they please; it is the full outcome of BBT processing + custom fields + postprocessing that I'd like to check.

But if I'm reading you right, the maps section is really only to accommodate legacy bibliographies and I should not be encouraging people to generate legacy files by "endorsing" them as valid biblatex bibliographies. Yes?

plk commented 6 years ago

Right - if your plugin is aimed at biblatex, then you can ignore the maps section anyway.

retorquere commented 6 years ago

Is bcf:entrytypes supposed to list all possible entrytypes? I don't see mvproceedings in mine.

retorquere commented 6 years ago

How should I interpret fieldor? I take it as "at least one of these", but I'd like to make sure.

plk commented 6 years ago

Strange - it is in blx-dm.def and should be in all .bcf files. Are you generating a .bcf using an older version?

plk commented 6 years ago

fieldor is as you say. See the comments in bcf.rnc.
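The semantics confirmed so far can be sketched as a mandatory-constraint check in Python. Plain `<bcf:field>` children are treated as required, a `<bcf:fieldor>` group needs at least one member present, and a `<bcf:fieldxor>` group is treated as needing exactly one; counting an empty xor group as a failure is an assumption for `type="mandatory"`. The constraint XML and the url/doi/eprint grouping are hypothetical, and `mandatory_problems` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical mandatory constraint in the shape described in this thread.
CONSTRAINT = """<bcf:constraint xmlns:bcf="https://sourceforge.net/projects/biblatex" type="mandatory">
  <bcf:field>author</bcf:field>
  <bcf:field>title</bcf:field>
  <bcf:fieldxor>
    <bcf:field>date</bcf:field>
    <bcf:field>year</bcf:field>
  </bcf:fieldxor>
  <bcf:fieldor>
    <bcf:field>url</bcf:field>
    <bcf:field>doi</bcf:field>
    <bcf:field>eprint</bcf:field>
  </bcf:fieldor>
</bcf:constraint>"""


def mandatory_problems(xml_text, present):
    """Return messages for one mandatory constraint, given the set of fields in an entry."""
    c = ET.fromstring(xml_text)
    problems = []
    # Direct <bcf:field> children: every one of them is required.
    for f in c.findall("{*}field"):
        if f.text not in present:
            problems.append(f"missing required field {f.text}")
    # <bcf:fieldxor>: exactly one member of the group (assumption for type="mandatory").
    for group in c.findall("{*}fieldxor"):
        names = [f.text for f in group.findall("{*}field")]
        if sum(n in present for n in names) != 1:
            problems.append(f"need exactly one of {', '.join(names)}")
    # <bcf:fieldor>: at least one member of the group.
    for group in c.findall("{*}fieldor"):
        names = [f.text for f in group.findall("{*}field")]
        if not any(n in present for n in names):
            problems.append(f"need at least one of {', '.join(names)}")
    return problems


print(mandatory_problems(CONSTRAINT, {"author", "date", "year"}))
```

Only the presence relations are checked here; content checks such as date formats or ISBN validation would sit alongside this as separate data constraints.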

JohnLukeBentley commented 6 years ago

@retorquere I'm seeing mvproceedings in my bcf:entrytypes.

The latest dev version of biblatex is found at SourceForge:

(Incidentally, to all and as a general matter, I'm familiar with XSLT but not RelaxNG. On a quick glance RelaxNG, especially the compact form, looks far easier to work with ... https://en.wikipedia.org/wiki/RELAX_NG).

retorquere commented 6 years ago

I'm getting my bcf from a sharelatex compilation. If there's a better place to get the most recent one (a biber test case perhaps?) I can just use that.

JohnLukeBentley commented 6 years ago

My understanding is that you can generate a .bcf from any biblatex supporting .tex file.

See Gist-Biblatex-Mwe.tex and Gist-Biblatex-Mwe.bcf ... files I've created.

You could use Gist-Biblatex-Mwe.bcf directly and alone (if that's all you need and don't have a current latex install).

plk commented 6 years ago

There is a current test .bcf on SourceForge in here:

https://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/testfiles/

retorquere commented 6 years ago

My crappy macbook won't tolerate a latex install.

retorquere commented 6 years ago

In that test.bcf I find

<bcf:constraints>
      ...
      <bcf:entrytype>suppperiodical</bcf:entrytype>
      ...
</bcf:constraints>

but <bcf:entrytype>suppperiodical</bcf:entrytype> is not in <bcf:entrytypes>.

retorquere commented 6 years ago

I could just infer that any <bcf:entrytype> I find is a supported entrytype, but then I don't understand what the purpose is of <bcf:entrytypes>.
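The kind of inconsistency found here can be caught mechanically: collect the types declared in `<bcf:entrytypes>` and compare them with every `<bcf:entrytype>` referenced elsewhere in the datamodel. A Python sketch under that assumption; the XML is an abridged, hypothetical fragment modelled on the test.bcf excerpt above, and `undeclared_entrytypes` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Abridged, hypothetical datamodel fragment modelled on the test.bcf excerpt.
BCF = """<bcf:datamodel xmlns:bcf="https://sourceforge.net/projects/biblatex">
  <bcf:entrytypes>
    <bcf:entrytype>article</bcf:entrytype>
    <bcf:entrytype>periodical</bcf:entrytype>
  </bcf:entrytypes>
  <bcf:constraints>
    <bcf:entrytype>periodical</bcf:entrytype>
    <bcf:entrytype>suppperiodical</bcf:entrytype>
  </bcf:constraints>
</bcf:datamodel>"""


def undeclared_entrytypes(xml_text):
    """Entry types referenced in entryfields/constraints but missing from <bcf:entrytypes>."""
    root = ET.fromstring(xml_text)
    declared = {e.text for e in root.findall(".//{*}entrytypes/{*}entrytype")}
    # Every <bcf:entrytype> anywhere in the tree, including the declared ones.
    referenced = {e.text for e in root.findall(".//{*}entrytype")}
    return sorted(referenced - declared)


print(undeclared_entrytypes(BCF))
```

Running such a check over a freshly generated .bcf gives a quick sanity test before trusting `<bcf:entrytypes>` as the list of valid types.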

plk commented 6 years ago

Corrected in biblatex and the test file.

JohnLukeBentley commented 6 years ago

@retorquere wrote:

.... Constraints that mark a field as being of a certain type (the "data"/"type" marks) get type checking but only if they're present; they must first occur in the list of allowed fields before data checking becomes relevant. ... Correct?

@plk wrote:

Yes, correct.

On the above @retorquere quoted point this seems not quite correct. In test.bcf (and also the .bcf I generate) ...

There is (so it seems) a general (entry independent) <bcf:constraints> node (the one without a <bcf:entrytype> child). And under that node a date datatype constraint node that lists origdate (and other date types).

However, under the general (entry independent) <bcf:entryfields> there is no origdate field (although there are other orig* fields).

Am I reading matters correctly that, for the purpose of identifying which fields biblatex supports at a basic level (whether or not the biblatex standard styles also support the field), @retorquere would need to include all the fields under the general <bcf:constraints> node (isbn, issn, .... origdate, urldate), as they aren't necessarily going to be listed under the general or entry-specific <bcf:entryfields> nodes?

(Although I see that isbn, for example, is listed under entry specific <bcf:entryfields>).

retorquere commented 6 years ago

My bad. New build underway.

moewew commented 6 years ago

You'll have to correct me if I'm wrong, but date fields are special. They should always appear only as full date fields (date, origdate, urldate, ...) in the .bib file (year and month are legacy exceptions), but they are decomposed into their date parts by Biber, and the datamodel lists the date parts in \DeclareDatamodelEntryfields/<bcf:entryfields>.

The fields listed under <bcf:entryfields> without a <bcf:entrytype> (\DeclareDatamodelEntryfields without an optional argument) are supported by all types; those listed under <bcf:entryfields> with a <bcf:entrytype> (\DeclareDatamodelEntryfields with an optional argument) are specific to the specified type.

I think the <bcf:constraints> just give additional info about certain constraints that fields need to satisfy. In particular, the type and datatype constraints of ISBN and date are just for validating the content of the fields, not for deciding whether or not they can be used in general. (ISBN is listed normally in <bcf:entryfields>; date is special, as explained above.)
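Following moewew's reading, the set of fields allowed for one entry type can be computed from `<bcf:entryfields>` alone: the blocks without a `<bcf:entrytype>` child apply to every type, and the others apply to the types they list. A Python sketch with an abridged, hypothetical fragment; `allowed_fields` is an illustrative name, and the date-part special case discussed further down in the thread is not handled here.

```python
import xml.etree.ElementTree as ET

# Abridged, hypothetical fragment with one global and two type-specific blocks.
BCF = """<bcf:datamodel xmlns:bcf="https://sourceforge.net/projects/biblatex">
  <bcf:entryfields>
    <bcf:field>abstract</bcf:field>
    <bcf:field>langid</bcf:field>
  </bcf:entryfields>
  <bcf:entryfields>
    <bcf:entrytype>article</bcf:entrytype>
    <bcf:field>author</bcf:field>
    <bcf:field>journaltitle</bcf:field>
  </bcf:entryfields>
  <bcf:entryfields>
    <bcf:entrytype>book</bcf:entrytype>
    <bcf:entrytype>mvbook</bcf:entrytype>
    <bcf:field>publisher</bcf:field>
  </bcf:entryfields>
</bcf:datamodel>"""


def allowed_fields(xml_text, entrytype):
    """Union of global entryfields and those whose <bcf:entrytype> list contains entrytype."""
    root = ET.fromstring(xml_text)
    allowed = set()
    for block in root.findall(".//{*}entryfields"):
        types = {t.text for t in block.findall("{*}entrytype")}
        # No <bcf:entrytype> child means the fields apply to every entry type.
        if not types or entrytype in types:
            allowed.update(f.text for f in block.findall("{*}field"))
    return allowed


print(sorted(allowed_fields(BCF, "mvbook")))
```

`<bcf:constraints>` is deliberately not consulted, matching the point that constraints validate fields but do not make them allowed.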

retorquere commented 6 years ago

So does that mean that the <bcf:constraints> block which has no <bcf:entrytype> means

if you find any of these fields, test them according to these rules

not

these fields are allowed but not required for all types, and if you find any of these fields, test them according to these rules

?

This is different from other <bcf:constraints> blocks which do imply that fields they are testing are allowed by the types they list. Yes?

I don't understand what is being said in the discussion on dates being special. The BBT linter currently only tests for field presence, so maybe that's where I'm getting lost; the discussion of dates being special seems to relate to the content of the fields, not the presence.

moewew commented 6 years ago

Again, @plk may have to correct me here, but my understanding is that <bcf:constraints>/\DeclareDatamodelConstraints always has the first interpretation of

if you find any of these fields, test them according to these rules (which may well imply that a field is required, not required, ... etc.)

regardless of whether or not a <bcf:entrytype>/optional argument is present. Whether or not a field is allowed is decided by <bcf:entryfields>/\DeclareDatamodelEntryfields alone, <bcf:constraints>/\DeclareDatamodelConstraints has no say in that matter.

You could of course argue that being present in <bcf:constraints> somewhat implies that a field is allowed, but that is not the case, strictly speaking. I just did a test with the default constraints and a minimal set of \DeclareDatamodelEntryfields; the existence of the constraints did not make the fields count as valid.

MWE

The following MWE will return warnings about `title` and `journaltitle` with `biber --validate-datamodel` even though both fields are mentioned in `<bcf:constraints>`/`\DeclareDatamodelConstraints`. Only if the two fields are added to `<bcf:entryfields>`/`\DeclareDatamodelEntryfields` do the warnings go away.

```
\documentclass{article}
\usepackage{filecontents}
\begin{filecontents}{\jobname.dbx}
\ResetDatamodelEntrytypes
\ResetDatamodelFields
\ResetDatamodelEntryfields
\ResetDatamodelConstraints

\DeclareDatamodelEntrytypes{
  article,
}

\DeclareDatamodelFields[type=field, datatype=literal]{
  abstract, annotation, booksubtitle, langid, langidopts,
  indextitle, journalsubtitle, journaltitle, title}
\DeclareDatamodelFields[type=list, datatype=name]{
  author, editor, translator}
\DeclareDatamodelFields[type=list, datatype=name, label=true]{
  shortauthor, shorteditor}
\DeclareDatamodelFields[type=field, datatype=date, skipout]{
  date, eventdate, origdate, urldate}
\DeclareDatamodelFields[type=field, datatype=integer]{
  number, volume,}
\DeclareDatamodelFields[type=field, datatype=verbatim]{
  doi}
\DeclareDatamodelFields[type=field, datatype=range]{pages}

\DeclareDatamodelEntryfields{
  abstract, annotation, indextitle, langid, langidopts, year}
\DeclareDatamodelEntryfields[article]{
  addendum, author, doi, language, number, pages, volume}

\DeclareDatamodelConstraints[article]{
  \constraint[type=mandatory]{
    \constraintfieldsxor{
      \constraintfield{date}
      \constraintfield{year}
    }
  }
}
\DeclareDatamodelConstraints[article]{
  \constraint[type=mandatory]{
    \constraintfield{author}
    \constraintfield{journaltitle}
    \constraintfield{title}
  }
}
\end{filecontents}
\usepackage[style=authoryear, datamodel=\jobname]{biblatex}
\addbibresource{biblatex-examples.bib}

\begin{document}
\cite{sigfridsson}
\printbibliography
\end{document}
```

<bcf:constraints>/\DeclareDatamodelConstraints has two purposes: (1) To check the contents of fields (date fields need to be given in ISO/EDTF format, gender has only certain values, ISBNs can be validated, ...) and (2) to check certain relations between fields (a field can be mandatory, it can be allowed only in combination with another field or to the contrary only if another field is not present)


Dates are special in that you won't find date, origdate, ... in <bcf:entryfields>/\DeclareDatamodelEntryfields even though date fields are allowed in many places. What you'll find there instead are the date parts. So if date is a valid field, you will find year, month, day, etc. instead (if origdate is valid, you'll see origyear, origmonth, ...). date is only found in \DeclareDatamodelConstraints. That is something your validator has to catch: although the .bcf lists the date parts, only the full date field is actually valid in the .bib. I think this has to do with how Biber splits the date field and when the validation takes place in Biber.

retorquere commented 6 years ago

Whether or not a field is allowed is decided by

@moewew but wouldn't that mean that no entry type can have a date field? date is only listed in constraints, not in <bcf:entryfields>.

How did you create the collapsible MWE in the comment you posted, BTW?

moewew commented 6 years ago

This is exactly what I mean with 'date is special'. It is allowed, but instead of date we find all the date parts in <bcf:entryfields>. See also the second part of my comment above.

Collapsible code can be obtained with the HTML tag `details`: https://github.com/dear-github/dear-github/issues/166

retorquere commented 6 years ago

Wait... so day is in <bcf:entryfields> but it's not actually a valid field, while date is not in <bcf:entryfields> but it is a valid field.

Is this exclusive to fields ending in date, year, month and day?

moewew commented 6 years ago

Yup, that is the upshot. I think that date-like fields (date, urldate, origdate, ...) and their date parts are the only ones that suffer from this particular inconsistency.

Date parts are fields ending in

year, month, day, hour, minute, second, timezone, season, endyear, endmonth, endday, endhour, endminute, endsecond, endtimezone, endseason

So

  urlday,
  urlendday,
  urlendhour,
  urlendminute,
  urlendmonth,
  urlendsecond,  
  urlendtimezone,
  urlendyear,
  urlhour,
  urlminute,
  urlmonth,
  urlsecond,
  urltimezone,
  urlyear,

are all listed in <bcf:entryfields>, but only urldate is valid (which is indeed not listed).

There is one exception: month and year are valid fields whenever date is valid for legacy reasons (so these are doubly special). This does not hold for other dates such as urldate, origdate, eventdate, there only the date field is valid, not its date parts.

retorquere commented 6 years ago

This relates to all bcf:fields that have datatype="datepart", yes?

moewew commented 6 years ago

Yes, the 'rule' is that only the date field with the appropriate prefix is supported in the .bib file. (Except for month and year.)

retorquere commented 6 years ago

My script currently extracts this regex from the BCF to detect datepart fields:

/^(event|orig|url)?(endyear|year|month|day|hour|minute|second|timezone|season|endmonth|endday|endhour|endminute|endsecond|endtimezone|endseason)$/

and yields these as valid date fields:

moewew commented 6 years ago

That should be correct.

plk commented 6 years ago

I wouldn't use regexps though - it's probably better to parse the XML properly.

retorquere commented 6 years ago

I do parse the XML using an XML parser/xpath -- it's just that I peek into the contents of the <bcf:entryfield> and I use that regex to decide whether I should replace it with <first capture group>date; so if I find eventendyear I know that corresponds with eventdate.

I'm not parsing XML with regexen 😆 that way madness lies.

retorquere commented 6 years ago

(when I say "extracts the regex from the BCF" I mean that regex was not written by hand, it is generated by inspecting bcf:fields that have datatype="datepart"/datatype="date")
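That approach can be sketched in Python as follows: derive the prefixes from `bcf:fields` with `datatype="date"` (names of the form `<prefix>date`) and the part names from `datatype="datepart"`, then map each date part found in `<bcf:entryfields>` back to its date field, keeping `year` and `month` for the legacy reason moewew gives. The input dictionary stands in for values already read from the .bcf with an XML parser; the function names and the sample data are hypothetical.

```python
import re

# year and month stay valid in the .bib alongside date, for legacy reasons.
LEGACY_DATEPARTS = {"year", "month"}


def datepart_regex(field_datatypes):
    """Build ^(prefix)?(part)$ from bcf:fields datatypes instead of hard-coding it."""
    # Date fields are named <prefix>date: date, eventdate, origdate, urldate, ...
    prefixes = sorted(
        name[: -len("date")]
        for name, dtype in field_datatypes.items()
        if dtype == "date" and name.endswith("date")
    )
    nonempty = [p for p in prefixes if p]
    # Unprefixed date parts: datepart fields that do not start with a known prefix.
    parts = sorted(
        name
        for name, dtype in field_datatypes.items()
        if dtype == "datepart" and not any(name.startswith(p) for p in nonempty)
    )
    return re.compile(
        r"^(%s)?(%s)$"
        % ("|".join(map(re.escape, nonempty)), "|".join(map(re.escape, parts)))
    )


def valid_bib_fields(entryfields, field_datatypes):
    """Replace date parts listed in <bcf:entryfields> with the date field they belong to."""
    rx = datepart_regex(field_datatypes)
    valid = set()
    for name in entryfields:
        m = rx.match(name)
        if m is None:
            valid.add(name)  # ordinary field: valid as listed
            continue
        valid.add((m.group(1) or "") + "date")  # e.g. eventendyear -> eventdate
        if name in LEGACY_DATEPARTS:
            valid.add(name)  # year/month are doubly special
    return valid


datatypes = {
    "title": "literal",
    "date": "date", "eventdate": "date", "origdate": "date", "urldate": "date",
    "year": "datepart", "month": "datepart", "day": "datepart", "endyear": "datepart",
    "urlyear": "datepart", "urlday": "datepart", "eventendyear": "datepart",
}
entryfields = ["title", "year", "month", "day", "urlyear", "urlday", "eventendyear"]
print(sorted(valid_bib_fields(entryfields, datatypes)))
# ['date', 'eventdate', 'month', 'title', 'urldate', 'year']
```

Because the prefixes and parts come from the datamodel, a custom `.dbx` that adds another `<prefix>date` field would be picked up without editing the pattern.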

plk commented 6 years ago

Right, it does - I've done it in the distant past (when XML didn't even exist and everything was SGML ...) when there were no parsers ...