plk / biblatex

biblatex is a sophisticated bibliography system for LaTeX users. It has considerably more features than traditional BibTeX and supports UTF-8.

Why is origlanguage a field while language is a list? #594

Closed: equaeghe closed this issue 7 years ago.

equaeghe commented 7 years ago

In the manual, I see that origlanguage is a field, whereas language is a list. Is this also the case in the biblatex code? If so, is there a reason for this?

(I am just curious, but do not have a practical reason for asking. So feel free to treat this as low priority.)

plk commented 7 years ago

No particular reason, nobody has ever wanted it as a list. Also, language is used in a different way to origlanguage in the default styles.

moewew commented 7 years ago

The other orig... fields have the same type as their un-orig... relatives.

origlanguage, however, is used for the 'translated from by ...' string via \lbx@lfromlang and \lbx@sfromlang. In the current code it is easier to deal with a field there than with a list.

equaeghe commented 7 years ago

Thanks for the replies. The inconsistency caught my eye, but given that this is not a very common element in entries, it doesn't matter much. If nevertheless at some point the decision is made to make origlanguage a list, the existing .bib files won't need to be updated, just the style files. Regarding the latter, I guess some code could be reused from what is used for language.
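A quick illustration of why existing .bib files would not need changes: list-type fields in a .bib file (such as language or location) already separate their items with "and", so a value like the one below is valid either way. Read as a field, it is the single string "french and dutch"; read as a list, it gives the two items french and dutch. The entry key and the other field values are made up for this sketch.

```latex
% Hypothetical test entry; only the origlanguage value matters here.
\begin{filecontents}{\jobname.bib}
@book{example,
  author       = {Author, Anne},
  title        = {Some Translated Book},
  date         = 2017,
  translator   = {Translator, Tom},
  origlanguage = {french and dutch},
}
\end{filecontents}
```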

Anyway, this is just me being interested in the data model for a small personal toy project—writing a biblatex sqlite schema—for which consistency would be ‘nice’. (Thanks for the great software, BTW.)

moewew commented 7 years ago

It should be possible to change the type of origlanguage (at least I had an idea of how to save \lbx@lfromlang in that case; no idea whether it would have worked), but that would involve changes not only in core biblatex but also in many custom styles. We would effectively break origlanguage for those styles unless they were updated. So I don't see the change happening any time soon. Changes in the data model that affect styles are very delicate.

It is a bit of an annoying inconsistency once you note it. I assume it was made a field because that made things a bit easier with \lbx@lfromlang.

equaeghe commented 7 years ago

I guess ‘flattening’ the list to a field biblatex-side for non-updated styles would give localization and thus consistency issues of their own.

Is there a standard procedure by which biblatex and biber deal with API-breaking changes? I guess that if you announce now that origlanguage will become a list in the future, it won't be as delicate anymore in, say, two years' time.

Anyway, I'll stop meddling, you are the well-informed decision makers; thanks for taking time to look at this.

plk commented 7 years ago

To play devil's advocate here, I am not sure why any of the orig* lists are lists. The whole concept of an orig* field is that it is singular and may map to multiple items in the non-orig version (for example, several translations or publishers), which should obviously be a list ...

equaeghe commented 7 years ago

@plk If the orig fields were pointers to the corresponding field/list/… in the original publication, yes. But the data model does not use pointers; it uses copies of the values, so an orig field should have the same value type as the thing it refers to. What should a .bib file creator put in origlanguage when the original work's language is a nontrivial list, i.e., has more than one language, english and dutch, say?

In the schema I'm considering, I'd not define the orig fields, but just the concept of a related, original work with its own language, etc. Think of it like a crossreftoorig field (key?). In such a setup, origlanguage and language being of different value type is problematic.
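For comparison, biblatex's related-entry mechanism already lets a .bib file model the original as an entry of its own: related holds the key of the other entry and relatedtype names the relation, with translationof being one of the relation types supported by the standard styles. A minimal sketch with made-up keys and data (related entries need the biber backend), in which the original carries its own language list:

```latex
\documentclass{article}
\usepackage[english]{babel}
\usepackage{csquotes}
\usepackage[backend=biber, style=authoryear]{biblatex}
\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
@book{original,
  author   = {Author, Anne},
  title    = {Ein Titel},
  date     = 1990,
  language = {german},
}
@book{translation,
  author      = {Author, Anne},
  title       = {A Title},
  date        = 2000,
  translator  = {Translator, Tom},
  related     = {original},
  relatedtype = {translationof},
}
\end{filecontents}
\addbibresource{\jobname.bib}
\begin{document}
% The translation points to the original instead of copying its language.
\cite{translation}
\printbibliography
\end{document}
```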

plk commented 7 years ago

Hmm, I see your point. I am happy to change this to a list, with a deprecation warning about the change in advance. @moewew, how about you?

moewew commented 7 years ago

I agree that it would be better for orig fields to have the same type as their non-orig counterparts. But I'm not sure whether such a 'dangerous' change is really worth it.

I can check if the solution for \lbx@lfromlang that I had in mind works out. But if we really want to make that change we need to inform package authors well in advance (CTAN authors should probably be contacted directly - at least if they are affected) and allow for sufficient time to implement the needed changes. So I would definitely want to hold back the breaking change for at least one release and would only add a warning that something is going to happen in the next release.

plk commented 7 years ago

I think we should do it but I agree that we need to give notice as you suggest.

moewew commented 7 years ago

OK.

Do you have any idea about what to do with \lbx@lfromlang and \lbx@sfromlang? I had an idea, but I'm really not entirely happy with it.

moewew commented 7 years ago

My attempt does not quite work because it fails to pick up the capitalisation before \lbx@lfromlang.

You can have a look at the attempt over at https://github.com/moewew/biblatex/commit/78dd835f91f658db45bd776bdba4befdda7a3fe2

A test doc is

\documentclass[ngerman]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{babel}
\usepackage{csquotes}
\usepackage[backend=biber, style=authoryear]{biblatex}

\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
@book{maron,
  author       = {Maron, Monika},
  title        = {Animal Triste},
  date         = 2000,
  translator   = {Brigitte Goldstein},
  origlanguage = {french and dutch},
  publisher    = {University of Nebraska Press},
  location     = {Lincoln},
  langid       = {english},
  langidopts   = {variant=american},
}
\end{filecontents}

\addbibresource{\jobname.bib}

\begin{document}
\nocite{*}
\printbibliography
\end{document}

The output this gives at the moment is also sub-par if we ignore the issue of wrong capitalisation.

moewew commented 7 years ago

A short search on my system reveals that with a change to origlanguage we would at least affect

And

plk commented 7 years ago

I wonder if the last two redefine the data model to make it a list?

moewew commented 7 years ago

Not as far as I can see.

biblatex-gost still treats them as fields and has a comment % made special to simplify making them lists, but I imagine Oleg would prefer them to be lists.

biblatex-fiwi has only \DeclareListFormat{origlanguage} but no other code for origlanguage, so I assume it's an oversight.

simifilm commented 7 years ago

biblatex-fiwi has only \DeclareListFormat{origlanguage} but no other code for origlanguage, so I assume it's an oversight.

Your assumption is probably correct. I don't think it makes any difference for biblatex-fiwi whether origlanguage is a list or a field, but I guess I just looked at how language is handled and went from there.

EDIT: My style assumes only one original language, so it really doesn't change anything. I also don't think that the case of multiple original languages appears often. I can think of one single example among all the books I ever encountered …
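A style that, like biblatex-fiwi, only caters for one original language could restrict output to the first list item with the optional start-stop range of \printlist. A sketch, assuming biblatex 3.8 and the lfromoriglanguage list format named later in this thread; silently dropping any further languages is a deliberate simplification:

```latex
% Inside a style's bibmacro: print only item 1 of origlanguage.
\printlist[lfromoriglanguage][1-1]{origlanguage}
```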

moewew commented 7 years ago

@simifilm The list declaration should have no adverse effect - in fact it should have no effect at all.

I guess it would have been too easy if you had done all the work already and we only had to steal it.

odomanov commented 7 years ago

biblatex-gost still treats them as fields and has a comment % made special to simplify making them lists, but I imagine Oleg would prefer them to be lists.

I did think about making them lists but postponed it for I don't remember what reason, most probably because this required some change in the core. I ran into 2 cases when it's needed: 1) people like Descartes or Nabokov who wrote in two languages alike, 2) thematic collections of translations from different languages. They are rare indeed but occur sometimes. In biblatex-gost there is a workaround for that, so in general lists would be convenient but it's not that critical.

plk commented 7 years ago

This may be easier than we thought, since origlanguage is mostly (in the core styles, only) used in the \lbx@*fromlang macros in biblatex.def and any overrides in .lbx files. This is relatively contained and can be changed in core. 3rd-party styles probably don't print this directly very often.

plk commented 7 years ago

I have committed a change for this making origlanguage a list. The only styles that would have to change are those which print origlanguage directly rather than print it via the lbx@*fromlang macros. Perhaps @odomanov can test 3.8 (biber 2.8 DEV is required). Test output passes regression for bibtex and biber. We just need to check if there is any easy compat code we can put in for 3rd-party styles which might need it.
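Schematically, a change like this moves origlanguage from the field declarations to the list declarations in biblatex's default data model, next to language (both are key-type values). The lines below only show the shape of such a declaration with \DeclareDatamodelFields; this is not a copy of the committed change, the real default data model (blx-dm.def) declares many more fields per group, and such declarations belong in data model (.dbx) files, not in a document preamble.

```latex
% Schematic only. Before, origlanguage sat among the key *fields*:
%   \DeclareDatamodelFields[type=field, datatype=key]{..., origlanguage, ...}
% After, it is declared like language, as a key *list*:
\DeclareDatamodelFields[type=list, datatype=key]{language, origlanguage}
```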

plk commented 7 years ago

Hmm, I do get fromfrench and fromlatin, even though this looks a little clumsy with the default format (I assume people who did a lot of this would adapt the format):

[Screenshot of the output, 2017-08-12]

moewew commented 7 years ago

Yes, sorry. You are absolutely right. I got confused with my test folders.

The output might look a bit clumsy, but that is probably the best we can do without changing the way things work more drastically.

plk commented 7 years ago

Yes, I think it's fine since nobody has asked for this and anyone doing a lot of this would make their own format anyway. Just a question of notifying style authors who use this directly but I suspect most don't and just use it via .lbx strings which will pick up this change automatically.

moewew commented 7 years ago

@odomanov and @simifilm should already be notified about this.

I have dropped a line on GitHub to three other style authors who could be affected.

biblatex-chicago and biblatex-mla could also be affected.

simifilm commented 7 years ago

@odomanov and @simifilm should already be notified about this.

I adapted my style already, so this shouldn't be a problem (I don't really support multiple original languages though. As I said earlier, I don't think that this is a scenario which happens often).

moewew commented 7 years ago

Since style authors need to know when 3.8 is released so they can send their styles off to CTAN, is there a way to announce the release to them?

I noticed that the last few versions were not even announced on ctan-ann. That is a channel of communication that should definitely be used again.

plk commented 7 years ago

Hmm, strange. I announced it to the maintainers as usual, as I was asked to, but was then asked to simply upload via the usual CTAN channels.

moewew commented 7 years ago

Mhhh, the last announcements listed on the CTAN page are from 2011 https://www.ctan.org/pkg/biblatex. And I haven't seen it come along comp.text.tex or https://lists.dante.de/pipermail/ctan-ann/ either. A quick search on https://lists.dante.de/pipermail/ctan-ann/ suggests that at least versions >= 3.0 have not been announced, while other custom styles, amongst them biblatex-apa, were.

plk commented 7 years ago

That's bizarre. There were special arrangements for biblatex and biber releases with Karl Berry etc. before, due to the way biber was built, but I think I need to make sure that announcements happen in future.

odomanov commented 7 years ago
  1. I adapted the style; it seems to work fine, although it needs some tweaking (see no. 4 below), which I honestly don't know how to do.

  2. I do print origlanguage directly, and I need to. When you cite, for example, an article from a collection of translations, you get something like:

Article, trans. from English. In: Book, trans. from English and German....

So you need to print the bibstring twice, for different sets of languages. For that I actually replace \lbx@sfromlang with a command that prints the languages directly, just before calling \bibstring{translator}. Perhaps there is a better way to do that.

  3. In English, when there is no translator, the output looks like "trans. from the English, from the German by". This probably needs to be corrected.

  4. Regarding "even though this looks a little clumsy with the default format (I assume people who did a lot of this would adapt the format)":

This might be really tricky. For example, in some languages the preposition "from" depends on the language that follows.

moewew commented 7 years ago

Ad 2) Can't you just say \printlist{origlanguage} (or \printlist[sfromoriglanguage]{origlanguage} or \printlist[lfromoriglanguage]{origlanguage})? It is not that different, but I'm not sure what exactly you need to do.

Ad 3) With \printlist{origlanguage} I get no by. Can you provide a MWE?

Ad 4) Yes, the whole localisation is a bit anglo/germanocentric. There are a few areas that need tweaking. But that would require a full overhaul of the localisation system.

odomanov commented 7 years ago

2) In the same entry I have origlanguage (the language of the article) and bookoriglanguage (the language of the book/collection). I need to print the bibstring with bookoriglanguage, so I'm doing this by replacing \lbx@sfromlang.

4) Yes, the overhaul. These linguistic troubles are probably not biblatex's job.

plk commented 7 years ago

\lbx@*fromlang now just calls \printlist with the appropriate format.

odomanov commented 7 years ago

Sure. I just do \renewcommand*{\lbx@lfromlang}{\printlist[lfromoriglanguage]{bookoriglanguage}} (simplifying somewhat). I just wanted to say that I can't avoid calling \printlist{bookoriglanguage}. So where I had \printfield... before, I now need \printlist... (and also \iflistundef etc.). So the style needs to be adapted. That is all I was saying.
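For third-party styles that printed origlanguage directly, the adaptation described here boils down to swapping the field commands for their list counterparts. A minimal sketch, assuming a style-defined bibmacro whose name (mystyle:origlanguage) is hypothetical. Note that \printlist without a format argument uses the default list format, which prints the raw keys (e.g. "french"), so a real style would normally pass a localising list format.

```latex
% Before: origlanguage was a field (kept here as a comment).
% \newbibmacro*{mystyle:origlanguage}{%
%   \iffieldundef{origlanguage}
%     {}
%     {\printfield{origlanguage}}}

% After: origlanguage is a list, so the list test and command apply.
\newbibmacro*{mystyle:origlanguage}{%
  \iflistundef{origlanguage}
    {}
    {\printlist{origlanguage}}}
```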

odomanov commented 7 years ago

ad 3) It seems that standard styles don't print "from \ by" if there is no translator. So you don't get this by. But if the style tries to print "from \ by" even without the translator's name (why not?) it gets the by. Probably this is too complicated to be corrected (linguistics again).

moewew commented 7 years ago

Mhhhh. As far as I can see, the 'by' is only included in the translator bibstrings. I wouldn't print one of those if the translator is empty. You can of course print origlanguage if there is no translator, but that should happen via \printlist or \lbx@lfromlang, and in that case there should be no 'by'.

An example that shows the undesired behaviour would be appreciated.

simifilm commented 7 years ago

It seems that at the moment something has changed with capitalisation, at least in German. I always got "Aus dem …" ('Trans. from'), now it is "aus dem …".

moewew commented 7 years ago

MWE for @simifilm's problem

\documentclass{article}
\usepackage[ngerman]{babel}
\usepackage{csquotes}
\usepackage[style=authortitle-icomp]{biblatex}
\usepackage{filecontents}

\begin{filecontents}{\jobname.bib}
@book{A,
  author = {Riter, W.},
  title = {Title A},
  origlanguage = {french and greek},
  translator = {Anne Elk},
}

@book{B,
  author = {Riter, W.},
  title = {Title B},
  origlanguage = {french and latin},
  translator = {Anne Elk},
}

@book{C,
  author = {Riter, W.},
  title = {Title C},
  origlanguage = {latin},
  translator = {Anne Elk},
}
\end{filecontents}

\addbibresource{\jobname.bib}

\begin{document}
\nocite{*}
\printbibliography
\end{document}

\DeclareListFormat{fromoriglanguage}{%
  \usebibmacro{list:delim}{%
    \ifbibstring{from#1}
      {\bibxstring{from#1}}
      {\ifbibstring{lang#1}
         {\bibxstring{lang#1}}
         {#1}}}%
  \ifbibstring{from#1}
    {\bibstring{from#1}}
    {\ifbibstring{lang#1}
       {\bibstring{lang#1}}
       {#1}}%
  \usebibmacro{list:andothers}}

together with

  \def\lbx@lfromlang{%
    \iflistundef{origlanguage}
      {}
      {\printlist[fromoriglanguage]{origlanguage}\space}}%
  \def\lbx@sfromlang{%
    \iflistundef{origlanguage}
      {}
      {\printlist[fromoriglanguage]{origlanguage}\space}}%

didn't work for me.

Could it be that capitalisation is not picked up in \printbiblist? Capitalisation is picked up in \printbiblist/\DeclareListFormat.

moewew commented 7 years ago

The problem here is that \lbx@lfromlang/\lbx@sfromlang are called from within another bibstring. Within \bibstring, however, the capitalisation detection of other bibstrings is turned off with

\protected\def\blx@bibstring#1#2#3{%
  \blx@begunit
  \blx@hyphenreset
  \let\bibstring\blx@imc@bibxstring
  \let\biblstring\blx@imc@bibxlstring
  \let\bibsstring\blx@imc@bibxsstring
  \lowercase{\edef\blx@tempa{#3}}%
  \ifcsundef{#2@\blx@tempa}
    {\blx@warn@nostring\blx@tempa
     \blx@endnounit}
    {\blx@imc@ifcapital
       {#1{\MakeCapital{\csuse{#2@\blx@tempa}}}}
       {#1{\csuse{#2@\blx@tempa}}}%
     \blx@endunit}}

Not sure why it worked before; the general idea was the same, I think.

odomanov commented 7 years ago

To @plk, @moewew: Interesting, this is probably too style-specific. GOST tolerates (sometimes even requires?) printing "trans. from..." without specifying translators. Strange that I haven't run into this before. Anyway, I need to think about it.

moewew commented 7 years ago

@odomanov This is no problem at all. You just should not use bytranslator and friends for this. I suspect you have to roll your own strings here.
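Following this suggestion, a style that wants "translated from …" without a translator could define a string of its own instead of reusing bytranslator. A sketch, assuming biblatex 3.8 and its lfromoriglanguage list format (which prints the from... strings, e.g. "from the French"); the string name translatedonly and the macro name mystyle:fromlang are hypothetical, and every further language would need its own \DefineBibliographyStrings entry:

```latex
% Hypothetical string: "translated" without a trailing "by".
\NewBibliographyString{translatedonly}
\DefineBibliographyStrings{english}{%
  translatedonly = {{translated}{trans\adddot}},
}
% Print something like "translated from the French" only when
% there is an origlanguage but no translator.
\newbibmacro*{mystyle:fromlang}{%
  \ifboolexpr{
    test {\iflistundef{origlanguage}}
    or
    not test {\ifnameundef{translator}}
  }
    {}
    {\bibstring{translatedonly}%
     \setunit{\addspace}%
     \printlist[lfromoriglanguage]{origlanguage}}}
```

The macro would still have to be called at a suitable place in the style's drivers.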

odomanov commented 7 years ago

Yes, but then I probably need to do this for many languages, most of which, of course, I don't know. The thing is that in Russian I don't have this problem at all. bytranslator and friends are already in plenty of .lbx files; if I invent my own strings, I need to have them in other languages too. This really looks like "a full overhaul of the localisation system".

plk commented 7 years ago

@moewew - Yes, it's not clear how this worked before since these macros have always been called in a bibstring which should alias to the "x" versions which don't do capitalisation ... needs a bit more investigation. I have a slightly different version which improves on the default format, giving things like "translated from the French and German" instead of "translated from the French and from the German" but that's a different thing.

plk commented 7 years ago

Please try 3.8 DEV again now. The reset to vanilla bibstring macros within bibstring macro calls can now be overridden manually, which is useful inside macros that occur in bibstrings. Also, the default list formats used by the \lbx@*fromlang macros have been changed and are a bit less clumsy.

odomanov commented 7 years ago

Thank you, works fine. It's definitely better.

I realized why I've never run into the problem with this "by": I've never had English titles with origlanguage but without a translator. Perhaps that is the "solution": to require that, in certain languages, origlanguage always be accompanied by translator. Some languages like Russian (I suspect also Polish, Croatian, ...) don't need this requirement. Obviously these are languages with declension; they don't need a preposition to express "trans. by". With them, however, there is another problem: the translator's name should not be in the nominative but in some other case. This is what is actually required in biblatex-gost. Grammar is a nightmare.

moewew commented 7 years ago

Unfortunately, the new construct is ungrammatical in German:

\documentclass{article}
\usepackage[ngerman]{babel}
\usepackage{csquotes}
\usepackage[style=authortitle-icomp]{biblatex}
\usepackage{filecontents}

\begin{filecontents}{\jobname.bib}
@book{A,
  author = {Riter, W.},
  title = {Title A},
  origlanguage = {french and greek},
  translator = {Anne Elk},
}

@book{B,
  author = {Riter, W.},
  title = {Title B},
  origlanguage = {french and latin},
  translator = {Anne Elk},
}

@book{C,
  author = {Riter, W.},
  title = {Title C},
  origlanguage = {latin},
  translator = {Anne Elk},
}
\end{filecontents}

\addbibresource{\jobname.bib}

\begin{document}
\nocite{*}
\printbibliography
\end{document}

gives

Aus dem Französischen und Griechisch übers. von Anne Elk.

But we would need (good)

Aus dem Französischen und Griechischen übers. von Anne Elk.

or (clumsy, but acceptable)

Aus dem Französischen und aus dem Griechischen übers. von Anne Elk.

This is because in German the lang... bibstrings are in the nominative case, but the language names included in from... are in the dative case. Mixing them does not work.

simifilm commented 7 years ago

Capitalisation works fine again, thanks a lot.

plk commented 7 years ago

@moewew - I should have noticed that, sorry (I do speak a little German). I have reverted to the clumsier but explicit dative-case version for now. I suppose we might have to think about localisation in general at some point, perhaps by adding grammatical-case variants to the currently limited "long/short" options for strings ...

moewew commented 7 years ago

We have hit the limits of the current localisation system a couple of times already. It is quite adequate for English, German and a few other languages, but as soon as things get less Central European, they get more and more complicated. Word order changes, declensions need to be introduced, field contents need to be declined, ... the list goes on. Proper localisation really is hard.

Anecdotal evidence of how hard localisation can be: How to get correct grammar in the bibliography of a German document when the editor is an association?

plk commented 7 years ago

Can we close this now?