retorquere closed 9 years ago
Temporary workaround at http://tempsend.com/B9A7037B24
Noted - thanks :+1:
@nickbart1980, @gracile-fr: including the new particle parser causes these changes: https://travis-ci.org/ZotPlus/zotero-better-bibtex/builds/81577143 (see red results). Are these good as-is, or should I massage the results of the particle parser further?
The test marked @bulk fails for an unrelated reason: there's a bug in the particle parser, which will likely be fixed soon.
I've patched around the @bulk problem temporarily; the changes can now be found at https://travis-ci.org/ZotPlus/zotero-better-bibtex/jobs/81580530
Without comparing this with the original data (Zotero, or possibly CSL JSON), I'm afraid I can't say much.
https://travis-ci.org/ZotPlus/zotero-better-bibtex/jobs/81580527#L570 has source https://github.com/ZotPlus/zotero-better-bibtex/blob/master/test/fixtures/export/underscores%20in%20URL%20fields%20should%20not%20be%20escaped%20%23104.json
https://travis-ci.org/ZotPlus/zotero-better-bibtex/jobs/81580530#L533 and following have source https://github.com/ZotPlus/zotero-better-bibtex/blob/master/test/fixtures/export/Big%20whopping%20library.json
https://travis-ci.org/ZotPlus/zotero-better-bibtex/jobs/81580530#L677 and following have source https://github.com/ZotPlus/zotero-better-bibtex/blob/master/test/fixtures/export/(non-)dropping%20particle%20handling%20%23313.json
I'm confused, because many of the changes are due to braces that I assumed were necessary to distinguish dropping and non-dropping particles in BibLaTeX, see https://github.com/ZotPlus/zotero-better-bibtex/issues/313#issuecomment-133044478. But @nickbart1980, how does BibLaTeX distinguish dropping, non-dropping, and no particle?
For the other errors, well… "American Rights at Work" should be an institutional author, no? Is "Wøller, Sune Brø ndum" a real example of a real name? WRT the abbot case, see my previous comment here.
Again, essentially you need braces only if an initial lowercase element of a fixed family name needs to be protected, i.e., those cases where the content of a Zotero family field is enclosed in double quotes:
[van Gogh] [Vincent, Jr.]
→
author = {van Gogh, Jr., Vincent},
options = {useprefix=true},
[Humboldt] [Alexander von, Sr.]
→
author = {von Humboldt, Sr., Alexander},
["von Braun"] [Wernher]
(Americanised; “von” is a non-particle, in other words a fixed part of the family name) →
author = {{von Braun}, Wernher},
In particular, you never need braces around elements that consist of capitalised strings only.
Hence names like author = {De Castro, Eduardo Viveiros}, author = {Van Lente, Harro}, and author = {De Laat, Bastian} are perfectly ok, both in bibtex and biblatex.
Ok, sorry, too many threads to follow on this question; I didn't pay enough attention to your other post. Thanks. Following that, @retorquere, I think adding useprefix per-entry is required. Then we can adjust the tests.
@gracile-fr re: "American Rights at Work", I agree, but that's not how it's encoded in the reference source. Given that source, I think {at Work, American Rights} is no less reasonable than {Work, American Rights at}.
I don't know whether Wøller is "real", but it's from a bibliography I was handed as a test case; it isn't a synthetic sample. The source is "firstName": "Sune Brø ndum", "lastName": "Wøller", which now translates to {ndum Wøller, Sune Brø}; it used to be {Wøller, Sune Brø ndum}.
I'm going to look at per-entry useprefix in #353. The current issue is just about the new particle parser.
@nickbart1980, @gracile-fr, I'm trying to see the algorithm over those samples. Right now it looks like:
{ "family": "van Gogh", "given": "Vincent, Jr." } => { "family": "Gogh", "given": "Vincent", "non-dropping-particle": "van", "suffix": "Jr." }
author = {<dropping particle> <non-dropping-particle> <family>, <suffix>, <given>}, options = {useprefix=true} (because a non-dropping particle is present)
{ "family": "Humboldt", "given": "Alexander von, Sr." } => { "family": "Humboldt", "given": "Alexander von", "suffix": "Sr." }
I think this is a parser error wrt the 'von', so I'm deferring judgement on this one; an issue has been lodged at citeproc-js
{ "family": "\"von Braun\"", "given": "Wernher" } => { "family": "\"von Braun\"", "given": "Wernher" }
author = {<dropping particle> <non-dropping-particle> {<family>}, <suffix>, <given>}, braces because of the quotes around the family name; no useprefix because there is no non-dropping particle
right? Non-dropping particles cause "useprefix", dropping particles don't, only quoted names cause braces?
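The rule being proposed here can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in the thread (braces only for quoted family names, useprefix only when a non-dropping particle is present); the function name `format_biblatex_name` and the dict layout are hypothetical and are not Better BibTeX's actual code.

```python
def format_biblatex_name(name):
    """Render a parsed name as 'von Last, Jr., First' and report whether
    useprefix would be needed. Hypothetical helper, not Better BibTeX code."""
    family = name.get("family", "")
    # Assumption from the thread: a family name wrapped in double quotes is
    # fixed, so the quotes are dropped and the name is protected with braces.
    if len(family) >= 2 and family.startswith('"') and family.endswith('"'):
        family = "{" + family[1:-1] + "}"
    # Particles, if any, precede the family name.
    pieces = [name.get("dropping-particle"), name.get("non-dropping-particle"), family]
    parts = [" ".join(p for p in pieces if p)]
    if name.get("suffix"):
        parts.append(name["suffix"])
    if name.get("given"):
        parts.append(name["given"])
    # Only a non-dropping particle triggers useprefix.
    use_prefix = bool(name.get("non-dropping-particle"))
    return ", ".join(parts), use_prefix


van_gogh = {"family": "Gogh", "given": "Vincent",
            "non-dropping-particle": "van", "suffix": "Jr."}
von_braun = {"family": '"von Braun"', "given": "Wernher"}
print(format_biblatex_name(van_gogh))
print(format_biblatex_name(von_braun))
```

For the two samples this yields "van Gogh, Jr., Vincent" with useprefix and "{von Braun}, Wernher" without it, which matches the outputs listed earlier in the thread.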
I think this is a parser error wrt the 'von' …
No, it’s an error of the current citeproc-js parser whenever there’s also a suffix; I’ve seen this, too. It should of course be:
{ "family": "Humboldt", "given": "Alexander von, Sr." } => { "family": "Humboldt", "given": "Alexander", "dropping-particle": "von", "suffix": "Sr." }
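To make the expected result concrete, here is a rough Python sketch of splitting a Zotero given-name field into given name, dropping particle, and suffix. It is not the citeproc-js algorithm; it simply assumes that everything after the first comma is the suffix and that trailing all-lowercase words are a dropping particle. The name `split_given` is hypothetical.

```python
def split_given(given):
    """Split a given-name field such as 'Alexander von, Sr.'.
    Simplified sketch; the real particle parser handles many more cases."""
    # Assumption: everything after the first comma is the suffix.
    head, _, suffix = given.partition(",")
    words = head.split()
    # Assumption: trailing all-lowercase words form the dropping particle,
    # but the first word always stays part of the given name.
    cut = len(words)
    while cut > 1 and words[cut - 1].islower():
        cut -= 1
    result = {"given": " ".join(words[:cut])}
    if cut < len(words):
        result["dropping-particle"] = " ".join(words[cut:])
    if suffix.strip():
        result["suffix"] = suffix.strip()
    return result


print(split_given("Alexander von, Sr."))
print(split_given("Jean de"))
```

With these assumptions, "Alexander von, Sr." splits into given "Alexander", dropping particle "von", and suffix "Sr.", as in the corrected mapping above.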
{ "family": "\"von Braun\"", "given": "Wernher" } => { "family": "\"von Braun\"", "given": "Wernher" }
No quotes must be used in the output when converting from Zotero to CSL JSON (or “Pandoc JSON”), since a CSL JSON family field will not and must not be parsed again:
{ "family": "\"von Braun\"", "given": "Wernher" } => { "family": "von Braun", "given": "Wernher" }
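A minimal Python sketch of that mapping, assuming the creator is a plain dict with "family" and "given" keys as in the samples above; `zotero_to_csl` is a hypothetical name used only for illustration.

```python
def zotero_to_csl(creator):
    """Return a copy of the creator with protective double quotes removed
    from the family name. Hypothetical helper for illustration only."""
    family = creator.get("family", "")
    # The quotes only tell the parser to leave the name alone; they are not
    # part of the name, so they must not survive into CSL JSON.
    if len(family) >= 2 and family.startswith('"') and family.endswith('"'):
        family = family[1:-1]
    return {**creator, "family": family}


print(zotero_to_csl({"family": '"von Braun"', "given": "Wernher"}))
```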
Wøller, Sune Brø ndum
That’s a typo, it’s “Wøller, Sune Brøndum”, see http://www.headnet.dk/team/sune-brondum-woller/. I’ve never heard of a particle “ndum”.
“American Rights at Work”
That’s an organisation, https://en.wikipedia.org/wiki/American_Rights_at_Work, so its name must always be rendered literally, as “American Rights at Work”.
I haven't heard of "ndum" either, but my interest is whether the output is sensible given the input. Whether the input is sensible doesn't matter in this case; garbage in, garbage out. Same goes for the "American Rights at Work"; it was entered by the user as a lastname + firstname rather than a single-field name. If I change that field to single-field mode, it is returned as {{American Rights at Work}}, but that's not what I was handed.
right? Non-dropping particles cause "useprefix", dropping particles don't …
Yes.
only quoted names cause braces?
When converting from Zotero to bib(la)tex, yes.
(that output isn't CSL JSON, it's the output from the particle parser)
(that output isn't CSL JSON, it's the output from the particle parser)
I’m confused: which output?
{ "family": "\"von Braun\"", "given": "Wernher" } => { "family": "von Braun", "given": "Wernher" }
{ "family": "\"von Braun\"", "given": "Wernher" } => { "family": "von Braun", "given": "Wernher" }
Ok, I'm not familiar with any of the internals; as a mapping from Zotero to CSL JSON = Pandoc JSON, this is correct.
So should
author = {de La Fontaine, Jean}
options={useprefix=true}
be preferred over
author = {de {La Fontaine}, Jean}?
@gracile-fr, @nickbart1980? (given input [La Fontaine] [Jean de])
author = {de La Fontaine, Jean} options = {useprefix=true}
This would be correct if you had a made up name (Zotero): [de La Fontaine] [Jean]
But if you're looking at the real French writer, the “de” is dropping (and the “La” is a non-particle):
Zotero: [La Fontaine] [Jean de]
bib(la)tex: author = {de La Fontaine, Jean}
The additional braces are not required in either of the forms.
(Rule of thumb: you rarely if ever need braces for bib(la)tex, in particular if you use the von Last, Jr., First form.* The biblatex-examples.bib file, e.g., does not have a single brace in creators’ names (except for accented chars).)
(* In First von Last, you’d have to protect multipart last names if there’s no von part.)
That would fit my algorithm; https://zotplus.github.io/better-bibtex/nameparser.html?bracketed=%5BLa%20Fontaine%5D%20%5BJean%20de%5D&fudge=true returns "de" as a dropping particle, so no "useprefix".
It's just that I think @gracile-fr recommended quite specifically to use braces to bind non-dropping-particles to the last name. You know more about this than me, but I'd like to cross-check with @gracile-fr .
And [in 't Horvath] [Peter A.C.] would result in
author = {in 't Horvath, Peter A.C},
options = {useprefix=true}
rather than
author = {{in 't Horvath}, Peter A.C}?
(the in 't is marked as a non-dropping particle)
It's just that I think @gracile-fr recommended quite specifically to use braces to bind non-dropping-particles to the last name. You know more about this than me, but I'd like to cross-check with @gracile-fr.
For bibtex binding non-dropping-particles to the last name might make sense (I’m not entirely sure though), but for biblatex it seems we agreed on using useprefix on a per-entry basis instead.
[in ’t Horvath] [Peter A. C.]
=> author = {in 't Horvath, Peter A. C.}, options = {useprefix=true}
EDIT: that seems ok.
Except we always need spaces between initials …
I have tests running on a change that does bracing for BibTeX, useprefix for BibLaTeX. For spaces between initials, please file a new issue; it isn't related to the particle parser.
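Roughly, the split between the two targets might look like the Python sketch below. It assumes "bracing" means wrapping the non-dropping particle together with the family name, as in the earlier {dp {ndp lastname}, firstname} output; `render_author` and its `target` parameter are hypothetical names, not the actual change under test.

```python
def render_author(name, target):
    """Return (author_value, options) for target 'bibtex' or 'biblatex'.
    Illustrative sketch only."""
    ndp = name.get("non-dropping-particle", "")
    dp = name.get("dropping-particle", "")
    last = name["family"]
    options = None
    if ndp and target == "bibtex":
        # BibTeX has no useprefix, so bind the particle with braces.
        last = "{" + ndp + " " + last + "}"
    elif ndp:
        # BibLaTeX: leave the particle unbraced and flag the entry instead.
        last = ndp + " " + last
        options = "useprefix=true"
    if dp:
        last = dp + " " + last
    given = name.get("given", "")
    author = last + ", " + given if given else last
    return author, options


horvath = {"family": "Horvath", "given": "Peter A. C.",
           "non-dropping-particle": "in 't"}
print(render_author(horvath, "bibtex"))
print(render_author(horvath, "biblatex"))
```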
Tests are looking pretty good. This would also close #353 when done.
WRT https://github.com/ZotPlus/zotero-better-bibtex/issues/348#issuecomment-143242156 ; the question was not whether to do per-entry useprefix; I don't know what actual benefits it has, as in an entry without a particle in the name it should be a no-op, but tests on this are running and looking good.
The question was rather whether we could drop the bracing for non-dropping particles, which is something @gracile-fr requested earlier. There are names that have both dropping and non-dropping particles, and in the previous behaviour you'd get {dp {ndp lastname}, firstname}; now you'd get {dp ndp lastname, firstname}.
I have no grounded opinion on the matter, as I have zero clue what the proper behaviour is; I have to rely on input from you and @gracile-fr (or anyone else) to guide this. I have no reason to doubt your insights on the matter, but unless I previously misunderstood @gracile-fr (entirely possible), this change conflicts with his/her (?) earlier request. Which is why I'm pushing for a discussion; this is close to release, and is blocking the release of another fix, so the sooner settled the better.
There are names that have both dropping and non-dropping particles …
No, there aren’t. We had a lengthy discussion on the Zotero forums, e.g., here, and no one ever came up with a real-life example of a name with both dropping and non-dropping particles. “Jean de La Fontaine” previously had been used as an example for ndp+dp, but that’s not actually true; it has one dropping particle, “de”, but the “La” is a non-particle.
Ah. OK. The particle parser did return such names previously; if it does not now, that problem seems solved then. So something like this synthetic case cannot occur then?
… whether to do per-entry useprefix …
Possible misunderstanding? I never meant to say that entries without particles should have useprefix set either way.
Then I am thoroughly confused. Currently, if you tick "useprefix" in the preferences, each BibLaTeX entry blindly gets options={useprefix}. What change would you like to that behaviour?
… this synthetic case …
Pretty unlikely. I’d remove this case for the time being.
Then I am thoroughly confused. Currently, if you tick "useprefix" in the preferences, each BibLaTeX entry blindly gets options={useprefix}. What change would you like to that behaviour?
No, don’t do that: there shouldn’t be a box to tick "useprefix" in the preferences in the first place. [EDIT] I don’t see any sense in offering this as a global option. [/EDIT] Rather, for each individual entry, we look at whether a non-dropping particle is present, in which case we set options={useprefix=true}. For all others, with dropping particles or no particles at all, we do not need to set anything (the default being useprefix=false).
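As a small Python sketch of this per-entry decision: the entry gets the option only if at least one of its creators carries a non-dropping particle. The function name `entry_options` and the list-of-dicts input are assumptions made for the example.

```python
def entry_options(creators):
    """Return 'useprefix=true' if any creator has a non-dropping particle,
    otherwise None (so no options field is written at all)."""
    if any(c.get("non-dropping-particle") for c in creators):
        return "useprefix=true"
    return None


entry = [
    {"family": "Gogh", "given": "Vincent", "non-dropping-particle": "van"},
    {"family": "La Fontaine", "given": "Jean", "dropping-particle": "de"},
]
print(entry_options(entry))
print(entry_options(entry[1:]))
```

Returning None for entries without a non-dropping particle keeps the exported file free of redundant options fields.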
That's the new behavior on which tests are running, but what harm would there be in always setting options={useprefix=true}? References which don't have particles wouldn't be affected, right?
References which don't have particles wouldn't be affected, right?
No, but it’d just be more clutter …
Ah, OK. I can grok that.
There really isn't a case where a user would want to suppress the 'useprefix'?
WRT clutter, is there a difference between useprefix and useprefix=true? I prefer the former if it's semantically the same.
There really isn't a case where a user would want to suppress the 'useprefix'?
Not really, no. All “van Gogh”s in in-text citations would become “Gogh”s, and that’s not usually done. Also, if you really wanted to do that, you could still let biber’s preprocessing strip out the useprefix options.
Is there a difference between useprefix and useprefix=true?
No.
So this may be another synthetic case, but what should be done with names that are given in two-field format where only the lastname has been given? Should I output {{lastname}} (treating it like a one-field name), {lastname,}, or {lastname}? (I have those in the test set I take from the citeproc-js test set.)
I’d say {lastname}. For actual one-field names I’d prefer {{lastname}} even if there's only one word, just to preserve a hint about the original status.
The reason I'm wondering is that BibTeX might interpret it as a firstname lastname name. I don't know examples offhand, but it isn't necessarily illegitimate, either, e.g. something like [Aristoteles] []
That’s not a problem; bib(la)tex will never leave a lastname empty (unless the whole field is empty, that is). See http://tug.ctan.org/info/bibtex/tamethebeast/ttb_en.pdf, p. 23.
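The distinction suggested above could be sketched in Python as follows. The key names are assumptions: "name" stands for a one-field creator, and "lastName"/"firstName" for the two-field form, as in the fixture quoted earlier; `render_creator_field` is a hypothetical helper.

```python
def render_creator_field(creator):
    """Render the braced value of an author field for a single creator.
    Sketch only; key names are assumed, not taken from Better BibTeX."""
    if "name" in creator:
        # One-field name: double braces keep it literal and hint at its origin.
        return "{{" + creator["name"] + "}}"
    last = creator.get("lastName", "")
    first = creator.get("firstName", "")
    if not first:
        # Two-field name with an empty firstname: plain single braces.
        return "{" + last + "}"
    return "{" + last + ", " + first + "}"


print(render_creator_field({"name": "American Rights at Work"}))
print(render_creator_field({"lastName": "Aristoteles", "firstName": ""}))
```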
It's just that I think @gracile-fr recommended quite specifically to use braces to bind non-dropping-particles to the last name.
For bibtex binding non-dropping-particles to the last name might make sense (I’m not entirely sure though), but for biblatex it seems we agreed on using useprefix on a per-entry basis instead.
The question was rather whether we could drop the bracing for non-dropping particles […] There are names that have both dropping and non-dropping particles
@nickbart1980 is right. At the time I asked for braces around non-dropping particles, I was very new to BibLaTeX (I'm still actually) and overlooked the useprefix option. (and I really don't know for BibTeX.)
what harm would there be in always setting options={useprefix=true}? References which don't have particles wouldn't be affected, right?
No, but it’d just be more clutter …
I'm confused. You're not talking about names with dropping-particles, right?
Zotero: [La Fontaine] [Jean de]
=> bib(la)tex: author = {de La Fontaine, Jean}
without the useprefix option.
Correct.
Update the CSL particle parser when it comes out this weekend.