retorquere / zotero-better-bibtex

Make Zotero effective for us LaTeX holdouts
https://retorque.re/zotero-better-bibtex/
MIT License

Name particle parser always makes "van den" non-dropping #348

Closed retorquere closed 9 years ago

retorquere commented 9 years ago

Update the CSL particle parser when it comes out this weekend.

retorquere commented 9 years ago

Discussion at https://bitbucket.org/fbennett/citeproc-js/issues/183/particle-parser-returning-non-dropping

retorquere commented 9 years ago

Temporary workaround at http://tempsend.com/B9A7037B24

gracile-fr commented 9 years ago

FWIW, https://forums.zotero.org/discussion/51783

mdlincoln commented 9 years ago

Noted - thanks :+1:

retorquere commented 9 years ago

@nickbart1980, @gracile-fr: including the new particle parser causes these changes: https://travis-ci.org/ZotPlus/zotero-better-bibtex/builds/81577143 (see red results). Are these good as-is, or should I massage the results of the particle parser further?

retorquere commented 9 years ago

The test marked @bulk fails for an unrelated reason -- there's a bug in the particle parser, which will likely soon be fixed.

retorquere commented 9 years ago

I've patched around the @bulk problem temporarily; the changes can now be found at https://travis-ci.org/ZotPlus/zotero-better-bibtex/jobs/81580530

njbart commented 9 years ago

Without comparing this with the original data (Zotero, or possibly CSL JSON), I'm afraid I can't say much.

retorquere commented 9 years ago

https://travis-ci.org/ZotPlus/zotero-better-bibtex/jobs/81580527#L570 has source https://github.com/ZotPlus/zotero-better-bibtex/blob/master/test/fixtures/export/underscores%20in%20URL%20fields%20should%20not%20be%20escaped%20%23104.json

https://travis-ci.org/ZotPlus/zotero-better-bibtex/jobs/81580530#L533 and following have source https://github.com/ZotPlus/zotero-better-bibtex/blob/master/test/fixtures/export/Big%20whopping%20library.json

https://travis-ci.org/ZotPlus/zotero-better-bibtex/jobs/81580530#L677 and following have source https://github.com/ZotPlus/zotero-better-bibtex/blob/master/test/fixtures/export/(non-)dropping%20particle%20handling%20%23313.json

gracile-fr commented 9 years ago

I'm confused because many changes are due to brackets that I assumed were necessary to distinguish dropping and non-dropping particles in BibLaTeX (see https://github.com/ZotPlus/zotero-better-bibtex/issues/313#issuecomment-133044478). But @nickbart1980, how does BibLaTeX distinguish dropping, non-dropping, and no-particle names?

For the other errors, well… "American Rights at Work" should be an institutional author, no? Is "Wøller, Sune Brø ndum" a real example of a real name? WRT the abbot case, see my previous comment here.

njbart commented 9 years ago

Again, essentially you need braces only if an initial lowercase element of a fixed family name needs to be protected, i.e., those cases where the content of a Zotero family field is enclosed in double quotes:

[van Gogh] [Vincent, Jr.]

author = {van Gogh, Jr., Vincent},
options = {useprefix=true},

[Humboldt] [Alexander von, Sr.]

author = {von Humboldt, Sr., Alexander},

["von Braun"] [Wernher] (Americanised, 'von' is a non-particle, in other words a fixed part of the family name) →

author = {{von Braun}, Wernher},

In particular, you never need braces around elements that consist of capitalised strings only.

Hence names like `author = {De Castro, Eduardo Viveiros},`, `author = {Van Lente, Harro},` and `author = {De Laat, Bastian},` are perfectly ok, both in bibtex and biblatex.

gracile-fr commented 9 years ago

Ok, sorry, too many threads to follow on this question, I didn't pay enough attention to your other post. Thanks. Following that, @retorquere I think adding useprefix per-entry is required. Then we can adjust the tests.

retorquere commented 9 years ago

@gracile-fr re: "American Rights at Work", I agree, but that's not how it's encoded in the reference source. Given that source, I think {at Work, American Rights} is not less reasonable than {Work, American Rights at}.

I don't know whether Wøller is "real", but it's from a bibliography I got handed as a test case; it isn't a synthetic sample. The source is "firstName": "Sune Brø ndum", "lastName": "Wøller", which now translates to {ndum Wøller, Sune Brø}; it used to be {Wøller, Sune Brø ndum}.

I'm going to look at per-entry useprefix in #353. The current issue is just about the new particle parser.

@nickbart1980, @gracile-fr, I'm trying to see the algorithm over those samples. Right now it looks like:

{ "family": "van Gogh", "given": "Vincent, Jr." } => { "family": "Gogh", "given": "Vincent", "non-dropping-particle": "van", "suffix": "Jr." }

author = {<dropping particle> <non-dropping-particle> <family>, <suffix>, <given>}, options = {useprefix=true} (because non-dropping particle present)

{ "family": "Humboldt", "given": "Alexander von, Sr." } => { "family": "Humboldt", "given": "Alexander von", "suffix": "Sr." }

I think this is a parser error wrt the 'von', so I'm deferring judgement on this one; an issue has been lodged at citeproc-js

{ "family": "\"von Braun\"", "given": "Wernher" } => { "family": "\"von Braun\"", "given": "Wernher" }

author = {<dropping particle> <non-dropping-particle> {<family>}, <suffix>, <given>}, braces because of the quotes around the family name; no useprefix because no non-dropping particle

right? Non-dropping particles cause "useprefix", dropping particles don't, only quoted names cause braces?
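
The ordering and bracing rule summarised above can be sketched in a few lines. This is only an illustration under assumptions: `format_author` is a hypothetical helper, not Better BibTeX's actual code, and it takes a dict shaped like the particle-parser output shown in this comment.

```python
def format_author(name: dict) -> str:
    """Build a 'von Last, Jr., First' string from parsed name parts.

    Hypothetical helper for illustration; the keys mirror the parser
    output quoted in this thread.
    """
    family = name.get("family", "")
    # A family name that is still wrapped in double quotes is treated as a
    # fixed unit, so it is the only case that gets braces.
    if len(family) >= 2 and family.startswith('"') and family.endswith('"'):
        family = "{" + family[1:-1] + "}"

    # <dropping particle> <non-dropping particle> <family>
    pieces = [name.get("dropping-particle"), name.get("non-dropping-particle"), family]
    parts = [" ".join(p for p in pieces if p)]

    # The suffix goes between the family part and the given name.
    if name.get("suffix"):
        parts.append(name["suffix"])
    if name.get("given"):
        parts.append(name["given"])
    return ", ".join(parts)


print(format_author({"family": "Gogh", "given": "Vincent",
                     "non-dropping-particle": "van", "suffix": "Jr."}))
print(format_author({"family": '"von Braun"', "given": "Wernher"}))
```

A name with a suffix but no given name is not handled specially here; that case is not discussed in this thread.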

njbart commented 9 years ago

I think this is a parser error wrt the 'von' …

No, it’s an error of the current citeproc-js parser whenever there’s also a suffix; I’ve seen this, too. It should of course be: { "family": "Humboldt", "given": "Alexander von, Sr." } => { "family": "Humboldt", "given": "Alexander", "dropping-particle": "von", "suffix": "Sr." }

{ "family": "\"von Braun\"", "given": "Wernher" } => { "family": "\"von Braun\"", "given": "Wernher" }

Quotes must not appear in the output when converting from Zotero to CSL JSON (or “Pandoc JSON”), since a CSL JSON family field will not and must not be parsed again:

{ "family": "\"von Braun\"", "given": "Wernher" } => { "family": "von Braun", "given": "Wernher" }
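
A minimal sketch of that quote-stripping step, assuming a Zotero creator dict with the `lastName`/`firstName` keys seen in the test fixtures quoted earlier. `zotero_to_csl_name` is a hypothetical name, and the particle parsing that would apply to unquoted names is deliberately left out.

```python
def zotero_to_csl_name(creator: dict) -> dict:
    """Map a two-field Zotero creator to a CSL-JSON-style name (sketch only)."""
    family = creator["lastName"]
    given = creator["firstName"]

    quoted = len(family) >= 2 and family.startswith('"') and family.endswith('"')
    if quoted:
        # The quotes only tell the converter not to split off particles;
        # they must not survive into the CSL JSON family field.
        return {"family": family[1:-1], "given": given}

    # Unquoted names would go through the particle parser at this point;
    # that step is omitted from this sketch.
    return {"family": family, "given": given}


print(zotero_to_csl_name({"lastName": '"von Braun"', "firstName": "Wernher"}))
```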

njbart commented 9 years ago

Wøller, Sune Brø ndum

That’s a typo, it’s “Wøller, Sune Brøndum”, see http://www.headnet.dk/team/sune-brondum-woller/. I’ve never heard of a particle “ndum”.

“American Rights at Work”

That’s an organisation, https://en.wikipedia.org/wiki/American_Rights_at_Work, so its name must always be rendered literally, as “American Rights at Work”.

retorquere commented 9 years ago

I haven't heard of "ndum" either, but my interest is whether the output is sensible given the input. Whether the input is sensible doesn't matter in this case; garbage in, garbage out. Same goes for the "American Rights at Work"; it was entered by the user as a lastname + firstname rather than a single-field name. If I change that field to single-field mode, it is returned as {{American Rights at Work}}, but that's not what I was handed.

njbart commented 9 years ago

right? Non-dropping particles cause "useprefix", dropping particles don't …

Yes.

only quoted names cause braces?

When converting from Zotero to bib(la)tex, yes.

retorquere commented 9 years ago

(that output isn't CSL JSON, it's the output from the particle parser)

njbart commented 9 years ago

(that output isn't CSL JSON, it's the output from the particle parser)

I’m confused: which output?

retorquere commented 9 years ago

{ "family": "\"von Braun\"", "given": "Wernher" } => { "family": "von Braun", "given": "Wernher" }

njbart commented 9 years ago

{ "family": "\"von Braun\"", "given": "Wernher" } => { "family": "von Braun", "given": "Wernher" }

Ok, I'm not familiar with any of the internals; as a mapping from Zotero to CSL JSON = Pandoc JSON, this is correct.

retorquere commented 9 years ago

So should

  author = {de La Fontaine, Jean}
  options={useprefix=true}

be preferred over

  author = {de {La Fontaine}, Jean}

? @gracile-fr, @nickbart1980?

retorquere commented 9 years ago

(given input [La Fontaine] [Jean de])

njbart commented 9 years ago
author = {de La Fontaine, Jean}
options = {useprefix=true}

This would be correct if you had a made up name (Zotero): [de La Fontaine] [Jean]

But if you're looking at the real French writer, the “de” is dropping (and the “La” is a non-particle):

Zotero: [La Fontaine] [Jean de]

bib(la)tex: author = {de La Fontaine, Jean}

The additional braces are not required in either of the forms.

(Rule of thumb, you rarely if ever need braces for bib(la)tex, in particular if you use the von Last, Jr., First form.* The biblatex-examples.bib file, e.g., does not have a single brace in creators’ names (except for accented chars).)

(* In First von Last, you’d have to protect multipart last names if there’s no von part.)

retorquere commented 9 years ago

That would fit my algorithm; https://zotplus.github.io/better-bibtex/nameparser.html?bracketed=%5BLa%20Fontaine%5D%20%5BJean%20de%5D&fudge=true returns "de" as a dropping particle, so no "useprefix".

retorquere commented 9 years ago

It's just that I think @gracile-fr recommended quite specifically to use braces to bind non-dropping-particles to the last name. You know more about this than me, but I'd like to cross-check with @gracile-fr .

retorquere commented 9 years ago

And [in 't Horvath] [Peter A.C.] would result in

author = {in 't Horvath, Peter A.C},
options = {useprefix=true}

rather than

author = {{in 't Horvath}, Peter A.C}

? (the in 't is marked a non-dropping particle)

njbart commented 9 years ago

It's just that I think @gracile-fr recommended quite specifically to use braces to bind non-dropping-particles to the last name. You know more about this than me, but I'd like to cross-check with @gracile-fr.

For bibtex binding non-dropping-particles to the last name might make sense (I’m not entirely sure though), but for biblatex it seems we agreed on using useprefix on a per-entry basis instead.

njbart commented 9 years ago

[in ’t Horvath] [Peter A. C.] => author = {in 't Horvath, Peter A. C.}, options = {useprefix=true}

EDIT: that seems ok.

Except we always need spaces between initials …

retorquere commented 9 years ago

I have tests running on a change that does bracing for BibTeX, useprefix for BibLaTeX. For spaces between initials, please file a new issue; it isn't related to the particle parser.
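
For readers following along, the split described here (bracing for BibTeX, no bracing for BibLaTeX) might look roughly like the sketch below. `author_value` and the `flavor` argument are hypothetical names chosen for illustration; the real change may differ in detail.

```python
def author_value(name: dict, flavor: str) -> str:
    """Render one parsed name for 'bibtex' or 'biblatex' (illustrative sketch)."""
    ndp = name.get("non-dropping-particle", "")
    dp = name.get("dropping-particle", "")
    family = name["family"]

    if ndp and flavor == "bibtex":
        # BibTeX has no useprefix option, so the non-dropping particle is
        # bound to the family name with braces.
        last = "{" + ndp + " " + family + "}"
    else:
        # BibLaTeX: leave the particle unbraced; the per-entry useprefix
        # option carries the distinction instead.
        last = " ".join(p for p in (ndp, family) if p)

    von_last = " ".join(p for p in (dp, last) if p)
    given = name.get("given", "")
    return f"{von_last}, {given}" if given else von_last


horvath = {"family": "Horvath", "given": "Peter A. C.", "non-dropping-particle": "in 't"}
print(author_value(horvath, "bibtex"))
print(author_value(horvath, "biblatex"))
```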

retorquere commented 9 years ago

Tests are looking pretty good. This would also close #353 when done.

retorquere commented 9 years ago

WRT https://github.com/ZotPlus/zotero-better-bibtex/issues/348#issuecomment-143242156 ; the question was not whether to do per-entry useprefix; I don't know what actual benefits it has, as in an entry without a particle in the name it should be a no-op, but tests on this are running and looking good.

The question was rather whether we could drop the bracing for non-dropping particles, which is something @gracile-fr requested earlier. There are names that have both dropping and non-dropping particles, and in the previous behaviour you'd get {dp {ndp lastname}, firstname}; now you'd get {dp ndp lastname, firstname}.

I have no grounded opinion on the matter, as I have zero clue what the proper behaviour is; I have to rely on input from you and @gracile-fr (or anyone else) to guide this. I have no reason to doubt your insights on the matter, but unless I previously misunderstood @gracile-fr (entirely possible), this change conflicts with his/her (?) earlier request. Which is why I'm pushing for a discussion; this is close to release, and is blocking the release of another fix, so the sooner settled the better.

njbart commented 9 years ago

There are names that have both dropping and non-dropping particles …

No, there aren’t. We had a lengthy discussion on the Zotero forums, e.g., here, and no one ever came up with a real-life example of a name with both dropping and non-dropping particles. “Jean de La Fontaine” previously had been used as an example for ndp+dp, but that’s not actually true; it has one dropping particle, “de”, but the “La” is a non-particle.

retorquere commented 9 years ago

Ah. OK. The particle parser did return such names previously; if it does not now, that problem seems solved then. So something like this synthetic case cannot occur then?

njbart commented 9 years ago

… whether to do per-entry useprefix …

Possible misunderstanding? I never meant to say that entries without particles should have useprefix set either way.

retorquere commented 9 years ago

Then I am thoroughly confused. Currently, if you tick "useprefix" in the preferences, each BibLaTeX entry blindly gets options={useprefix}. What change would you like to that behaviour?

njbart commented 9 years ago

… this synthetic case …

Pretty unlikely. I’d remove this case for the time being.

njbart commented 9 years ago

Then I am thoroughly confused. Currently, if you tick "useprefix" in the preferences, each BibLaTeX entry blindly gets options={useprefix}. What change would you like to that behaviour?

No, don’t do that: There shouldn’t be a box to tick "useprefix" in the preferences in the first place: [EDIT] I don’t see any sense offering this as a global option. [/EDIT] Rather, for each individual entry, we look at whether a non-dropping particle is present, in which case we set options={useprefix=true}. For all others, with dropping particles or no particles at all, we do not need to set anything (default being useprefix=false).
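
The per-entry rule proposed here reduces to a single check over an entry's creators. A small sketch, assuming parsed names in the dict shape used earlier in this thread; `biblatex_options` is a hypothetical helper name.

```python
from typing import Optional


def biblatex_options(creators: list) -> Optional[str]:
    """Return the options line for one entry, or None if none is needed."""
    # Only a non-dropping particle triggers useprefix; dropping particles
    # and particle-free names rely on the default (useprefix=false).
    if any(c.get("non-dropping-particle") for c in creators):
        return "options = {useprefix=true},"
    return None


print(biblatex_options([{"family": "Gogh", "given": "Vincent",
                         "non-dropping-particle": "van"}]))
print(biblatex_options([{"family": "La Fontaine", "given": "Jean",
                         "dropping-particle": "de"}]))
```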

retorquere commented 9 years ago

That's the new behavior on which tests are running, but what harm would there be in always setting options={useprefix=true}? References which don't have particles wouldn't be affected, right?

njbart commented 9 years ago

References which don't have particles wouldn't be affected, right?

No, but it’d just be more clutter …

retorquere commented 9 years ago

Ah, OK. I can grok that.

There really isn't a case where a user would want to suppress the 'useprefix'?

retorquere commented 9 years ago

WRT clutter, is there a difference between useprefix and useprefix=true? I prefer the former if it's semantically the same.

njbart commented 9 years ago

There really isn't a case where a user would want to suppress the 'useprefix'?

Not really, no. All “van Gogh”s in in-text citations would become “Gogh”s, and that’s not usually done. Also, if you really wanted to do that, you could still let biber’s preprocessing strip out the useprefix options.

Is there a difference between useprefix and useprefix=true?

No.

retorquere commented 9 years ago

So this may be another synthetic case, but what to do with names that are given in two-field format, but where only the lastname has been given? Should I output {{lastname}} (treating it like a one-field name), {lastname,}, or {lastname}?

retorquere commented 9 years ago

(I have those in the testset I take from the citeproc-js testset)

njbart commented 9 years ago

I’d say {lastname}. For actual one-field names I’d prefer {{lastname}} even if there's one word only, just to preserve a hint about the original status.
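
That preference can be written down as a small sketch. It assumes single-field creators arrive with a `name` key and two-field creators with `lastName`/`firstName`; the `name` key and the `author_field` helper are assumptions made for this illustration, not something taken from this thread.

```python
def author_field(creator: dict) -> str:
    """Return the braced author field value for one creator (sketch only)."""
    if "name" in creator:
        # Single-field name: the extra pair of braces keeps it literal and
        # preserves a hint about its original status.
        inner = "{" + creator["name"] + "}"
    else:
        last = creator.get("lastName", "")
        first = creator.get("firstName", "")
        # Two-field name with an empty first name: just the last name,
        # without a trailing comma or extra braces.
        inner = f"{last}, {first}" if first else last
    return "{" + inner + "}"


print(author_field({"name": "American Rights at Work"}))
print(author_field({"lastName": "Aristoteles", "firstName": ""}))
```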

retorquere commented 9 years ago

The reason why I'm wondering is that BibTeX might interpret it as a firstname lastname name. I don't know examples offhand, but it isn't necessarily illegitimate, either, e.g. something like [Aristoteles] []

njbart commented 9 years ago

That’s not a problem; bib(la)tex will never leave a lastname empty (unless the whole field is empty, that is). See http://tug.ctan.org/info/bibtex/tamethebeast/ttb_en.pdf, p. 23.

gracile-fr commented 9 years ago
It's just that I think @gracile-fr recommended quite specifically to use braces to bind non-dropping-particles to the last name.
For bibtex binding non-dropping-particles to the last name might make sense (I’m not entirely sure though), but for biblatex it seems we agreed on using useprefix on a per-entry basis instead.
The question was rather whether we could drop the bracing for non-dropping particles […] There are names that have both dropping and non-dropping particles

@nickbart1980 is right. At the time I asked for braces around non-dropping particles, I was very new to BibLaTeX (I still am, actually) and overlooked the useprefix option. (And I really don't know about BibTeX.)

what harm would there be in always setting ```options={useprefix=true}```? References which don't have particles wouldn't be affected, right?
No, but it’d just be more clutter …

I'm confused. You're not talking about names with dropping-particles, right? Zotero: [La Fontaine] [Jean de] => bib(la)tex: author = {de La Fontaine, Jean} without the useprefix option.

retorquere commented 9 years ago

Correct.