retorquere / zotero-better-bibtex

Make Zotero effective for us LaTeX holdouts
https://retorque.re/zotero-better-bibtex/
MIT License

Capitalization: Don't caps-protect name fields #384

Closed: retorquere closed this issue 8 years ago

retorquere commented 8 years ago

@nickbart1980 says:

Protecting two-field names is unnecessary on principle; only single-field names should be enclosed in one pair of braces, as currently [2.3.3].

njbart commented 8 years ago

I can't figure out why line 599/612 should be braced; "van"/"boven" is a particle, so the lastname doesn't have a space

That’s for bibtex (not biblatex): always “van Gogh”/“van Gogh, Vincent” etc. seems much better than “Gogh”/“Gogh, Vincent van”.

619/689/761: the particle parser deems 's-/d'/al- to be a particle, so that triggers useprefix. I assume that's OK.

Not for bibtex; bibtex doesn’t have options.

625/631/641/666/677/683/700/706/712/737/743/767 has a space in the last name, so I've braced it.

That’s not needed when Last does not begin with a word starting with a lowercase letter, and there is a First after the comma. (In other words, brace a Last that starts with uppercase only if there is no First.)

647/722: the particle parser deems "de la"/"da" a particle, so the lastname doesn't have a space, so no brace

Sorry, this should be ["de la Mare"][Walter] and ["da Gama"][Vasco] on the Zotero side (both from the Chicago Manual).

657: I thought you preferred to drop the trailing comma in this case?

I guess this shows I’m undecided here. Keeping it would flag the fact that it came from a two-field name, but I’m not sure whether that’s ever useful. Do you see any arguments for or against?

749: I thought you wanted non dropping particles in front? "zero width space" after the apostrophe

Sorry, these slipped through. For bibtex, use {d’\relax Este, Beatrice} and {de' Medici, Lorenzo}. For biblatex, please use {d’ Este, Beatrice} and {de’ Medici, Lorenzo}.

Note that “Lorenzo de’ Medici” is the only name I’ve encountered so far that needs a space after a particle ending with punctuation, so for biblatex (not bibtex) the space would have to be protected somehow (or rather, the punctuation masked). {de’\mbox{} Medici, Lorenzo} seems to work for biblatex.

That’s also why I have been experimenting with "zero width space" (and inadvertently left it in). Please either remove the ZWS, or, if you feel that’s not too hackish, you could leave it in, and map it to \mbox{}.

retorquere commented 8 years ago

I can't figure out why line 599/612 should be braced; "van"/"boven" is a particle, so the lastname doesn't have a space

That’s for bibtex (not biblatex): always “van Gogh”/“van Gogh, Vincent” etc. seems much better than “Gogh”/“Gogh, Vincent van”.

619/689/761: the particle parser deems 's-/d'/al- to be a particle, so that triggers useprefix. I assume that's OK.

Not for bibtex; bibtex doesn’t have options.

I've split the test cases into BibTeX and BibLaTeX export; test results for BibLaTeX are here; feel free to edit the expected output where you think mismatches are not actually errors. There are BibTeX tests in place, but they won't run until the BibLaTeX ones pass.

625/631/641/666/677/683/700/706/712/737/743/767 has a space in the last name, so I've braced it.

That’s not needed when Last does not begin with a word starting with a lowercase letter, and there is a First after the comma. (In other words, brace a Last that starts with uppercase only if there is no First.)

But last names will never start with a word with a lowercase letter, since that word will be deemed a particle.

647/722: the particle parser deems "de la"/"da" a particle, so the lastname doesn't have a space, so no brace

Sorry, this should be ["de la Mare"][Walter] and ["da Gama"][Vasco] on the Zotero side (both from the Chicago Manual).

OK, I've adjusted those.

657: I thought you preferred to drop the trailing comma in this case?

I guess this shows I’m undecided here. Keeping it would flag the fact that it came from a two-field name, but I’m not sure whether that’s ever useful. Do you see any arguments for or against?

I personally like the flagging function if it doesn't do harm otherwise

749: I thought you wanted non dropping particles in front? ... "zero width space" after the apostrophe

Sorry, these slipped through. For bibtex, use {d’\relax Este, Beatrice} and {de' Medici, Lorenzo}. For biblatex, please use {d’ Este, Beatrice} and {de’ Medici, Lorenzo}.

Note that “Lorenzo de’ Medici” is the only name I’ve encountered so far that needs a space after a particle ending with punctuation, so for biblatex (not bibtex) the space would have to be protected somehow (or rather, the punctuation masked). {de’\mbox{} Medici, Lorenzo} seems to work for biblatex.

But how should I decide which way to go, algorithmically? It seems a little weird to hard-code a rule specifically for de' Medici.

That’s also why I have been experimenting with "zero width space" (and inadvertently left it in). Please either remove the ZWS, or, if you feel that’s not too hackish, you could leave it in, and map it to \mbox{}.

I'll see whether I can get it to work, but I wouldn't want to consider it a blocking problem for #384. I've seen other places recommend \hspace{0pt} BTW.

njbart commented 8 years ago

But last names will never start with a word with a lowercase letter, since that word will be deemed a particle.

Last names sometimes do: {{de la Mare}, Walter}, {{van Gulik}, Robert}, {{da Gama}, Vasco} etc. are cases where strings that might appear to be particles are not, but should be parsed as fixed parts of the last name, that’s why they are braced.

I’ve edited the expected biblatex output.

I wouldn't want to consider it a blocking problem for #384.

Fine with me. – \hspace{0pt} BTW does not seem to protect/mask the apostrophe; biblatex’s \addspace OTOH works.

retorquere commented 8 years ago

But those names are already handled correctly. It's names like [De Quincey] [Thomas] that produced the mismatches. According to the rule we discussed at https://github.com/ZotPlus/zotero-better-bibtex/issues/384#issuecomment-152155662, that requires bracing, but [de Quincey] [Thomas] would not be a last name with a space in it; it would be the last name Quincey (no spaces in that) with a non-dropping particle de, so no bracing.

Just put the \addspace where you'd like it to be, and I'll see if I can get it to show up, but it seems a little counterintuitive to replace a ZWS (which doesn't add spacing) with \addspace (which does).

retorquere commented 8 years ago

Is there a specific reason to favor options = {useprefix=true} over options = {useprefix}?

njbart commented 8 years ago

https://github.com/ZotPlus/zotero-better-bibtex/issues/384#issuecomment-152155662: “2. ii. If not, and it's a last name, and it contains a space, brace entire part”

Sorry about that; it would seem this should be: “2. ii. If not, and it's a last name, and it begins with a lowercase letter, brace entire part” (Spaces are irrelevant, and braces aren’t needed if the last name starts with an uppercase letter, but all last names that start with lowercase need to be braced, even if they do not contain spaces; just tested this for author = {{cummins}, e. e.}.)
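
For illustration, the corrected rule 2.ii can be sketched in Python roughly as follows. This is only a sketch under stated assumptions: `protect_last_name` and `author_field` are hypothetical names, not functions from Better BibTeX, and the code covers only the lowercase-initial case described above (it ignores single-field names and particle parsing).

```python
def protect_last_name(last: str) -> str:
    """Brace a last name only if it begins with a lowercase letter.

    Spaces are deliberately ignored, following the corrected rule 2.ii.
    Hypothetical helper for illustration; not Better BibTeX's actual code.
    """
    if last[:1].islower():
        return "{" + last + "}"
    return last


def author_field(last: str, first: str) -> str:
    """Build an author field in 'Last, First' form (illustrative only)."""
    return "author = {" + protect_last_name(last) + ", " + first + "}"


if __name__ == "__main__":
    for last, first in [("cummins", "e. e."),
                        ("de la Mare", "Walter"),
                        ("De Quincey", "Thomas")]:
        print(author_field(last, first))
```

Under this sketch, "cummins" and "de la Mare" are braced because they start with a lowercase letter, while "De Quincey" is left unbraced even though it contains a space.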

njbart commented 8 years ago

Is there a specific reason to favor options = {useprefix=true} over options = {useprefix}?

Yes: I found this much easier to parse in \DeclareSourcemap if needed (see, e.g., https://github.com/ZotPlus/zotero-better-bibtex/issues/353#issuecomment-143433819).

njbart commented 8 years ago

Just put the \addspace where you'd like it to be, and I'll see if I can get it to show up, but it seems a little counterintuitive to replace a ZWS (which doesn't add spacing) with \addspace (which does).

So would mapping ZWS to \mbox{} be ok for you?

retorquere commented 8 years ago

Tests are running again.

retorquere commented 8 years ago

(I've added the mapping to mbox, and the useprefix, and the lowercase-starting names)

retorquere commented 8 years ago

So technically, the ZWS is valid where it is; unless ASCII-coding is set for BibLaTeX, it won't be translated into \mbox{}, and ASCII-coding is off by default for BibLaTeX. You want me to force translation of ZWS into \mbox{} even if it is off?

njbart commented 8 years ago

Just tested this, and a ZWS seems to protect/mask the apostrophe in biblatex, too. So if you’d just let any ZWSs through as is when ASCII-coding is off, that should work nicely for the time being.
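
The behaviour agreed on here could be sketched in Python like this. It is a minimal illustration, not the exporter's real code: `encode_name` and the `ascii_coding` flag are hypothetical names, and a real ASCII-coding pass would also translate other non-ASCII characters, which this sketch does not attempt.

```python
ZWS = "\u200b"  # zero width space (U+200B)


def encode_name(text: str, ascii_coding: bool) -> str:
    """Map ZWS to \\mbox{} only when ASCII-coding is on.

    With ASCII-coding off, the ZWS is passed through unchanged.
    Hypothetical helper for illustration only.
    """
    if ascii_coding:
        return text.replace(ZWS, r"\mbox{}")
    return text


if __name__ == "__main__":
    name = "de'" + ZWS + " Medici, Lorenzo"
    print(encode_name(name, ascii_coding=True))   # ZWS replaced by \mbox{}
    print(encode_name(name, ascii_coding=False))  # ZWS kept as is
```

The design choice is simply that the ZWS is treated like any other non-ASCII character: it is only rewritten when the user has asked for ASCII output.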

retorquere commented 8 years ago

Can you add the ZWS to the biblatex file? I don't know how to.

njbart commented 8 years ago

Done.

retorquere commented 8 years ago

Aww yeah, BibLaTeX passes, on to BibTeX. It's a simple copy of the BibLaTeX file, so there will be loads of misses.

retorquere commented 8 years ago

BibTeX also shouldn't have caps preservation, right? Let me turn that off first.

retorquere commented 8 years ago

Tests are running at https://travis-ci.org/ZotPlus/zotero-better-bibtex/jobs/88994655

retorquere commented 8 years ago

I know there are quite a few boneheaded errors in there -- don't worry about them, just make the BibTeX file look good, and I'll get to work on making the output match.

njbart commented 8 years ago

I’ve been testing a few things by processing the output with bibtex, and I have to say: what a mess. bibtex seems to ignore hyphens after prefixes, and sometimes complete multi-part prefixes. Also, I haven’t found a single bst style that demotes prefixes (have you?), so I’m not sure why we should bother parsing out prefixes more or less cleanly in the first place. I can have a further look, but it’ll probably take a while.

retorquere commented 8 years ago

I don't use BibTeX myself, only BibLaTeX, so I have very little experience here. It would be trivial to just do {{<dropping-particle> <non-dropping-particle> <family>}, <suffix>, <given>} for BibTeX, or even just {{Zotero Lastname}, {Zotero Firstname}}, if that by and large does the right thing. The second form is pretty much what BBT did before I started parsing particles.

njbart commented 8 years ago

{{<dropping-particle> <non-dropping-particle> <family>}, <suffix>, <given>} sounds ok to me.
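
As a rough sketch of that pattern in Python, assuming the name parts have already been parsed: `bibtex_name` and its parameter names are hypothetical, and omitting the suffix slot when there is no suffix is an assumption of this sketch, not something stated in the thread. Particles that end in an apostrophe or hyphen (d', al-) would need different joining and are not handled here.

```python
def bibtex_name(family: str, given: str = "", dropping: str = "",
                non_dropping: str = "", suffix: str = "") -> str:
    """Format one name as '{<particles> <family>}, <suffix>, <given>'.

    Returns the text as it would appear inside the author field, so the
    field's own outer braces are not included. Illustrative only.
    """
    # Join whichever particle parts are present with the family name.
    last = " ".join(p for p in (dropping, non_dropping, family) if p)
    parts = ["{" + last + "}"]
    if suffix:
        parts.append(suffix)
    if given:
        parts.append(given)
    return ", ".join(parts)


if __name__ == "__main__":
    print(bibtex_name("Gogh", "Vincent", non_dropping="van"))
    print(bibtex_name("King", "Martin Luther", suffix="Jr."))
```

Bracing the whole last-name part keeps BibTeX from splitting off or reordering the particles, which sidesteps the inconsistent prefix handling mentioned above.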

retorquere commented 8 years ago

Tests are running on that here, you know the drill

retorquere commented 8 years ago

Make that here, the previous version didn't handle quoted names properly.

retorquere commented 8 years ago

All tests pass.