retorquere / zotero-better-bibtex

Make Zotero effective for us LaTeX holdouts
https://retorque.re/zotero-better-bibtex/
MIT License

CitationKeys are case-insensitive in bibtex #2720

Closed li-yuanrui closed 9 months ago

li-yuanrui commented 10 months ago

Debug log ID

6P6IDJVC-refs-apse/6.7.137-6

What happened?

I set the BBT citation key formula to auth.lower+year, which should produce keys like "liu2022". When I add a Chinese paper to Zotero (the author names are in Chinese), BBT appears to romanize the Chinese author name and generate the citation key from the result. However, the romanized name is capitalized, for example "Liu2022", which does not match my citation key formula.

This becomes a problem when the citation key "liu2022" already exists. The new key "Liu2022" (which should have been "liu2022a") does not conflict with it, because BBT compares keys case-sensitively. But when I export both items to a .bib file, BibLaTeX reports errors: it is not case-sensitive, so it sees two identical keys.

retorquere commented 10 months ago

What happens here is the following, which is admittedly counter-intuitive for CJK:

  • auth => 刘超 (I hope I got that right)
  • .lower => 刘超, because lower doesn't understand CJK (and I'm not sure whether "lowercase" even makes sense for CJK)

and finally, because you have Force citation key to plain text checked, it applies pinyin, at which point it transforms the name to Liu. Some people do not want CJK names romanized, so I can't do it inside auth. I think auth.transliterate.lower or auth.pinyin.lower will do what you want.
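
To make the ordering problem concrete, here is a minimal Python sketch. It is not BBT's actual code: the one-entry `PINYIN` table, `romanize`, and both `key_*` functions are hypothetical stand-ins, and the family name "刘" is an assumed input. The only point is that lowercasing before romanization cannot affect a capital letter that does not exist yet.

```python
# Hypothetical toy table standing in for a real pinyin transliterator.
PINYIN = {"刘": "Liu"}


def romanize(name: str) -> str:
    """Replace each character found in the toy table with capitalized pinyin."""
    return "".join(PINYIN.get(ch, ch) for ch in name)


def key_auth_lower(family_name: str, year: int) -> str:
    """Mimic auth.lower + year, with plain-text forcing applied as the last step."""
    lowered = family_name.lower()          # no-op: CJK characters have no case
    return romanize(lowered) + str(year)   # romanization introduces the capital "L"


def key_auth_transliterate_lower(family_name: str, year: int) -> str:
    """Mimic auth.transliterate.lower + year: romanize first, then lowercase."""
    return romanize(family_name).lower() + str(year)


print(key_auth_lower("刘", 2022))                # Liu2022
print(key_auth_transliterate_lower("刘", 2022))  # liu2022
```

Romanizing first gives the lowercase step something to act on, which is why reordering the formula changes the result.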

retorquere commented 10 months ago

Are you sure biblatex is case-insensitive? In this MWE, the entry isn't rendering for me unless I uppercase the E:

\documentclass{article}
\usepackage{url}
\begin{filecontents}{\jobname.bib}
@ARTICLE{Efron1986bo,
  title    = "{Why isn't everyone a Bayesian? With discussion and a reply by the author}",
  author   = "Efron, B",
  journal  = "The American statistician",
  volume   =  40,
  number   =  1,
  pages    = "1--11",
  year     =  1986,
  issn     = "0003-1305"
}
\end{filecontents}
\usepackage[backend=biber,style=apa]{biblatex}
\addbibresource{\jobname.bib}

\begin{document}

Test see \parencite{efron1986bo}.

\printbibliography{}

\end{document}

@njbart?

li-yuanrui commented 10 months ago

what happens here is the following, which is admittedly counter-intuitive for CJK:

  • auth => 刘超 (I hope I got that right)
  • .lower => 刘超, because lower doesn't understand CJK (and I'm not sure whether "lowercase" even makes sense for CJK)

and finally, because you have Force citation key to plain text checked, it applies pinyin, at which point it transforms the name to Liu. Some people do not want CJK names romanized, so I can't do it inside auth. I think auth.transliterate.lower or auth.pinyin.lower will do what you want.

Thank you for your suggestions! The auth.transliterate.lower+year formula does exactly what I need.

li-yuanrui commented 10 months ago

My mistake, I confused bibtex with biblatex. In my case, the ref.bib file looks like this:

@ARTICLE{Efron1986bo,
  title    = "{Why isn't everyone a Bayesian? With discussion and a reply by the author}",
  author   = "Efron, B",
  journal  = "The American statistician",
  volume   =  40,
  number   =  1,
  pages    = "1--11",
  year     =  1986,
  issn     = "0003-1305"
}
@ARTICLE{efron1986bo,
  title    = "{testTitle}",
  author   = "Efron, B",
  journal  = "testJournal",
  volume   =  40,
  number   =  1,
  pages    = "1--11",
  year     =  1986,
  issn     = "0003-1305"
}

and the .tex file is as follows:

\documentclass{article}
\usepackage{cite}

\begin{document}

Test see \cite{efron1986bo}.

\bibliographystyle{plain}
\bibliography{ref.bib}

\end{document}

I then compile with the pdflatex -> bibtex -> pdflatex (twice) recipe, and it reports errors. If I delete the entry efron1986bo from ref.bib, it compiles fine.
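
A quick way to check an exported file for this situation is to fold every key to lowercase and look for groups with more than one member. The following Python sketch assumes well-formed entries that start with `@TYPE{key,`; the regular expression and the function name `case_insensitive_collisions` are illustrative, and this is not a full BibTeX parser.

```python
import re
from collections import defaultdict

# Matches the start of an entry such as "@ARTICLE{Efron1986bo," and captures
# the entry type and the citation key.
ENTRY_START = re.compile(r"@\s*(\w+)\s*\{\s*([^,\s]+)\s*,")


def case_insensitive_collisions(bib_text: str) -> dict:
    """Return {lowercased key: [original keys]} for keys differing only in case."""
    groups = defaultdict(list)
    for entry_type, key in ENTRY_START.findall(bib_text):
        # These pseudo-entries carry no citation key.
        if entry_type.lower() in {"comment", "string", "preamble"}:
            continue
        groups[key.lower()].append(key)
    return {folded: keys for folded, keys in groups.items() if len(keys) > 1}


sample = """
@ARTICLE{Efron1986bo,
  title = "A"
}
@ARTICLE{efron1986bo,
  title = "B"
}
@BOOK{liu2022,
  title = "C"
}
"""

print(case_insensitive_collisions(sample))
# {'efron1986bo': ['Efron1986bo', 'efron1986bo']}
```

Any key listed in the result would need a disambiguating suffix (such as "liu2022a") before the file is safe to pass to bibtex.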

retorquere commented 9 months ago

If you try to export some items to CSV, what options are you offered in the Character Encoding dropdown?

github-actions[bot] commented 9 months ago

:robot: this is your friendly neighborhood build bot announcing test build 6.7.140.2720.5487 ("typo")

Install in Zotero by downloading test build 6.7.140.2720.5487, opening the Zotero "Tools" menu, selecting "Add-ons", opening the gear menu in the top right, and selecting "Install Add-on From File...".

li-yuanrui commented 9 months ago

If you try to export some items to CSV, what options are you offered in the Character Encoding dropdown?

Hi, when exporting to CSV, my Character Encoding dropdown offers three options: "Unicode (UTF-8)", "Unicode (UTF-8 without BOM)", and "Western". The default option is "Unicode (UTF-8 without BOM)".

retorquere commented 9 months ago

Can you try installing build 5487 and send a new debug log using that?

That build should also fix the duplicate-key problem you were seeing. The files in 6P6IDJVC-refs-apse/6.7.137-6, however, were in an encoding I couldn't figure out. I've changed how debug logs are sent, and I hope that fixes this as well.

li-yuanrui commented 9 months ago

Can you try installing build 5487 and send a new debug log using that?

That build should also fix the duplicate-key problem you were seeing. The files in 6P6IDJVC-refs-apse/6.7.137-6, however, were in an encoding I couldn't figure out. I've changed how debug logs are sent, and I hope that fixes this as well.

Hi, I have submitted a debug log using test build 6.7.140.2720.5487, ID: 8ANG726J-refs-euc/6.7.140.2720.5487-6. I now use the auth.transliterate.lower + year formula. I also tested the auth.lower + year formula, and the duplicate-key problem is fixed. Thank you!

retorquere commented 9 months ago

The debug log IDs sent from your system are in a character encoding other than UTF-8, and I'm trying to find out which one it is. I had hoped that the encoding dropdown would show your system encoding. It looks like your Zotero is set to English; what language is your Windows installation set to?

Would you do the following for me?

  • Open the Command Prompt by pressing Win + R to open the Run dialog, type cmd, and press Enter.
  • In the Command Prompt window, type the following command and press Enter: chcp

The active character encoding will be displayed. For example, if it's set to UTF-8, you will see something like Active code page: 65001. But I suspect it will be a different codepage, and it would be helpful to know what it is.

retorquere commented 9 months ago

Can you also attach an RDF export of the item in 8ANG726J-refs-euc/6.7.140.2720.5487-6?

li-yuanrui commented 9 months ago

The debug log IDs sent from your system are in a character encoding other than UTF-8, and I'm trying to find out which it is, I had hoped that the encoding dropdown would show your system encoding. It looks like your Zotero is set to English, what language is your Windows installation set to?

Would you do the following for me?

  • Open the Command Prompt by pressing Win + R to open the Run dialog, type cmd, and press Enter.
  • In the Command Prompt window, type the following command and press Enter: chcp

The active character encoding will be displayed. For example, if it's set to UTF-8, you will see something like Active code page: 65001. But I suspect it will be a different codepage, and it would be helpful to know what it is.

Can you also attach an RDF export of the item in 8ANG726J-refs-euc/6.7.140.2720.5487-6?

Sure. The active code page is 936, which is Simplified Chinese (GB2312). Yes, my Zotero is set to English.

Here are the Zotero RDF and Bibliontology RDF exports of the item in 8ANG726J-refs-euc/6.7.140.2720.5487-6. I hope they are helpful: rdfFiles.zip.
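
For reference, the effect of a code page 936 system on text that a reader expects to be UTF-8 can be reproduced in a few lines of Python. This is only an illustration of the encoding mismatch, not a description of how BBT transmits debug logs; the sample string is the author name used earlier in this thread, and Python's "cp936" codec name is an alias for its GBK codec.

```python
original = "刘超"

# Bytes as a program using the system code page 936 might write them.
raw = original.encode("cp936")

# A reader that assumes UTF-8 cannot decode those bytes.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the matching code page recovers the text.
recovered = raw.decode("cp936")

print(utf8_ok, recovered == original)
```

Knowing the active code page is what makes the second decode possible, which is why the chcp output matters here.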

retorquere commented 9 months ago

Can you also make an export using "BetterBibTeX JSON", once with background off, once with on?

github-actions[bot] commented 9 months ago

:robot: this is your friendly neighborhood build bot announcing test build 6.7.140.2720.5502 ("asciify BBT JSON")

Install in Zotero by downloading test build 6.7.140.2720.5502, opening the Zotero "Tools" menu, selecting "Add-ons", opening the gear menu in the top right, and selecting "Install Add-on From File...".

retorquere commented 9 months ago

Can you send a new debug log with these items from build 5502?

li-yuanrui commented 9 months ago

Can you also make an export using "BetterBibTeX JSON", once with background off, once with on?

Can you send a new debug log with these items from build 5502?

The new debug log ID is 36SYYHWG-refs-apse/6.7.140.2720.5502-6. Please find attached the "BetterBibTeX JSON" exports made with both build 5487 and build 5502: exportFiles.zip.

retorquere commented 9 months ago

Awesome, that fixes another problem I had. Thanks!