retorquere / zotero-better-bibtex

Make Zotero effective for us LaTeX holdouts
https://retorque.re/zotero-better-bibtex/
MIT License

Export U+01C2 as textdoublebarpipe #2896

Closed HughP closed 4 months ago

HughP commented 4 months ago

Debug log ID

Q4B9RHJW-euc/6.7.203-6

What happened?

When I export a bibliographic reference containing the Unicode character “ǂ” (U+01C2), it is exported as \textdoublepipe, which corresponds to U+01C1. As I understand things, in text mode it should be exported as \textdoublebarpipe, or else the user should get a message saying that the character is not exportable, plus the reason. The example below shows the behavior in both the title and the abstract.

My knowledge of LaTeX text mode is limited. I usually use XeLaTeX with UTF-8 encoding (which is also my default encoding in Zotero), but my current project requires pdfLaTeX and text mode. Text mode seems to be the default for BBT, and it also works fine with XeLaTeX. The difficulty for a user is that BBT appears to export without issue while emitting the wrong character. Because I am not familiar with LaTeX text mode, I am not sure whether \textdoublebarpipe is available in a default document setup or whether users are expected to add \usepackage{tipa} to their preamble. I do add this package, and when I also change \textdoublepipe to \textdoublebarpipe, things render correctly.

The issue is that I shouldn't have to edit the BibTeX entry after export. Either the original character “ǂ” (U+01C2) should be exported, or \textdoublebarpipe should be used. \textdoublepipe is unacceptable because it is neither visually the same nor the same character.

I found the linked LaTeX symbols reference list helpful.

Versions: Zotero 6.0.35, BBT 6.7.203. Rendering results were tested on the Overleaf platform.
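To make the "not the same character" point concrete, here is a small sketch using Python's standard `unicodedata` module to print the official Unicode names of the two code points involved. The helper name `describe` is only illustrative.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The character in the Zotero record, and the one \textdoublepipe stands for.
print(describe("\u01c2"))  # U+01C2 LATIN LETTER ALVEOLAR CLICK
print(describe("\u01c1"))  # U+01C1 LATIN LETTER LATERAL CLICK
```

The two are distinct letters with distinct names, so substituting one for the other changes the text rather than just its appearance.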

Example BBT output

@incollection{jones_endangered_2020-5,
  title = {Endangered {{African Languages Featured}} in a {{Digital Collection}}: {{The Case}} of the {{{\textdoublepipe }Khomani San}} {\textbar} {{Hugh Brody Collection}}},
  booktitle = {Proceedings of the {{First}} Workshop on {{Resources}} for {{African Indigenous Languages}} ({{RAIL}}): {{Language Resources}} and {{Evaluation Conference}} ({{LREC}} 2020), {{Marseille}}, 11--16 {{May}} 2020},
  author = {Jones, Kerry and Muftic, Sanjin},
  editor = {Mabuya, Rooweither and Ramukhadi, Phathutshedzo and Setaka, Mmasibidi and Wagner, Valencia and {van Zaanen}, Menno},
  year = {2020},
  pages = {1--8},
  publisher = {European Language Resources Association (ELRA)},
  address = {Marseille, France},
  url = {https://www.aclweb.org/anthology/2020.rail-1.1},
  abstract = {The {\textdoublepipe}Khomani San {\textbar} Hugh Brody Collection features the voices and history of indigenous hunter gatherer descendants in three endangered languages namely, N{\textbar}uu, Kora and Khoekhoe as well as a regional dialect of Afrikaans. A large component of this collection is audio-visual (legacy media) recordings of interviews conducted with members of the community by Hugh Brody and his colleagues between 1997 and 2012, referring as far back as the 1800s. The Digital Library Services team at the University of Cape Town aim to showcase the collection digitally on the UCT-wide Digital Collections platform, Ibali which runs on Omeka-S. In this paper we highlight the importance of such a collection in the context of South Africa, and the ethical steps that were taken to ensure the respect of the {\textdoublepipe}Khomani San as their stories get uploaded onto a repository and become accessible to all. We will also feature some of the completed collection on Ibali and guide the reader through the organisation of the collection on the Omeka-S backend. Finally, we will outline our development process, from digitisation to repository publishing as well as present some of the challenges in data clean-up, the curation of legacy media, multi-lingual support, and site organisation.}
}

Zotero RDF of record (for ingest and local testing)

<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:z="http://www.zotero.org/namespaces/export#"
 xmlns:dcterms="http://purl.org/dc/terms/"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:bib="http://purl.org/net/biblio#"
 xmlns:vcard="http://nwalsh.com/rdf/vCard#"
 xmlns:foaf="http://xmlns.com/foaf/0.1/"
 xmlns:link="http://purl.org/rss/1.0/modules/link/">
    <bib:BookSection rdf:about="https://www.aclweb.org/anthology/2020.rail-1.1">
        <z:itemType>bookSection</z:itemType>
        <dcterms:isPartOf>
            <bib:Book>
                <dc:title>Proceedings of the First workshop on Resources for African Indigenous Languages (RAIL): Language Resources and Evaluation Conference (LREC 2020), Marseille, 11–16 May 2020</dc:title>
            </bib:Book>
        </dcterms:isPartOf>
        <dc:publisher>
            <foaf:Organization>
                <vcard:adr>
                    <vcard:Address>
                       <vcard:locality>Marseille, France</vcard:locality>
                    </vcard:Address>
                </vcard:adr>
                <foaf:name>European Language Resources Association (ELRA)</foaf:name>
            </foaf:Organization>
        </dc:publisher>
        <bib:authors>
            <rdf:Seq>
                <rdf:li>
                    <foaf:Person>
                        <foaf:surname>Jones</foaf:surname>
                        <foaf:givenName>Kerry</foaf:givenName>
                    </foaf:Person>
                </rdf:li>
                <rdf:li>
                    <foaf:Person>
                        <foaf:surname>Muftic</foaf:surname>
                        <foaf:givenName>Sanjin</foaf:givenName>
                    </foaf:Person>
                </rdf:li>
            </rdf:Seq>
        </bib:authors>
        <bib:editors>
            <rdf:Seq>
                <rdf:li>
                    <foaf:Person>
                        <foaf:surname>Mabuya</foaf:surname>
                        <foaf:givenName>Rooweither</foaf:givenName>
                    </foaf:Person>
                </rdf:li>
                <rdf:li>
                    <foaf:Person>
                        <foaf:surname>Ramukhadi</foaf:surname>
                        <foaf:givenName>Phathutshedzo</foaf:givenName>
                    </foaf:Person>
                </rdf:li>
                <rdf:li>
                    <foaf:Person>
                        <foaf:surname>Setaka</foaf:surname>
                        <foaf:givenName>Mmasibidi</foaf:givenName>
                    </foaf:Person>
                </rdf:li>
                <rdf:li>
                    <foaf:Person>
                        <foaf:surname>Wagner</foaf:surname>
                        <foaf:givenName>Valencia</foaf:givenName>
                    </foaf:Person>
                </rdf:li>
                <rdf:li>
                    <foaf:Person>
                        <foaf:surname>van Zaanen</foaf:surname>
                        <foaf:givenName>Menno</foaf:givenName>
                    </foaf:Person>
                </rdf:li>
            </rdf:Seq>
        </bib:editors>
        <link:link rdf:resource="#item_61421"/>
        <dc:subject>Omeka-S</dc:subject>
        <dc:subject>
           <z:AutomaticTag><rdf:value>/unread</rdf:value></z:AutomaticTag>
        </dc:subject>
        <dc:identifier>
            <dcterms:URI>
                <rdf:value>https://www.aclweb.org/anthology/2020.rail-1.1</rdf:value>
            </dcterms:URI>
        </dc:identifier>
        <bib:pages>1–8</bib:pages>
        <dc:date>2020</dc:date>
        <z:libraryCatalog>Zotero</z:libraryCatalog>
        <z:language>en</z:language>
        <dcterms:abstract>The ǂKhomani San | Hugh Brody Collection features the voices and history of indigenous hunter gatherer descendants in three endangered languages namely, N|uu, Kora and Khoekhoe as well as a regional dialect of Afrikaans. A large component of this collection is audio-visual (legacy media) recordings of interviews conducted with members of the community by Hugh Brody and his colleagues between 1997 and 2012, referring as far back as the 1800s. The Digital Library Services team at the University of Cape Town aim to showcase the collection digitally on the UCT-wide Digital Collections platform, Ibali which runs on Omeka-S. In this paper we highlight the importance of such a collection in the context of South Africa, and the ethical steps that were taken to ensure the respect of the ǂKhomani San as their stories get uploaded onto a repository and become accessible to all. We will also feature some of the completed collection on Ibali and guide the reader through the organisation of the collection on the Omeka-S backend. Finally, we will outline our development process, from digitisation to repository publishing as well as present some of the challenges in data clean-up, the curation of legacy media, multi-lingual support, and site organisation.</dcterms:abstract>
        <dc:title>Endangered African Languages Featured in a Digital Collection: The Case of the ǂKhomani San | Hugh Brody Collection</dc:title>
    </bib:BookSection>
    <z:Attachment rdf:about="#item_61421">
        <z:itemType>attachment</z:itemType>
        <dc:title>jones_muftic-2020-endangered_african_languages_featured_in_a_digital_collection_-_the_case_of_the.pdf</dc:title>
        <link:type>application/pdf</link:type>
    </z:Attachment>
</rdf:RDF>

retorquere commented 4 months ago

All that condenses down to "U+01C2 should be exported as \textdoublebarpipe" right?

HughP commented 4 months ago

Yep!

All the best, -Hugh

retorquere commented 4 months ago

> When I export a bibliographic reference containing Unicode character “ǂ” (U+01C2) it is exported as \textdoublepipe Unicode U+01C1. As I understand things, in text mode, it should be exported as \textdoublebarpipe

That is correct, but it will only work with tipa loaded.

> else a user message should be given that the character is not exportable (+plus the reason).

That is not possible. Zotero doesn't allow exporters to provide feedback to the user other than "success" or "failed".

> Text mode seems to be the default for BBT

This is not precisely correct. The default for Better BibTeX is text mode (but it can be changed). The default for Better BibLaTeX is unicode.

> or if it expected that users will need to add \usepackage{tipa} to their preamble.

The symbol was missing from the table, I've added it now; a build will drop here in a bit with that baked in.

In BBT, you can specify which packages you have loaded. I'll add missing packages to the quality report later, but you can see which characters need special packages, and which packages provide them, here.
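As a rough illustration of package-aware export, the sketch below picks a LaTeX command only when the package it needs has been declared as loaded, and collects the packages that were needed but not declared. This is not BBT's actual code or table format. The `TEXT_MODE` table, the `to_latex` function, and the choice to leave the character untouched when the package is missing are all assumptions made for the example. Only the pairing of U+01C2 with \textdoublebarpipe from tipa comes from this thread.

```python
# Hypothetical table: character -> list of (LaTeX command, required package).
# Only the U+01C2 row is taken from the thread; the layout is an assumption.
TEXT_MODE = {
    "\u01c2": [("\\textdoublebarpipe", "tipa")],
}


def to_latex(text, loaded_packages):
    """Return (converted text, sorted list of needed-but-undeclared packages)."""
    out = []
    missing = set()
    for ch in text:
        options = TEXT_MODE.get(ch)
        if options is None:
            out.append(ch)  # no mapping for this character: pass it through
            continue
        for command, package in options:
            if package is None or package in loaded_packages:
                out.append("{" + command + "}")
                break
        else:
            # No usable command: keep the character as-is (this sketch's choice)
            # and remember which packages would have made it exportable.
            out.append(ch)
            missing.update(pkg for _, pkg in options if pkg)
    return "".join(out), sorted(missing)


print(to_latex("\u01c2Khomani San", {"tipa"}))
print(to_latex("\u01c2Khomani San", set()))
```

With `{"tipa"}` declared, the first call yields `{\textdoublebarpipe}Khomani San` and an empty list. With nothing declared, the second call returns the text unchanged together with `["tipa"]`, which is the kind of information a quality report could surface.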

> Zotero RDF of record (for ingest and local testing)

Appreciated, but next time please send a debug log by right-clicking the entry and selecting "Send BBT Debug log" from the popup menu (the ID will have -refs in it). That sends me the entry in a way that lets my test suite pick it up automatically, build a named test case from it, etc.

github-actions[bot] commented 4 months ago

:robot: this is your friendly neighborhood build bot announcing test build 6.7.203.2896.6361 ("update unicode mapping")

This update may name other issues, but the build just dropped here is for you; it just means problems already fixed in other issues have been folded into the work we are doing here. Install in Zotero by downloading test build 6.7.203.2896.6361, opening the Zotero "Tools" menu, selecting "Add-ons", open the gear menu in the top right, and select "Install Add-on From File...".

retorquere commented 4 months ago

Can you test build 6361?

HughP commented 4 months ago

Not resolved.

I downloaded and installed the test build provided by the bot, and I still get \textdoublepipe. I've restarted Zotero twice and tried the export again, with the same result. Here are the debug logs:

YR3BXCQP-euc/6.7.203.2896.6361-6 DPF9RCHH-refs-euc/6.7.203.2896.6361-6

retorquere commented 4 months ago

~Have you configured BBT to load the tipa package support as per those links above?~ You need to follow the instructions from here (also read the top of the page).

HughP commented 4 months ago

Thank you for highlighting those extra steps. I missed them (in the Zotero settings). I can confirm that after adding tipa to the appropriate setting, the export now works.

FYI there is a link not rendering on the documentation page at: https://retorque.re/zotero-better-bibtex/installation/preferences/hidden-preferences/index.html#packages

I do wonder, though, why the software inserts \textdoublepipe for this character at all when tipa is not set in the settings. The two Unicode characters aren't interchangeable. It seems to be a case of: see "A", return "B", when the user would expect an error, a "null", or "A" itself.

github-actions[bot] commented 4 months ago

Thanks for the feedback! Just so you know, GitHub doesn't let me control who can close issues, and @retorquere likes to leave bug reports and enhancements open as a nudge to merge them into the next release.

retorquere commented 4 months ago

Again, I can only have the export succeed or fail without further comment, and the latter, without any indication what went wrong, is entirely useless to the user. I'm going to add it to the quality report, which will be much more informative. I might also add an option to just blindly assume all required packages are available, which is the behavior you want.

retorquere commented 4 months ago

The user must also be in control of what packages are assumed. Not all packages are available everywhere, and not all packages are compatible with each other.

retorquere commented 4 months ago

It's already in the quality report