zepinglee / citeproc-lua

A Lua implementation of the Citation Style Language (CSL) for use with LaTeX
MIT License

Problem processing quotation marks? #71

Closed nickw2066 closed 3 months ago

nickw2066 commented 3 months ago

Describe the bug

In the annotated bibliography below, quotation marks are processed correctly in the shorter note, but problems arise in the slightly longer note.

To Reproduce

% !TEX encoding = UTF-8 Unicode
\documentclass{article}

\begin{filecontents}[overwrite, noheader]{\jobname.json}
[
    {
        "id": "Sari2007-Pragma",
        "type": "thesis",
        "author": [
            {
                "family": "Sari",
                "given": "Faizah"
            }
        ],
        "title": "Pragmatic Particles: A Cross-Linguistic Discourse Analysis of Interaction",
        "title-short": "Pragmatic Particles",
        "publisher-place": "Tuscaloosa, AL",
        "publisher": "University of Alabama",
        "archive": "ProQuest Dissertations & Theses",
        "archive_location": "Publication No. 3313743",
        "genre": "doctoral dissertation",
        "note": "Sari's dissertation aims to clarify the function of seven Indonesian particles (<i>kan</i>, <i>ya</i>, <i>kok</i>, <i>lho</i>, <i>dong</i>, <i>deh</i>, and <i>sih</i>) and the English particle `yeah'. Sari argues that <i>deh</i> takes sentence-final position and is an emphatic particle which enhances \"the speaker's explanation about an idea\". Usually <i>deh</i> has neutral or falling intonation. One of the conversation chunks Sari analyses is <i>tidur deh di atas</i> which Sari translates as `(we just had to accept that) he'd sleep upstairs'.",
        "issued": {
            "date-parts": [
                [
                    "2007"
                ]
            ]
        }
    },
    {
        "id": "Sari2007-Pragma-shorterNote",
        "type": "thesis",
        "author": [
            {
                "family": "Sari",
                "given": "Faizah"
            }
        ],
        "title": "Pragmatic Particles: A Cross-Linguistic Discourse Analysis of Interaction",
        "title-short": "Pragmatic Particles",
        "publisher-place": "Tuscaloosa, AL",
        "publisher": "University of Alabama",
        "archive": "ProQuest Dissertations & Theses",
        "archive_location": "Publication No. 3313743",
        "genre": "doctoral dissertation",
        "note": "Sari's dissertation aims to clarify the function of seven Indonesian particles (<i>kan</i>, <i>ya</i>, <i>kok</i>, <i>lho</i>, <i>dong</i>, <i>deh</i>, and <i>sih</i>) and the English particle `yeah'. Sari argues that <i>deh</i> takes sentence-final position and is an emphatic particle which enhances \"the speaker's explanation about an idea\". Usually <i>deh</i> has neutral or falling intonation. One of the conversation chunks Sari analyses is <i>tidur deh di atas</i> which Sari translates as ",
        "issued": {
            "date-parts": [
                [
                    "2007"
                ]
            ]
        }
    }
]

\end{filecontents}

\usepackage{citation-style-language}
\cslsetup{style = apa-annotated-bibliography}
\addbibresource{\jobname.json}

\begin{document}

\nocite{Sari2007-Pragma-shorterNote} % note has no problem with quotation marks
\nocite{Sari2007-Pragma} % note has problem with quotation marks

\printbibliography

\end{document}

Screenshots

[Image: Screenshot 2024-08-02 at 6 25 31 AM]

The csl file is from: https://github.com/citation-style-language/styles/blob/master/apa-annotated-bibliography.csl

zepinglee commented 3 months ago

Thanks for reporting the bug. I can reproduce it.

zepinglee commented 3 months ago

I've finally fixed this bug! It was much more complicated than I initially expected because I had to rewrite the fundamental module that parses quotation marks and pseudo-HTML tags, which involves tons of test cases.

[Image: Screenshot 2024-08-14 at 09 30 24]

Also note that the backtick character ` is not recommended in Zotero fields. Zotero's item data is not treated as LaTeX code, so the `yeah' in your example is interpreted as a word with a leading backtick and a trailing apostrophe rather than as a quoted word. The backtick is passed through to LaTeX unchanged, where it is typeset as a left quote, while the apostrophe is converted to a Unicode curly apostrophe by the citeproc-lua engine. I suggest using curly quotation marks (‘yeah’) or plain straight quotes ('yeah') in this case.

BTW, it's safe to use the LaTeX form `yeah' in a .bib database because citeproc-lua converts both punctuation marks to Unicode curly quotation marks.
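For illustration, here is a minimal sketch of the suggested workaround for the CSL-JSON case, with the note written using Unicode curly quotes instead of the backtick and apostrophe. The entry id `Sari2007-curly` and the shortened note text are hypothetical and only for demonstration. The package, style, and commands are taken from the original report, and the same engine setup as that example is assumed.

```latex
% !TEX encoding = UTF-8 Unicode
\documentclass{article}

% The note field below uses curly quotes (U+2018, U+2019) rather than `yeah'.
\begin{filecontents}[overwrite, noheader]{\jobname.json}
[
    {
        "id": "Sari2007-curly",
        "type": "thesis",
        "author": [
            {
                "family": "Sari",
                "given": "Faizah"
            }
        ],
        "title": "Pragmatic Particles: A Cross-Linguistic Discourse Analysis of Interaction",
        "publisher": "University of Alabama",
        "genre": "doctoral dissertation",
        "note": "Sari's dissertation covers seven Indonesian particles and the English particle ‘yeah’.",
        "issued": {
            "date-parts": [
                [
                    "2007"
                ]
            ]
        }
    }
]
\end{filecontents}

\usepackage{citation-style-language}
\cslsetup{style = apa-annotated-bibliography}
\addbibresource{\jobname.json}

\begin{document}

\nocite{Sari2007-curly}

\printbibliography

\end{document}
```

Plain straight quotes ('yeah') in the same field should work equally well, per the suggestion above.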

nickw2066 commented 3 months ago

Thank you so much!
