jgm / pandoc

Universal markup converter
https://pandoc.org

Issues with East Asian Language Tags with Hyperlinks and Quotes in DOCX Output #9909

Closed TomBener closed 5 days ago

TomBener commented 1 week ago

As discussed in #9817, there are issues with East Asian language font hints. The following test document reproduces them:

---
title: Test eastAsia document
reference-section-title: References
csl: test.csl
references:
- author:
  - family: Ma
    given: Jinguan
    suffix: 马經觀
  container-title: 國聞周報
  id: majingguan1949
  issued: 1949
  original-title: China's tomorrow and the day after tomorrow
  page: 1-2
  publisher-place: Shanghai
  title: 中國的明天和後天
  type: article-newspaper
  doi: 10.1234/5678
- author:
  - family: Ma
    given: Jinguan
    suffix: 马經觀
  container-title: 國聞周報
  id: majingguan1949a
  issued: 1949
  original-title: China's tomorrow and the day after tomorrow
  page: 1
  publisher-place: Shanghai
  title: ‘中國的明天’和後天
  type: article-newspaper
- author:
  - family: He
    given: Fangchuan
    suffix: 何芳川
  container-title: 北京大学学报
  id: hefangchuan1998
  issue: 6
  issued: 1998
  original-title: On "Hua-Yi order"
  page: 30-45
  title: "\"华夷秩序\"论"
  type: article-journal
---

This is a test document for CJK in docx with Pandoc 3.2 later
[@majingguan1949; @majingguan1949a; @hefangchuan1998].

中文“汉字”测试。

The issues are:

1. The first citation contains a DOI and is rendered as a hyperlink by test.csl, but its quotation marks are not double-width, as they should be for Chinese text.
2. The title of the third citation begins with a quotation mark, so the opening quotation mark is not enclosed with the East Asian font hint.


Related XML

```xml
Test eastAsia document
This is a test document for CJK in docx with Pandoc 3.2 later (Ma, 1949b, 1949a; He, 1998) .
中文“汉字”测试。
References
HE FANGCHUAN 何芳川 (1998) ‘华夷秩序’ 论” (On “Hua-Yi order” ). 北京大学学报 6: 30–45.
MA JINGUAN 马經觀 (1949a) “’中國的明天’和後天” (China’s tomorrow and the day after tomorrow). 國聞周報: 1.
MA JINGUAN 马經觀 (1949b) 中國的明天和後天 (China’s tomorrow and the day after tomorrow). 國聞周報: 1–2.
```

Please update to fix the issue. Thanks for your great work!

TomBener commented 5 days ago

Update: The East Asian language font hints have no issues in the main body.

---
title: Test eastAsia document
reference-section-title: References
csl: test.csl
references:
- author:
  - family: Ma
    given: Jinguan
    suffix: 马經觀
  container-title: 國聞周報
  id: majingguan1949
  issued: 1949
  original-title: China's tomorrow and the day after tomorrow
  page: 1-2
  publisher-place: Shanghai
  title: 中國的明天和後天
  type: article-newspaper
  doi: 10.1234/5678
- author:
  - family: Ma
    given: Jinguan
    suffix: 马經觀
  container-title: 國聞周報
  id: majingguan1949a
  issued: 1949
  original-title: China's tomorrow and the day after tomorrow
  page: 1
  publisher-place: Shanghai
  title: ‘中國的明天’和後天
  type: article-newspaper
- author:
  - family: He
    given: Fangchuan
    suffix: 何芳川
  container-title: 北京大学学报
  id: hefangchuan1998
  issue: 6
  issued: 1998
  original-title: On "Hua-Yi order"
  page: 30-45
  title: "\"华夷秩序\"论"
  type: article-journal
---

This is a test document for CJK in docx with Pandoc 3.2 later
[@majingguan1949; @majingguan1949a; @hefangchuan1998].

中文“汉字”测试 and "English text" test。
中文“‘汉字’”测试 and "English text" test。
中文“[汉字](https://pandoc.org/custom-writers.html)”测试and "English text" test.

The generated result in document.xml:

<w:p>
  <w:pPr>
    <w:pStyle w:val="Title" />
  </w:pPr>
  <w:r>
    <w:t xml:space="preserve">Test eastAsia document</w:t>
  </w:r>
</w:p>
<w:p>
  <w:pPr>
    <w:pStyle w:val="FirstParagraph" />
  </w:pPr>
  <w:r>
    <w:t xml:space="preserve">This is a test document for CJK in docx with Pandoc 3.2 later</w:t>
  </w:r>
  <w:r>
    <w:t xml:space="preserve"></w:t>
  </w:r>
  <w:r>
    <w:t xml:space="preserve">(Ma, 1949b, 1949a; He, 1998)</w:t>
  </w:r>
  <w:r>
    <w:t xml:space="preserve">.</w:t>
  </w:r>
</w:p>
<w:p>
  <w:pPr>
    <w:pStyle w:val="BodyText" />
  </w:pPr>
  <w:r>
    <w:rPr>
      <w:rFonts w:hint="eastAsia" />
    </w:rPr>
    <w:t xml:space="preserve">中文“汉字”测试</w:t>
  </w:r>
  <w:r>
    <w:t xml:space="preserve"> and</w:t>
  </w:r>
  <w:r>
    <w:t xml:space="preserve"></w:t>
  </w:r>
  <w:r>
    <w:t xml:space="preserve">“English text”</w:t>
  </w:r>
  <w:r>
    <w:t xml:space="preserve"></w:t>
  </w:r>
  <w:r>
    <w:t xml:space="preserve">test。</w:t>
  </w:r>
  <w:r>
    <w:t xml:space="preserve"></w:t>
  </w:r>
  <w:r>
    <w:rPr>
      <w:rFonts w:hint="eastAsia" />
    </w:rPr>
    <w:t xml:space="preserve">中文“</w:t>
  </w:r>
  <w:r>
    <w:rPr>
      <w:rFonts w:hint="eastAsia" />
    </w:rPr>
    <w:t xml:space="preserve">‘汉字’</w:t>
  </w:r>
  <w:r>
    <w:rPr>
      <w:rFonts w:hint="eastAsia" />
    </w:rPr>
    <w:t xml:space="preserve">”测试</w:t>
  </w:r>
  <w:r>
    <w:t xml:space="preserve"> and</w:t>
  </w:r>
  <w:r>
    <w:t xml:space="preserve"></w:t>
  </w:r>
  <w:r>
    <w:t xml:space="preserve">“English text”</w:t>
  </w:r>
  <w:r>
    <w:t xml:space="preserve"></w:t>
  </w:r>
  <w:r>
    <w:t xml:space="preserve">test。</w:t>
  </w:r>
  <w:r>
    <w:t xml:space="preserve"></w:t>
  </w:r>
  <w:r>
    <w:rPr>
      <w:rFonts w:hint="eastAsia" />
    </w:rPr>
    <w:t xml:space="preserve">中文“</w:t>
  </w:r>
  <w:hyperlink r:id="rId20">
    <w:r>
      <w:rPr>
        <w:rStyle w:val="Hyperlink" />
        <w:rFonts w:hint="eastAsia" />
      </w:rPr>
      <w:t xml:space="preserve">汉字</w:t>
    </w:r>
  </w:hyperlink>
  <w:r>
    <w:rPr>
      <w:rFonts w:hint="eastAsia" />
    </w:rPr>
    <w:t xml:space="preserve">”测试and</w:t>
  </w:r>
  <w:r>
    <w:t xml:space="preserve"></w:t>
  </w:r>
  <w:r>
    <w:t xml:space="preserve">“English text”</w:t>
  </w:r>
  <w:r>
    <w:t xml:space="preserve"></w:t>
  </w:r>
  <w:r>
    <w:t xml:space="preserve">test.</w:t>
  </w:r>
</w:p>

Note that every text fragment containing Chinese is enclosed in a run with <w:rFonts w:hint="eastAsia" />, which works as expected.
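
For anyone who wants to repeat this check without reading the XML by eye, here is a minimal Python sketch. It assumes the XML has already been extracted from `word/document.xml`; the `SAMPLE` fragment, the helper names `is_cjk` and `run_report`, and the choice to treat only the CJK Unified Ideographs block as "Chinese" are all illustrative assumptions, not anything Pandoc provides.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical fragment standing in for part of word/document.xml
SAMPLE = f"""<w:body xmlns:w="{W}">
  <w:p>
    <w:r><w:rPr><w:rFonts w:hint="eastAsia" /></w:rPr><w:t xml:space="preserve">中文“汉字”测试</w:t></w:r>
    <w:r><w:t xml:space="preserve"> and</w:t></w:r>
    <w:r><w:t xml:space="preserve">“</w:t></w:r>
  </w:p>
</w:body>"""


def is_cjk(ch):
    # Rough check (assumption): only the CJK Unified Ideographs block
    return "\u4e00" <= ch <= "\u9fff"


def run_report(xml_text):
    """Return (text, has_east_asia_hint, contains_cjk) for every w:r run."""
    root = ET.fromstring(xml_text)
    report = []
    for run in root.iter(f"{{{W}}}r"):
        text = "".join(t.text or "" for t in run.findall("w:t", NS))
        fonts = run.find("w:rPr/w:rFonts", NS)
        hinted = fonts is not None and fonts.get(f"{{{W}}}hint") == "eastAsia"
        report.append((text, hinted, any(is_cjk(c) for c in text)))
    return report


for text, hinted, has_cjk in run_report(SAMPLE):
    print(repr(text), "hinted" if hinted else "no hint", "CJK" if has_cjk else "no CJK")
```

A run that shows "no hint" and consists only of a curly quotation mark next to Chinese text is the kind of run this issue is about.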

So the issue only happens in the bibliography generated by citeproc. Chinese bibliographies generally don't enclose titles in quotation marks, and hyperlinks are not preferred there either, so I don't think the issue is a big deal. If someone wants to correct the font hints for Chinese quotation marks in the bibliography, I believe a Lua filter could handle that.

jgm commented 5 days ago

The issue will arise (even in the main body) whenever you have a structure like QuotationMark, Link (Chinese text), QuotationMark, because in this case the Chinese text and the quotation marks won't be in the same docx "run."

It would not arise for Link (QuotationMark, Chinese text, QuotationMark).
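
The difference between the two structures can be illustrated with a toy model in Python. This is only a sketch of the reasoning above, not the docx writer's actual code: it assumes each inline piece becomes its own run and that a run gets the eastAsia hint only when its own text contains a CJK character. The names `is_cjk` and `hint_runs` are hypothetical.

```python
def is_cjk(ch):
    # Rough check (assumption): only the CJK Unified Ideographs block
    return "\u4e00" <= ch <= "\u9fff"


def hint_runs(runs):
    """Return (text, hinted) per run; a run is hinted only if its own
    text contains at least one CJK character."""
    return [(text, any(is_cjk(c) for c in text)) for text in runs]


# QuotationMark, Link (Chinese text), QuotationMark: three separate runs,
# so the quotation marks sit in runs with no Chinese text of their own.
separate = hint_runs(["“", "汉字", "”"])

# Link (QuotationMark, Chinese text, QuotationMark): one run that
# contains both the quotation marks and the Chinese text.
together = hint_runs(["“汉字”"])

print(separate)
print(together)
```

Under these assumptions the quotation marks in the first structure end up unhinted, while in the second they share the hinted run with the Chinese text.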