Chinese references are sorted in the wrong order

ponypony000 commented 1 year ago

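Hello, I noticed that the automatically generated Chinese references are in the wrong order. They should be sorted alphabetically by the pinyin initial of the author's name, but the first entry in my list is out of place, while the rest are fine.

The first entry here starts with S, and the entries after it start from C.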

redleafnew commented 1 year ago

Could this be the cause? https://www.zhihu.com/question/49658413

redleafnew commented 1 year ago

There are also some polyphonic characters, such as 曾, which gets filed under ceng even though the surname is read zeng.

ponypony000 commented 1 year ago

So that's how it is. I'll have to double-check the list from now on.

crliu95 commented 7 months ago

@redleafnew

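I ran into the same problem. Since polyphonic characters are fairly common in Chinese surnames, would it be possible to add a pinyin field or something similar, and to define default readings for the common polyphonic surname characters?

Many thanks!

Your loyal user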

zepinglee commented 7 months ago

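The citeproc-js used by Zotero currently sorts with JavaScript's String.localeCompare() (see https://github.com/Juris-M/citeproc-js/blob/master/src/sort.js), so it cannot take additional pinyin information into account when comparing. Handling the pinyin of polyphonic characters would probably require changes at a lower level, which is fairly complex. It cannot be handled at the CSL level.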
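
To illustrate the limitation, here is a minimal Python sketch (not citeproc-js code). It contrasts a key that gives every character one fixed reading, which is roughly what a string-only comparison can do, with a key that reads the first character as a surname. The reading tables and the names plain_key and surname_aware_key are hypothetical and exist only for this example.

```python
# Toy reading tables, hypothetical and deliberately tiny; real data would
# have to come from a pinyin dictionary.
DEFAULT_READING = {
    "曾": "ceng", "巩": "gong",
    "陈": "chen", "寿": "shou",
    "张": "zhang", "衡": "heng",
    "赵": "zhao", "翼": "yi",
}
# Surname-specific readings that override the per-character default.
SURNAME_READING = {"曾": "zeng"}


def plain_key(name):
    """One fixed reading per character, regardless of position."""
    return tuple(DEFAULT_READING.get(ch, ch) for ch in name)


def surname_aware_key(name):
    """Like plain_key, but the first character is read as a surname."""
    if not name:
        return ()
    first = SURNAME_READING.get(name[0], DEFAULT_READING.get(name[0], name[0]))
    return (first,) + plain_key(name[1:])


names = ["赵翼", "曾巩", "张衡", "陈寿"]
print(sorted(names, key=plain_key))          # 曾巩 is filed under "ceng"
print(sorted(names, key=surname_aware_key))  # 曾巩 is filed under "zeng"
```

The second key needs information (which character is the surname, and how it is read) that a plain string comparison never sees, which is why this cannot be expressed in a CSL style alone.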

crliu95 commented 7 months ago

I see, that makes sense. Thank you very much!

I'll start a thread on the Zotero Forum...

zepinglee commented 7 months ago

The author of citeproc-js has not been very active lately, so this is unlikely to be implemented in the short term.

TomBener commented 6 months ago

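With the Zotero plugin in MS Word, the bibliography can be sorted by pinyin. With the same CSL file in Pandoc or Quarto, however, the reference list is not sorted by pinyin but by the Unicode values of the Chinese characters. Is there a way to solve this?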
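
The difference between the two orders can be seen in a small Python sketch. Python's built-in sorted() compares strings code point by code point, which stands in here for the Unicode-value ordering described above; it is not Pandoc's actual implementation. TOY_PINYIN is a hypothetical four-entry table used only for this demonstration.

```python
# Hypothetical toy table of surname readings, for this demonstration only.
TOY_PINYIN = {"陈": "chen", "李": "li", "孙": "sun", "王": "wang"}

surnames = ["王", "李", "陈", "孙"]

# Python compares str values code point by code point.
by_code_point = sorted(surnames)
# Sorting on the romanized reading gives the order a reader expects.
by_pinyin = sorted(surnames, key=TOY_PINYIN.__getitem__)

for ch in by_code_point:
    print(ch, f"U+{ord(ch):04X}", TOY_PINYIN[ch])
print("pinyin order:", by_pinyin)
```

The code point order has no relation to pronunciation, so any pinyin ordering has to come from collation data supplied by the processor.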

zepinglee commented 6 months ago

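> With the Zotero plugin in MS Word, the bibliography can be sorted by pinyin. With the same CSL file in Pandoc or Quarto, however, the reference list is not sorted by pinyin but by the Unicode values of the Chinese characters. Is there a way to solve this?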

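Do styles that set default-locale="zh-CN" sort by pinyin? Some other styles use default-locale-sort="zh-CN" instead, which is a CSL-M extension.

You could try to persuade the authors of these projects to support the feature, or implement it yourself and open a PR.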

TomBener commented 6 months ago

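@zepinglee Thanks for the reply. Setting default-locale="zh-CN" doesn't work either. Persuading them seems difficult, since the developers of these projects never deal with pinyin, and I don't have the ability to implement it myself...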

zepinglee commented 6 months ago

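> @zepinglee Thanks for the reply. Setting default-locale="zh-CN" doesn't work either. Persuading them seems difficult, since the developers of these projects never deal with pinyin, and I don't have the ability to implement it myself...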

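This is essentially a matter of the UCA (Unicode Collation Algorithm). michal-h21/lua-uca, which my own zepinglee/citeproc-lua depends on, does not implement Chinese collation yet either.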

TomBener commented 4 months ago

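Sharing a rough Python script that sorts the references correctly by pinyin by processing the .docx file. It can be adapted as needed for surnames that are polyphonic characters: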

# Sort the Chinese bibliography in a Word document by pinyin.
# Assumes the bibliography starts right after a paragraph containing "参考文献"
# and runs to the end of the document.
# Caveat: rewriting paragraph text breaks cross-references and drops hyperlinks.

from docx import Document
from pypinyin import pinyin, Style

# Surname readings to force for polyphonic characters; extend as needed
SURNAME_READINGS = {"曾": "zeng1"}


def pinyin_key(name):
    """Return the pinyin sort key for a name, overriding polyphonic surnames."""
    syllables = [s[0] for s in pinyin(name, style=Style.TONE3)]
    if name[:1] in SURNAME_READINGS:
        # Replace only the surname's reading so the rest of the name still counts
        syllables[0] = SURNAME_READINGS[name[0]]
    return "".join(syllables)


# Open the document
doc = Document("input.docx")
paras = doc.paragraphs

# Find the first paragraph after the "参考文献" heading
biblio_start = next(
    (i + 1 for i, para in enumerate(paras) if "参考文献" in para.text), None
)
if biblio_start is None:
    raise SystemExit('No paragraph containing "参考文献" was found')

# Collect the entries and sort them by the pinyin of their first
# space-separated token, which is assumed to be the first author
entries = [p.text for p in paras[biblio_start:]]
sorted_entries = sorted(entries, key=lambda e: pinyin_key(e.split(" ")[0]))

# Write the sorted entries back into the same paragraphs
for para, entry in zip(paras[biblio_start:], sorted_entries):
    para.text = entry

doc.save("output.docx")

zotero-chinese / styles

Chinese references are sorted in the wrong order #152