Closed ZnqbuZ closed 1 year ago
That sounds like something other than cutting, though. Can you elaborate?
I noticed that babel has babel-zh-Hant.ini, which is in fact zh-TW, so maybe BBT can use tw when an item's lang is zh-Hant.
:robot: this is your friendly neighborhood build bot announcing test build 6.7.53.3709 ("cut tw")
Install in Zotero by downloading test build 6.7.53.3709, opening the Zotero "Tools" menu, selecting "Add-ons", open the gear menu in the top right, and select "Install Add-on From File...".
Expected cn to be applied to all zh-*, except for tw to be applied to zh-Hant. Hard to tell if it really works this way, but I'm satisfied with the result it gives.
Oh wait, you mean for the language field. I'll have to look into that.
Let me summarize the current behaviour of BBT regarding languages, so that it may help others in the future:
If zh-Hans/zh-Hans-HK/zh-Hans-MO/zh-Hans-SG is filled in, BBT regards it as chinese-simplified or chinese-simplified-%region%, where %region% is Hong Kong SAR/Macau SAR/Singapore respectively;
if zh-Hant/zh-Hant-HK/zh-Hant-MO is filled in, BBT regards it as chinese-traditional or chinese-traditional-%region%;
if any other zh/zh-* is filled in the language field, BBT regards it as chinese (Simplified Chinese);
if zh-Hant is filled in, BBT uses tw; otherwise BBT uses zh.
I'm satisfied with this behaviour.
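The behaviour summarized above can be sketched as a small lookup. This is only an illustration of the summary, not BBT's actual code: the names `LANGIDS`, `langid_for` and `segmentation_dictionary` are hypothetical, the region suffixes for the traditional variants are assumed by analogy with the simplified ones, and reading `tw`/`zh` as the word-segmentation dictionary is an assumption based on the test build names ("cut tw", "add tw").

```python
from typing import Optional

# Explicit tags from the summary; values follow its wording literally.
# The traditional-script region names are assumed by analogy.
LANGIDS = {
    "zh-hans": "chinese-simplified",
    "zh-hans-hk": "chinese-simplified-Hong Kong SAR",
    "zh-hans-mo": "chinese-simplified-Macau SAR",
    "zh-hans-sg": "chinese-simplified-Singapore",
    "zh-hant": "chinese-traditional",
    "zh-hant-hk": "chinese-traditional-Hong Kong SAR",
    "zh-hant-mo": "chinese-traditional-Macau SAR",
}


def langid_for(tag: str) -> Optional[str]:
    """Map a language field value to the langid described in the summary."""
    key = tag.strip().lower()
    if key in LANGIDS:
        return LANGIDS[key]
    # Any other zh or zh-* tag falls back to plain "chinese".
    if key == "zh" or key.startswith("zh-"):
        return "chinese"
    return None  # not a Chinese tag; out of scope for this sketch


def segmentation_dictionary(tag: str) -> str:
    """Pick tw only for exactly zh-Hant, zh otherwise (tag assumed Chinese)."""
    return "tw" if tag.strip().lower() == "zh-hant" else "zh"
```

With this sketch, `langid_for("zh-CN")` gives `chinese` and `segmentation_dictionary("zh-Hant")` gives `tw`, matching the last two points of the summary.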
:robot: this is your friendly neighborhood build bot announcing test build 6.7.53.3712 ("add tw")
Install in Zotero by downloading test build 6.7.53.3712, opening the Zotero "Tools" menu, selecting "Add-ons", open the gear menu in the top right, and select "Install Add-on From File...".
Great. Previously "新竹的交通大學要在2021年2月1日與台北的陽明大學合併" was cut to "新竹/的/交通/大學要/在/2/0/2/1/年/2/月/1/日/與/台北/的/陽明/大學合/併", which is wrong. (According to the demo of js-jieba it should be cut to "新竹/的/交通/大學/要/在/2/0/2/1/年/2/月/1/日/與/台北/的/陽明/大學/合併".)
Now changing the language to zh-Hant fixes this problem.
Can you submit the sample where you set the language to zh-Hant (right-click and send a debug log)? I want to add that to my test suite to prevent regressions.
I've uploaded logs of 3 items: IZSN8DRM-refs-apse, G84S5263-refs-apse, 8E4QSEHL-refs-apse. The titles of these items are strings picked from the demo of js-jieba. All of them are segmented correctly only when the language is set to zh-Hant.
I also found that the pinyin of "於" is wrong. Its pinyin is "Yu", but BBT gives "Wu" when the language is set to "zh-*", and "Yu" otherwise. I've sent another log about this issue: RN8XNYVE-refs-apse. I think this is a bug in the transliteration function, since disabling jieba did not help, and an older version (6.7.50) also has this bug.
I cannot open the BBT preferences via Tools -> Better BibTeX after installing 3712 -- nothing happens after I press the button. Could you reproduce this?
Edit: Even release 6.7.53 has this problem, but 6.7.50 seems good.
Given the citekey formula auth.fold.lower +"_"+ veryshorttitle(2,2) +"_"+ year, IZSN8DRM-refs-apse, G84S5263-refs-apse, 8E4QSEHL-refs-apse export to
@book{_MeizhuJinbiao_,
title = {梅竹錦標對抗賽},
langid = {chinese-traditional}
}
@book{_XiaomingBiye_,
title = {小明畢業於國立交通大學資訊科學與工程研究所},
langid = {chinese-traditional}
}
@book{_XinzhuJiaotong_,
title = {新竹的交通大學要在2021年2月1日與台北的陽明大學合併},
langid = {chinese-traditional}
}
Wait, I thought my formula was title.transliterate.capitalize. If you change the formula to this, you will see the difference in cite key between zh and zh-Hant.
I've uploaded a new log, FPPGY6P5-refs-apse, containing these items. Their languages are set to zh-Hant, so the segmentations are correct. If you change them to zh, then jieba will give different (hence wrong) segmentations.
> I cannot open the BBT preferences via Tools -> Better BibTeX after installing 3712 -- nothing happens after I press the button. Could you reproduce this?
> Edit: Even release 6.7.53 has this problem, but 6.7.50 seems good.
Please reproduce and send a debug log from the Help menu. I cannot replicate this.
> Wait, I thought my formula was title.transliterate.capitalize. If you change the formula to this, you will see the difference in cite key between zh and zh-Hant.
Then I get
@book{FayixueTiedaoSunshangTupu,
title = {法医学铁道损伤图谱},
author = {肖, 发民},
date = {2003},
eprint = {gzA4AAAACAAJ},
eprinttype = {googlebooks},
publisher = {{郑州大学出版社}},
abstract = {本书共收集图片400余幅并附以文字说明,以铁路上常见的各种伤亡为主,内容分为:辗轧伤、撞击、拖擦伤等共9章。},
isbn = {978-7-81048-761-0},
langid = {chinese},
pagetotal = {153}
}
@book{FayixueTiedaoSunshangTupua,
title = {法医学铁道损伤图谱},
author = {肖, 发民},
date = {2013},
eprint = {gzA4AAAACAAJ},
eprinttype = {googlebooks},
publisher = {{郑州大学出版社}},
abstract = {本书共收集图片400余幅并附以文字说明,以铁路上常见的各种伤亡为主,内容分为:辗轧伤、撞击、拖擦伤等共9章。},
isbn = {978-7-81048-761-0},
langid = {chinese},
pagetotal = {153}
}
@book{GaigeLicheng,
title = {改革歷程},
author = {趙, 紫陽},
date = {2009},
eprint = {FVaOQQAACAAJ},
eprinttype = {googlebooks},
publisher = {{新世紀出版社}},
isbn = {978-988-17202-7-6},
langid = {chinese},
pagetotal = {370}
}
@book{MeizhuJinbiaoDuikangsai,
title = {梅竹錦標對抗賽},
langid = {chinese-traditional}
}
@book{XiaomingBiyeWuGuoliJiaotongDaxueZixunKexueYuGongchengYanjiusuo,
title = {小明畢業於國立交通大學資訊科學與工程研究所},
langid = {chinese-traditional}
}
@book{XinzhuDeJiaotongDaxueYaoZai2021Nian2Yue1RiYuTaibeiDeYangmingDaxueHebing,
title = {新竹的交通大學要在2021年2月1日與台北的陽明大學合併},
langid = {chinese-traditional}
}
We need to focus on one problem at a time. The conversation is getting fragmented.
> Then I get
These are correct segmentations and should be the same as in FPPGY6P5-refs-apse. The only difference is that I set the authors to "0" so that the items appear at the top of my library.
So the remaining issues are then:
correct?
Yes
Let's look at the prefs window first. Please enable debug logging in the Help menu, open the prefs to replicate the problem, and then send a BBT debug from the Help menu.
I submitted 2 logs: 2DHAVQ4U-apse with version 6.7.50, where the window shows, and 2NFZWHBX-apse with build 3712, where the problem occurs.
I installed Zotero and build 3712 on a fresh new virtual machine, and the problem still occurs. The debug log was submitted as 67ZW7L7V-apse
I don't see any activity indicating the prefs are opened in 67ZW7L7V-apse and 2NFZWHBX-apse. Is that what you're seeing? The prefs window does not open at all?
Oh wait, forget Tools->Better BibTeX, just open the Zotero prefs. I'll remove that item under the Tools menu, that's not supposed to be there yet.
Ah, I see. Then the only remaining issue is the pinyin of 於: as you can see, BBT converts it to Wu when the language is set to zh, which is wrong, but if I change the language to anything else, BBT converts it correctly to Yu. Quite strange.
:robot: this is your friendly neighborhood build bot announcing test build 6.7.53.3717 ("upgrade pinyin lib")
Install in Zotero by downloading test build 6.7.53.3717, opening the Zotero "Tools" menu, selecting "Add-ons", open the gear menu in the top right, and select "Install Add-on From File...".
Tested 3717 and the pinyin of "於" is correct now. Thank you.
Debug log ID
SDZWJFW5-refs-apse
What happened?
This is a problem with the Chinese word segmentation function in cite key generation. My formula is veryshorttitle(2,2). It seems that jieba is not applied to items with language "zh-CN", but only to those with language "zh".
For example, an item with the title "法医学铁道损伤图谱", whose pinyin is "FaYiXueTieDaoSunShangTuPu", is converted to "FayixueTiedao" when the language is set to "zh", but to "Fayixuetiedaosunshangtupu" when the language is set to "zh-CN".
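The difference between the two keys can be reproduced with a minimal sketch, assuming the formula keeps the first two words of the title and capitalizes each one. The function `key_from_words` is hypothetical and the word lists are supplied by hand; no real segmenter or transliterator is called here.

```python
from typing import List


def key_from_words(words: List[str], n: int) -> str:
    """Capitalize the first n words and join them (illustrative only)."""
    return "".join(word.capitalize() for word in words[:n])


# With segmentation, the transliterated title splits into four words.
segmented = ["fayixue", "tiedao", "sunshang", "tupu"]

# Without segmentation, the whole title is treated as a single word.
unsegmented = ["fayixuetiedaosunshangtupu"]

print(key_from_words(segmented, 2))    # FayixueTiedao
print(key_from_words(unsegmented, 2))  # Fayixuetiedaosunshangtupu
```

When the title is never split, limiting the key to two words has no effect, which is why the "zh-CN" key contains the entire title.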
However, Zotero recommends storing language as two letter ISO language codes followed by two letter ISO country codes (e.g., en-US for American English, or de-DE for German), so "zh-CN" should be the "standard" language code, instead of just "zh".
Maybe BBT should regard all languages whose code contains "zh" as Chinese.
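One hedged way to express this suggestion is to compare only the primary subtag rather than test whether the code contains "zh" anywhere, since a plain substring or prefix test would also match unrelated codes such as "zha". The helper `is_chinese` below is hypothetical and is not taken from BBT.

```python
import re


def is_chinese(tag: str) -> bool:
    """True when the primary language subtag is zh (e.g. zh, zh-CN, zh_TW)."""
    # Split on the first "-" or "_" and keep only the leading subtag.
    primary = re.split(r"[-_]", tag.strip().lower(), maxsplit=1)[0]
    return primary == "zh"
```

Under this check "zh", "zh-CN" and "zh-Hant" all count as Chinese, while "en-US" and "zha" do not.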