Closed duncdrum closed 8 years ago
A.1/A.2: the [auth] etc. patterns follow JabRef, which does folding; in your case, it looks like you'd want to use [Auth], or anything from the table at https://github.com/retorquere/zotero-better-bibtex/wiki/Citation-Keys#configurable-citekey-generator, which will give you what's in the Zotero reference more or less verbatim. Folding will only be applied to the generated key if you have "force..." checked.
B.1: translators don't have access to the language setting of Juris-M, so I can't check for that, but I do have access to the reference language setting. Is there a list of languages considered "latin"?
B.2: that is correct, I currently don't check those, as the recent Juris-M compatibility currently only means "doesn't error out immediately under Juris-M"; the behavior is still very much Zotero-oriented. Issue reports such as these can change that though. I think the sentence ending B.2 is cut off?
Guessing the encoding reliably is tricky, and it is made worse by the fact that JavaScript's Unicode handling is atrociously wacky. I'd rather depend on the reference telling me.
@retorquere [Auth] seems to do what is needed. Would it be possible to use a conditional:
- if the record's language field is English or empty use [auth], else use [Auth]?
- [auth] according to the already existing rules?
I have the feeling this won't catch all scenarios, but it should cover a lot of ground. Just to be clear: does the reference language setting cover the language tags for each variant field of a Juris-M entry, or just the contents of the default Zotero language field?
WRT A1/2 I'd rather not change the behavior of an existing (and widely used) pattern field. But wouldn't just using [Auth] always do what you want?
On the 2nd bullet... would that require a toggle? In Zotero the proposed algorithm is always going to find the (sole) version, so I'm fine with [auth] doing that, as it doesn't in practice entail a behavior change.
Currently BBT does only one thing with the Zotero language field, which is deciding whether to TitleCase certain fields. BBT never had access to field-specific language settings, so we're free to decide how to deal with those.
Yes, you are correct: in the first case [Auth] should work.
In the second case we have to keep in mind that while most people have use for romanised transcription in biblatex, not everybody is using latinised transcriptions. Maybe somebody transcribes Arabic into Hebrew? The toggle just declares "I use the Latin alphabet" (and other things), whereas no toggle says "thank you, no use for the alphabet".
But in the case you describe, there would be no latin fallback, so it'd just pick the primary language. The toggle wouldn't actually change the behavior in this case.
I might have misunderstood you. What would be the default behaviour? In BBT preferences I change the citationkeyformat to [jurism], which uses [Auth], [Title] instead of [auth], [title], resulting in 名字-題名-2001. If I also select "force citation key to ascii" we would use the romanised transcriptions where they are present, to get hanzi_timing_2001, and [auth] as it does now for English works?
But why add a new pattern for that if it just does [Auth][Title]? If you're referring to the [zotero] pattern, that just replicates the Zotero key generation pattern, but unless I'm mistaken, Juris-M uses the Zotero BibTeX translator and so would get the same citation keys. I added it because there was no way to reliably assemble that pattern from more basic BBT patterns, and I wanted to help people who wanted to migrate, but if you're migrating from Juris-M BibTeX, [zotero] should give you exactly the same keys.
But I meant that if the process were to be, for any given author / title:
this would work without change for Zotero, as 1. and 2. will always fail (no language versions) and it will pick what it always did. Still don't see what a preference would change (and I try to be conservative about adding prefs these days)
Your process makes sense; this would be a large quality-of-life improvement for Juris-M. If there are edge cases where this doesn't work, I'm sure they will make themselves heard.
Do you know how Frank uses that list to decide what is Latin?
I don't think he does.
Do you have a sample I can work from? Preferably submitted by using right-click and selecting "Report Better BibTeX error"?
@retorquere bbt-error: 9I9GTW66
In Juris-M, Title, Editor, Publisher, and Place all have an additional field for pinyin (zh-alalc97) transcription, which is not exported into any BBT or Zotero format via right-click -> export. See the citation:
Yang Bojun, 楊伯峻, ed. Chunqiu Zuozhuan Zhu, 春秋左傳注. 4 vols. Revised Edition. Beijing, 北京: Zhonghua shuju, 中华书局, 2000.
The citekey should be something to the effect of 楊伯峻_春秋左傳注_2000 for non-Latin-script users, or Yang_ChunqiuZuozhuan_2000 for a romanised version.
OK that was dumb on my part -- the current release strips out the multi-lang parts simply to make the tests pass 😒 . Could you try with https://github.com/retorquere/zotero-better-bibtex/releases/download/builds/zotero-better-bibtex-1.6.49-circle-2352.xpi -- that should leave the tests intact.
Error-ID: CHNKKKBH has the full monty again.
I have a version that works, but it relies on specifying the preferred search order for language alternates. Detecting which values are romanized is sort of possible, e.g. by stripping everything that is not in the Unicode letter class and seeing which alternate has the most letters left, but it feels a little iffy as an algorithm. The language preference order seems cleaner. What do you think? https://github.com/retorquere/zotero-better-bibtex/releases/download/builds/zotero-better-bibtex-1.6.49-circle-2354.xpi does this; it currently only has zh-alalc97 as a language preference.
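The two selection strategies weighed above can be sketched roughly as follows. This is only an illustration, not BBT's code: the `Field` shape (a primary value plus alternates keyed by language tag) and all function names are assumptions made for the example.

```python
from typing import Dict, List, TypedDict


class Field(TypedDict):
    # Hypothetical shape for one multilingual field; not BBT's data model.
    primary: str
    alternates: Dict[str, str]


def pick_by_preference(field: Field, preferred: List[str]) -> str:
    """Return the first alternate whose language tag is in the preference
    list, falling back to the primary value if none matches."""
    for lang in preferred:
        if lang in field["alternates"]:
            return field["alternates"][lang]
    return field["primary"]


def count_letters(text: str) -> int:
    """Count the characters that belong to a Unicode letter category."""
    return sum(1 for ch in text if ch.isalpha())


def pick_by_letter_count(field: Field) -> str:
    """Heuristic variant: keep the candidate with the most letters left."""
    candidates = [field["primary"], *field["alternates"].values()]
    return max(candidates, key=count_letters)


creator: Field = {"primary": "楊伯峻", "alternates": {"zh-alalc97": "Yang Bojun"}}
print(pick_by_preference(creator, ["zh-alalc97"]))  # Yang Bojun
print(pick_by_preference(creator, ["en"]))          # 楊伯峻 (no match, primary)
print(pick_by_letter_count(creator))                # Yang Bojun
```

Note that the letter count favours the romanised form here only because it is longer (9 letters against 3); it does not detect the script, which is one reason the heuristic feels iffy.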
There is also a version at https://github.com/retorquere/zotero-better-bibtex/releases/download/builds/zotero-better-bibtex-1.6.49-circle-2359.xpi which uses no language preference order but just cycles through each language present in the reference and picks the citation key that's the longest over those given languages. Which would you prefer?
FF46.0 OSX 10.11.4 JURISM 4.0.29.8m37
@retorquere -2354 didn't do anything: the key remained __2000-4 even after unpinning and resetting the cache.
-2359 on the other hand generates yang_bojun_chunqiu_2000, which is wonderful, and a big step up compared to the old way of generating keys. This fits my needs perfectly.
2354 only picks those languages that are explicitly specified, so in the __2000-4 case it's most likely it didn't use zh-alalc97 for the alternates -- 2354 only has that as a preference and will ignore all the rest.
2359 does the following, for each reference:
1. It collects the languages present in the reference: given a reference with language uzbek, an uzbek title with an en alternate, and an uzbek creator with a zh-alalc97 alternate, it will try uzbek, zh-alalc97, en (in no particular order).
2. It generates a key for each of those languages; in the zh-alalc97 round, it will not use the en version but the uzbek version.
3. It picks the longest of the resulting keys,
which has the downside that if you mix and match languages, the results may sometimes be surprising. 2354 with a language pref order en,zh-alalc97,uzbek can (when carefully tuned) in that particular case yield better results. I have no idea how common such mixing and matching is.
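As a rough sketch of the 2359 approach (one candidate key per language found in the reference, longest one wins), assuming a simplified reference structure and a simplified creator_title_year key format that are not BBT's own:

```python
import re


def languages_in(reference):
    """Collect the reference language plus every alternate language used."""
    langs = [reference["language"]]
    for field in reference["fields"].values():
        for lang in field["alternates"]:
            if lang not in langs:
                langs.append(lang)
    return langs


def make_key(reference, lang):
    """Build one candidate key for a single language round. A field without
    an alternate in that language falls back to its primary value."""
    parts = []
    for name in ("creator", "title"):
        field = reference["fields"][name]
        parts.append(field["alternates"].get(lang, field["primary"]))
    parts.append(reference["year"])
    # Assumption: keep ASCII letters and digits only, as a stand-in for
    # forcing the citation key to ASCII.
    return "_".join(re.sub(r"[^A-Za-z0-9]", "", part) for part in parts)


def longest_key(reference):
    """Try every language present in the reference and keep the longest key."""
    return max((make_key(reference, lang) for lang in languages_in(reference)), key=len)


# Hypothetical encoding of the sample citation given earlier in the thread.
reference = {
    "language": "zh",
    "year": "2000",
    "fields": {
        "creator": {"primary": "楊伯峻", "alternates": {"zh-alalc97": "Yang Bojun"}},
        "title": {"primary": "春秋左傳注", "alternates": {"zh-alalc97": "Chunqiu Zuozhuan Zhu"}},
    },
}

print(make_key(reference, "zh"))  # __2000
print(longest_key(reference))     # YangBojun_ChunqiuZuozhuanZhu_2000
```

The zh round leaves nothing but the year once non-ASCII characters are dropped, while the zh-alalc97 round yields a descriptive key. With mixed languages the longest key is not necessarily the one a reader would expect, which is the downside noted above.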
There is one further option -- I could cycle through these languages for each separate part of the key pattern instead of the key pattern as a whole, but that would require deeper changes in BBT. If you reckon mix-n-match is common enough to be an issue, I can give it a stab.
Errrr... thinking about it a little more, it would require a rewrite of substantial parts of the key generator, as cycling through the languages would have to know the results after any stuff like abbr and nopunct are applied, which I can't know when I'm picking the alternate. So not impossible, but non-trivial.
I just double-checked, but all fields were either zh, zh-alalc97, or en. I have no clue why 2354 didn't pick it up.
As for mix n match, I'm sure it's pretty common, but what ultimately decides the most convenient citekey is the language or input method of the TeX document. Juris-M covers this via language settings -> UI locale. I would argue that for now even funky citekeys are better and more descriptive than __YYYY-(however many sources one's library has from that year).
I'd say quick n dirty does the trick for now, until we can actually use Zotero preferences.
2354 didn't pick up 'en', probably. I'll see what I can do with the UI locale.
New try: https://github.com/retorquere/zotero-better-bibtex/releases/download/builds/zotero-better-bibtex-1.6.50-circle-2365.xpi. This will just cycle through all available languages in the reference for each pattern part, with one limitation: if you have something like auth, it will try each language for all authors at once, so if you have 3 authors in 3 different languages, some will probably be dropped. Not going to try and fix that, it would get too complex.
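A minimal sketch of cycling through the languages per pattern part rather than per key. The selection rule used here (take the first language that leaves a non-empty result after cleaning) is an assumption, as are the data shapes and function names; the thread does not say how 2365 chooses between languages.

```python
import re


def clean(text):
    """Assumption: keep ASCII letters and digits only."""
    return re.sub(r"[^A-Za-z0-9]", "", text)


def resolve_part(values, languages):
    """Render one pattern part (e.g. all authors) in a single language at a
    time, and return the first rendering that is not empty."""
    for lang in languages:
        rendered = "".join(
            clean(value["alternates"].get(lang, value["primary"]))
            for value in values
        )
        if rendered:
            return rendered
    return ""


# Hypothetical encoding of the editor from the sample citation in the thread.
authors = [{"primary": "楊伯峻", "alternates": {"zh-alalc97": "Yang Bojun"}}]
languages = ["zh", "zh-alalc97"]

print(resolve_part(authors, languages) + "2000")  # YangBojun2000
```

Because every author is rendered in the same language within one round, an author who has no alternate in the winning language falls back to the primary value and may contribute nothing after cleaning. That corresponds to the limitation described above.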
With 2365 I'm back to __2000-4.
About multiple authors: I don't think it'll be much of a problem, as the bibliographical item already unifies different naming conventions to fit the primary language of the publication. This is with [zotero].
Ah, yeah, [zotero] is a little autistic that way -- it does exactly what the stock zotero citekey generator does, every flaw included, by design. Try [auth][year][0].
Ahh, now we have YangBojun2000, which is great.
I'm a little concerned that since [zotero] is the default, it might be off-putting to users who give BBT a first try with Juris-M, but if you are happy with the current iteration feel free to close the issue. I'd be happy to write something for the BBT wiki about Juris-M and BBT caveats for LaTeX users.
That is actually a point I hadn't considered. https://github.com/retorquere/zotero-better-bibtex/releases/download/builds/zotero-better-bibtex-1.6.50-circle-2366.xpi should be better.
Is there documentation on what the Juris-M "language" preference pane does?
If you could verify 2366, I can merge and close this.
2366 works great even with [zotero]. Thanks for putting in the effort for us Juris-M users.
Autogenerated cite keys are a pain when used in documents not written in English. They default to __XXXX, where "XXXX" is the year, and non-Latin characters aren't processed.
There are two options that relate to Juris-M.
A. When producing a mono-lingual document, e.g. a Chinese text citing Chinese sources: 名字-題名-2001. Just take the Unicode from the author and title fields and put it in the corresponding spots.
B. Producing a multi-lingual document, e.g. using CJK references in a German text: hanzi_timing_2001 (which could be ascii'd or not depending on the force ascii setting, so é -> e, ö -> o).
I should note that I have no experience with producing primarily Russian, Japanese, … LaTeX documents, but I'm sure that there are more detailed requirements for working biblatex files among users of these languages. I think the trick is to use the language fields to decide what BBT should do. These aren't often used in regular Zotero, but are relevant in Juris-M. Guessing the script based on Unicode range is another way, but is likely to lead to trouble, since script is not enough to determine language.
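The force-ASCII folding mentioned under option B (é -> e, ö -> o) can be approximated with Unicode decomposition. This is a sketch under the assumption that dropping combining marks is enough; it is not how BBT folds keys, and `fold_to_ascii` is a name made up for the example.

```python
import unicodedata


def fold_to_ascii(text):
    """Decompose accented characters (NFKD), then drop everything that has
    no ASCII representation, including the separated combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(fold_to_ascii("é"))         # e
print(fold_to_ascii("ö"))         # o
print(repr(fold_to_ascii("名字")))  # ''
```

Characters without a decomposition into ASCII, such as hanzi, simply vanish. That is why a romanised alternate has to be selected before folding if the key is to remain descriptive.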