retorquere / zotero-better-bibtex

Make Zotero effective for us LaTeX holdouts
https://retorque.re/zotero-better-bibtex/
MIT License

Juris-M missing multi-lingual fields #482

Closed: duncdrum closed this issue 8 years ago

duncdrum commented 8 years ago

OS X 10.11.4, FF 45.0.2, Juris-M 4.0.29.8m67, BBT 1.6.48

The following item in Juris-M (for Firefox): [screenshot: 2016-04-21 15 45 37]

is exported as

@online{leishuku,
  title = {中國類書庫},
  url = {http://server.wenzibase.com},
  shorttitle = {leishuku},
  timestamp = {2016-04-21T13:06:09Z},
  langid = {pinyin},
  titleaddon = {中國類書庫},
  type = {Full-text databse},
  author = {{愛如生}},
  urldate = {2016-04-20},
  year = {n.d.},
  file = {Google-Ergebnis für http\://crossasia.org/uploads/tx_sbbtyponewsletter/rte/RTEmagicC_erudition.png.png:/Users/HALmob/Library/Application Support/Firefox/Profiles/6wgnt11i.default/zotero/storage/ZF99V7IR/imgres.html:}
}

None of the multi-lingual fields are included. Is this the expected behavior? I use the JM Chicago style (other styles have the same effect). This is the resulting citation:

Ài rú shēng, 愛如生 Erudition. “Zhōngguó lèishū kù”, 中國類書庫 (Database of Chinese Encyclopeadias). Full-text databse. 中國類書庫 Zhōngguó lèishū kù, n.d. http://server.wenzibase.com.

retorquere commented 8 years ago

There is no expected behavior yet, as this release is the first that works at all with Juris-M :)

I have no access to that item, but if you right-click it in Zotero and select "Send Better BibTeX Error", I will get a copy. When you click through that dialog, you'll get an ID; please post it here so I know which report is yours. If you could also attach the resulting BibTeX (or BibLaTeX, please specify which) here, I can get on that.

duncdrum commented 8 years ago

@retorquere Thanks for looking into this. No time like the first time. Error ID: NV6HSWWG

The contents of the leishuku.bib file are in the OP; you can download it from here as well.

Because of UTF-8, I only work with (Better) BibLaTeX (with Biber as the backend); I have no idea about BibTeX.

Some initial impressions of working with Juris-M and BBT:

  1. Autogenerated citation keys are a pain: regardless of whether the ASCII option for BibLaTeX is on or off, the initial keys are always __XXXX, where "XXXX" is the year, and non-Latin characters aren't processed. [screenshot: 2016-04-21 20 34 56]
  2. It would be much better if BBT tried to generate a key from the transliteration or translation information instead.
  3. From what I understand so far, BibLaTeX can use titleaddon and authoraddon to store transliteration/translation info; however, for names this only works for single-author records, which kind of voids the whole thing.
  4. The MWE above is special in that the author name field has both a transliteration AND a translation (it's a company). For names, it is much more common to have only the original and a transliteration. (Titles, on the other hand, often have both.)
  5. These SE threads describe common solutions for multilingual bibliographies in LaTeX, including CJK references.

tl;dr: for BBT to play nice with Juris-M, transliteration and translation information should somehow be present in the exported BibLaTeX files, even if users have to edit the .bib files to change the field names.
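A minimal sketch of why point 1 produces keys like __2000, assuming a Zotero-style author_titleword_year pattern that keeps only ASCII letters and digits. The helper names and the short stop-word list are hypothetical, not BBT's or Zotero's code: with a Chinese author and title, both components come out empty and only the year survives.

```python
import re

# Hypothetical, deliberately short stop-word list for illustration.
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "and"}


def ascii_only(text: str) -> str:
    """Lowercase and drop everything that is not an ASCII letter or digit."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def zotero_like_key(last_name: str, title: str, year: str) -> str:
    """Build an author_titleword_year key under the assumed pattern."""
    author = ascii_only(last_name)
    word = ""
    # Take the first title word that is not a stop word, then filter it.
    for token in title.split():
        if token.lower() not in STOP_WORDS:
            word = ascii_only(token)
            break
    return f"{author}_{word}_{year}"


print(zotero_like_key("楊伯峻", "春秋左傳注", "2000"))
print(zotero_like_key("Smith", "The History of Things", "1999"))
```

Both the author and the title word are empty for the Chinese reference, which is exactly the __XXXX shape described in point 1.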

retorquere commented 8 years ago

Is leishuku.bib what you want it to export, or what it does export? I'm looking for an entry showing how you want it to export.

On the other points:

  1. The "as ASCII" setting only affects the fields, not the citekey. If you select "as ASCII", it will translate Unicode characters to their equivalent LaTeX commands. What you want is "Force citation key to ASCII", which you currently have disabled. That option uses the Zotero transliteration: it usually does OK with Latin-like languages, but I don't think it does so well with Chinese (is that Chinese?). If you know of any projects that transliterate such characters well, I'll be happy to look into them.
  2. BBT does do that by default for most patterns, although there are some patterns that yield untransliterated keys, which is when the "Force" option is useful.
  3. Given that, what would you suggest?
  4. I'm fine with that, as long as I know how you want it.
  5. You wildly overestimate how well I understand BibLaTeX :grin: I don't know what I should take away from those threads.
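The caveat in point 1 about Chinese can be seen with a deliberately naive transliteration built on Python's standard unicodedata module (an illustration of the general problem, not necessarily what Zotero does): decomposing characters and dropping whatever is still non-ASCII turns pinyin into plain Latin letters, but an ideograph has no decomposition, so nothing of it survives.

```python
import unicodedata


def naive_ascii(text: str) -> str:
    """Decompose accented letters, then drop anything still non-ASCII."""
    # NFKD splits e.g. "ō" into "o" + a combining macron; the combining
    # mark is non-ASCII and is discarded by encode(..., "ignore").
    # CJK ideographs have no decomposition, so they are discarded whole.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(naive_ascii("Zhōngguó lèishū kù"))
print(repr(naive_ascii("中國類書庫")))
```

A usable transliteration for Chinese needs character-to-pinyin reading data rather than Unicode decomposition.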
retorquere commented 8 years ago

(The reason for my first question above the list is that leishuku.bib doesn't seem to provide a solution for points 3 and 4.)

duncdrum commented 8 years ago

No, leishuku.bib is what is currently exported, and yes, it's a Chinese reference. Maybe we should open a new issue for the citation keys, since that is a separate thing. After having another look at the BibLaTeX docs, there are three options to work around the fact that Juris-M is capable of things that BibLaTeX just isn't. I'll use the examples from the StackExchange threads.

Option 1:

author = {Li, 李无未, Wuwei} % note the very counterintuitive order and use of commas

+

\newbibmacro*{name:cjk}[3]{%
  \usebibmacro{name:delim}{#2#3#1}%
  \usebibmacro{name:hook}{#2#3#1}%
  \mkbibnamelast{#1}%
  \ifblank{#2}{}{\bibnamedelimd\mkbibnamefirst{#2}}%
  \ifblank{#3}{}{\bibnamedelimd\mkbibnameaffix{#3}}}

Option 2:

Author = {{Li Wuwei}}, % author as institution, not individual; this info would be pulled from the Juris-M transliteration field
nameaddon = {李无未}, % this would be the original author in Juris-M

Option 3 (BibLaTeXML):

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="biblatexml.rng"
            type="application/xml"
            schematypens="http://relaxng.org/ns/structure/1.0"?>
<bltx:entries xmlns:bltx="http://biblatex-biber.sourceforge.net/biblatexml">
  <bltx:entry id="key1" entrytype="book">
    <bltx:names type="author" morenames="1" useprefix="true">
      <bltx:name>
        <bltx:namepart xml:lang="zh" type="family">李</bltx:namepart>
        <bltx:namepart xml:lang="zh" type="given">无未</bltx:namepart>
        <bltx:namepart xml:lang="zh-alalc97" type="family">Lǐ</bltx:namepart>
        <bltx:namepart xml:lang="zh-alalc97" type="given">Wúwèi</bltx:namepart>
      </bltx:name>
    </bltx:names>
  </bltx:entry>
</bltx:entries>

I also found this version, so I'll need to do some testing:

<?xml version="1.0" encoding="UTF-8"?>
<bib:entries xmlns:bib="http://biblatex-biber.sourceforge.net/biblatexml">
  <bib:entry id="key1" entrytype="collection">
    <bib:editor>
      <bib:person gender="sm">李无未</bib:person>
    </bib:editor>
    <bib:editor mode="romanised">
      <bib:person>
        <bib:first>
          <bib:namepart initial="Ww"> Wúwèi </bib:namepart>
        </bib:first>
        <bib:last>Lǐ</bib:last>
      </bib:person>
    </bib:editor>
  </bib:entry>
</bib:entries>

I'm quite swamped at the moment, but I'll try to upload working .bib examples for BibLaTeX and BibLaTeXML. I don't think monkey-patching is the way to go here. Let me know what you think.
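The first BibLaTeXML form above could be generated with Python's standard xml.etree.ElementTree. This is a sketch under assumptions: it follows the names/name/namepart nesting the example appears to intend, the `author_entry` helper and its tuple format are hypothetical, and given the lack of BibLaTeXML documentation the layout itself may not be what Biber expects.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the example above; everything else is a sketch.
BLTX = "http://biblatex-biber.sourceforge.net/biblatexml"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # written as xml:lang
ET.register_namespace("bltx", BLTX)


def q(tag: str) -> str:
    """Qualify a local tag name with the bltx namespace."""
    return f"{{{BLTX}}}{tag}"


def author_entry(key: str, entrytype: str, variants):
    """Build <bltx:entries> holding one entry with one multi-script author.

    variants is a list of (lang, family, given) tuples, one per script or
    romanization, e.g. ("zh", "李", "无未").
    """
    entries = ET.Element(q("entries"))
    entry = ET.SubElement(entries, q("entry"), id=key, entrytype=entrytype)
    names = ET.SubElement(entry, q("names"), type="author")
    name = ET.SubElement(names, q("name"))
    for lang, family, given in variants:
        for part_type, value in (("family", family), ("given", given)):
            part = ET.SubElement(name, q("namepart"), {XML_LANG: lang, "type": part_type})
            part.text = value
    return entries


root = author_entry("key1", "book", [("zh", "李", "无未"), ("zh-alalc97", "Lǐ", "Wúwèi")])
print(ET.tostring(root, encoding="unicode"))
```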

retorquere commented 8 years ago

Option 2 should be easy. Don't really fancy option 1. Option 3 is obviously desirable, but it will be more work (and the next few weeks I'll be swamped), and for the life of me I can't find documentation on what biblatexml is supposed to look like.
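Option 2 is simple enough to sketch. The helper below is hypothetical (not BBT code) and assumes the Juris-M original and transliteration are already available as plain strings: the transliteration becomes a double-braced, institution-style author, and the original script goes into nameaddon. With no transliteration it falls back to the single braced author that the current export produces.

```python
def option2_fields(original: str, transliteration: str = "") -> str:
    """Render BibLaTeX author/nameaddon lines for one author (sketch)."""
    if not transliteration:
        # No variant: behave like the current export, e.g. author = {{愛如生}}.
        return f"  author = {{{{{original}}}}},"
    # Double braces stop BibLaTeX from splitting the name into family/given.
    return "\n".join([
        f"  author = {{{{{transliteration}}}}},",
        f"  nameaddon = {{{original}}},",
    ])


print(option2_fields("李无未", "Li Wuwei"))
print(option2_fields("愛如生"))
```

Because nameaddon is a single entry-level field, this only pairs up cleanly for single-author entries, which is the limitation raised in point 3 earlier.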

duncdrum commented 8 years ago

Yes, I contacted the BibLaTeX and Biber devs and am waiting to hear back. It seems BibLaTeXML is in pre-documentation beta. There also seems to be a multi-script branch of BibLaTeX that is already public, but I can't find documentation for it either. I'll just export and process some items and see what the logs have to say.

retorquere commented 8 years ago

Holy ... BibLaTeXML is a schizo format. Part XML, part LaTeX. Why didn't they just settle on CSL-JSON?!

duncdrum commented 8 years ago

mark: leishuku:1

OK, after a bunch of testing: BibLaTeX just can't do it. There is a discussion thread for BibLaTeX here, and one from July 2015 on the Zotero forums.

The only thing that seems both safe to use and doesn't mess things up is titleaddon. Names, publishers, places, etc. are beyond reach. Currently titleaddon uses the same data as title, which doesn't make much sense [see OP]. Instead, BBT could do the following:

  1. If there is only one title: use the current behaviour, but drop the superfluous titleaddon.
  2. If the title has a variant from the same language family, or has only one title variant: put that variant in titleaddon.
  3. If there is more than one variant, or there are variants in other languages: use userz to usera.

So, for the example in the OP (error ID: PMNPC378):

@online{leishuku,
  title = {中國類書庫}, % this is "zh" in jurism
  titleaddon={Zhōngguó lèishū kù}, % this is "zh-alalc97" in jurism
  %userd={Datenbank der chinesischen Encyclopädien}, this would be further variants
  usere={Database of Chinese Encyclopeadias}, % this is "en" title variant in jurism
  url = {http://server.wenzibase.com},
  shorttitle = {leishuku},
  timestamp = {2016-04-21T13:06:09Z},
  langid = {pinyin}, %babel for "zh"
  type = {Full-text databse},
  author = {{愛如生}},
  urldate = {2016-04-20},
  year = {n.d.},
  file = {Google-Ergebnis für http\://crossasia.org/uploads/tx_sbbtyponewsletter/rte/RTEmagicC_erudition.png.png:/Users/HALmob/Library/Application Support/Firefox/Profiles/6wgnt11i.default/zotero/storage/ZF99V7IR/imgres.html:}
}

This would at least save users from manually copying the title information, and it plays nice with current releases of BibLaTeX and Biber.

retorquere commented 8 years ago

Holy poo, that is a royal mess.

The author of Juris-M suggested a while ago that everyone should just give up on Bib(La)TeX and wrap citeproc instead; I'm beginning to believe that's actually the right approach. Fortunately, the aux/bbl/bcf process/format is exceedingly well documented (ahem), so that should happen any day now (right).

duncdrum commented 8 years ago

Yes, and after looking into this again a few years after my last foray into BibTeX, I agree with Frank. What do you make of my suggestion about titles? Any idea why BBT currently repeats the title and puts it into titleaddon on export?

retorquere commented 8 years ago

Oh yeah, I have an idea why: I didn't know what I was doing when I implemented that. The title proposal sounds sensible and fits easily into the current implementation. Certainly a hell of a lot easier than BibLaTeXML.

retorquere commented 8 years ago

So for the comment I've marked leishuku:1:

  1. How would I decide what goes into titleaddon and what goes into usere? The reference doesn't provide a preference.
  2. Why usere instead of usera?
  3. The data for userd isn't in the reference.

duncdrum commented 8 years ago

Based on swarm intelligence, usere seems to be the most common; no clue why. It is the last one of the pack mentioned in the BibLaTeX documentation, but I have no idea if that's part of the reason.

Yes, I just put the German into the commented section for demonstration purposes.

Since neither titleaddon nor user[a-z] has a defined use, I'm trying to be consistent with example cases I found in the wild.

user[a-z] is a last-resort thing. So if there are only two title fields, use title and titleaddon. If there are three or more, use titleaddon for variants in the main title's language (= primary language, langid), based on their ISO language tags (e.g. if the title is "zh", titleaddon is "zh-alalc97"), and user[a-z] for the rest.
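The rule just described can be sketched in Python. It is a hedged illustration, not BBT's code: `allocate_titles` and `primary_subtag` are hypothetical names, "same language" is assumed to mean the same primary subtag of the language tag (so "zh-alalc97" counts as "zh"), and extra variants are assumed to fill user fields downward from usere, following the leishuku:1 example.

```python
def primary_subtag(tag: str) -> str:
    """Primary language subtag of a language tag: 'zh-alalc97' -> 'zh'."""
    return tag.split("-")[0].lower()


def allocate_titles(main_lang: str, main_title: str, variants: list) -> dict:
    """Map a title and its variants onto BibLaTeX fields.

    variants: (lang_tag, text) pairs for every title other than the main one.
    - one title only: just `title` (no duplicated titleaddon)
    - two titles: the single variant goes to `titleaddon`
    - three or more: a variant in the main title's language goes to
      `titleaddon`; the rest go to user fields
    """
    fields = {"title": main_title}
    if not variants:
        return fields
    if len(variants) == 1:
        fields["titleaddon"] = variants[0][1]
        return fields

    rest = list(variants)
    same_lang = [v for v in rest if primary_subtag(v[0]) == primary_subtag(main_lang)]
    if same_lang:
        fields["titleaddon"] = same_lang[0][1]
        rest.remove(same_lang[0])

    # Assumption: fill user fields downward from usere, as in the example
    # (usere = English, userd = further variants). zip() stops after usera,
    # so this sketch silently drops a sixth or later variant.
    for letter, (_lang, text) in zip("edcba", rest):
        fields["user" + letter] = text
    return fields


print(allocate_titles("zh", "中國類書庫", [
    ("zh-alalc97", "Zhōngguó lèishū kù"),
    ("en", "Database of Chinese Encyclopeadias"),
]))
```

For the leishuku titles (zh, zh-alalc97, en) this gives title, titleaddon and usere, the same split as the 2380 export further down; the thesis example with only Chinese and English puts the English title in titleaddon.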

retorquere commented 8 years ago

Can you give https://github.com/retorquere/zotero-better-bibtex/releases/download/builds/zotero-better-bibtex-1.6.50-circle-2369.xpi a try?

duncdrum commented 8 years ago

I did, but with leishuku it gave an error (ID VHCCSTTK). I've tried the reference from the citekey example, and that worked fine for titleaddon:


@collection{__2000-4,
  location = {{北京}},
  edition = {Revised Edition},
  title = {春秋左傳注},
  isbn = {7-101-00262-5},
  volumes = {4},
  timestamp = {2016-05-05T10:12:00Z},
  langid = {pinyin},
  titleaddon = {Chunqiu Zuozhuan Zhu},
  publisher = {{中华书局}},
  editor = {{楊伯峻}},
  date = {2000},
  keywords = {Chun qiu,Confucius,Zuoqiu; Ming,Zuo zhuan,左丘明,左傳,春秋},
  file = {Yang Bojun 楊伯峻 - 1981 - Chunqiu Zuozhuan Zhu 春秋左傳注.pdf:/Users/HALmob/Library/Application Support/Firefox/Profiles/6wgnt11i.default/zotero/storage/JMGHFDD2/Yang Bojun 楊伯峻 - 1981 - Chunqiu Zuozhuan Zhu 春秋左傳注.pdf:application/pdf},
  origdate = {1981}
}

titleaddon also works in this example (JGV925XZ), which has no transcription, just Chinese and English:

@thesis{__2003-5,
  title = {明清時期出版與文化─以「才子佳人」小說為中心},
  url = {http://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dnclcdr&s=id=%22091NCNU0493006%22.&searchmode=basic},
  pagetotal = {214},
  timestamp = {2016-05-06T19:43:11Z},
  titleaddon = {Publishing and Culture in Ming-Qing period : The Scholar-Beauty Novels as an Example},
  institution = {{國立暨南國際大學}},
  type = {Ph.{{D}}. {{Dissertation}}},
  author = {{顏采容}},
  date = {2003},
  file = {Yan Cairong 顏采容 Ming-Qing Publishing Culture 明清時期出版與文化─以「才子佳人」小說為中心 (200X).pdf:/Users/HALmob/Library/Application Support/Firefox/Profiles/6wgnt11i.default/zotero/storage/K9JJ2XHZ/Yan Cairong 顏采容 Ming-Qing Publishing Culture 明清時期出版與文化─以「才子佳人」小說為中心 (200X).pdf:application/pdf}
}

It only seems to struggle with items that have both a transcription and a translation in addition to the main title.

retorquere commented 8 years ago

The reference in VHCCSTTK doesn't have any multi fields as I hadn't merged #483 yet. Could you submit again with https://github.com/retorquere/zotero-better-bibtex/releases/download/builds/zotero-better-bibtex-1.6.50-circle-2373.xpi ?

retorquere commented 8 years ago

1.6.51 is out, and it would have updated over 2373 had you already installed that build, so here is a new version: https://github.com/retorquere/zotero-better-bibtex/releases/download/builds/zotero-better-bibtex-1.6.51-circle-2379.xpi

duncdrum commented 8 years ago

The new 2379 error ID for leishuku is DHXCG57F; it's still an empty export. I also noticed that with this version there are periods (.) in the auto-generated citekey, which were absent before. 38EJAWMJ, on the other hand, works fine:


@thesis{yan_cairong__2003,
  title = {明清時期出版與文化─以「才子佳人」小說為中心},
  url = {http://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dnclcdr&s=id=%22091NCNU0493006%22.&searchmode=basic},
  pagetotal = {214},
  timestamp = {2016-05-06T19:43:11Z},
  titleaddon = {Publishing and Culture in Ming-Qing period : The Scholar-Beauty Novels as an Example},
  institution = {{國立暨南國際大學}},
  type = {Ph.{{D}}. {{Dissertation}}},
  author = {{顏采容}},
  date = {2003},
  file = {Yan Cairong 顏采容 Ming-Qing Publishing Culture 明清時期出版與文化─以「才子佳人」小說為中心 (200X).pdf:/Users/halalpha/Library/Application Support/Firefox/Profiles/sklfgs3h.default/zotero/storage/K9JJ2XHZ/Yan Cairong 顏采容 Ming-Qing Publishing Culture 明清時期出版與文化─以「才子佳人」小說為中心 (200X).pdf:application/pdf}
}
retorquere commented 8 years ago

I don't see a . in the citekey? But yeah, the [zotero] pattern isn't really great. The only reason it's the default is to ease the transition for people coming over from Zotero.

I've found the problem triggered by the leishuku example, try the updated https://github.com/retorquere/zotero-better-bibtex/releases/download/builds/zotero-better-bibtex-1.6.51-circle-2380.xpi

There's a separate problem that makes my tests fail: multi-lingual references cause Juris-M to error out on import, which currently makes testing impossible. I could work around it, but I don't know what the side effects would be, so I've lodged a new issue with Juris-M.

duncdrum commented 8 years ago

Very nice: with 2380 there's no more error, and the output is as expected. Also, the . seems to have been a 2379 artefact, from year = {n.d.}.


@online{leishuku,
  title = {中國類書庫},
  url = {http://server.wenzibase.com},
  shorttitle = {leishuku},
  timestamp = {2016-04-21T13:06:09Z},
  langid = {pinyin},
  titleaddon = {Zhōngguó lèishū kù},
  usere = {Database of Chinese Encyclopeadias},
  type = {Full-text databse},
  author = {{愛如生}},
  urldate = {2016-04-20},
  year = {n.d.},
  file = {Google-Ergebnis für http\://crossasia.org/uploads/tx_sbbtyponewsletter/rte/RTEmagicC_erudition.png.png:/Users/halalpha/Library/Application Support/Firefox/Profiles/sklfgs3h.default/zotero/storage/ZF99V7IR/imgres.html:}
}
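The stray "." traced to year = {n.d.} suggests one defensive measure for key generation. A minimal sketch, assuming a hypothetical `key_year` helper (not how BBT builds keys): only a four-digit year is ever taken from the date field, so a placeholder such as n.d. contributes nothing to the citekey.

```python
import re


def key_year(date_field: str) -> str:
    """Return a four-digit year for a citation key, or '' if there is none."""
    # Placeholders such as "n.d." contain no four-digit run, so no
    # characters from them (including the periods) can reach the key.
    match = re.search(r"\b(\d{4})\b", date_field)
    return match.group(1) if match else ""


for value in ["2016-04-20", "n.d.", "ca. 1981"]:
    print(repr(value), "->", repr(key_year(value)))
```

A fixed fallback such as "nd" could be substituted for the empty string if keys without a year are undesirable.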
retorquere commented 8 years ago

I can't explain right now how 2379 would be different from 2380 when it comes to generating the citekey when the date is n.d., but if you're happy with the results of 2380, that's good enough for me.

I'm waiting for feedback on https://github.com/Juris-M/zotero/issues/20 before I merge this into master, as I want to have tests in place, and I can't until that issue is either fixed in Juris-M, or I get feedback that my proposed workaround is safe to use.

retorquere commented 8 years ago

For confirmation, the latest build passes all tests, including the newly added tests for this issue; the BibLaTeX they export can be found here. If you could confirm that the BibLaTeX looks good, I can merge and release, unless you have more test cases you want me to tackle.

duncdrum commented 8 years ago

I've checked about 10 different items with different types and multi-lingual fields, and all worked well. Citekeys are solid, and titleaddon and usere are probably used more consistently than in BibLaTeX itself. All good from my end; thanks again for the effort.