retorquere / zotero-better-bibtex

Make Zotero effective for us LaTeX holdouts
https://retorque.re/zotero-better-bibtex/
MIT License
5.2k stars, 284 forks

Juris-M Missing Fields #1273

Open yilu1015 opened 5 years ago

yilu1015 commented 5 years ago

With apologies for reviving a similar thread (#482), I would like to ask whether @retorquere can help again with exporting multilingual fields using BBT (v5.1.139) in Juris-M (5.0.71m4).

I have uploaded this entry as debug report MDYA7RAM-euc. Here is an example of the citation in question:

Chen Zhongyuan 陈重远, Liulichang shihua 琉璃厂史话 [History of Liulichang] (Beijing chubanshe, 2015).

The output in my bib file reads:

@book{chen2015e,
  langid = {pinyin},
  title = {琉璃厂史话},
  isbn = {978-7-200-11112-5},
  abstract = {},
  pagetotal = {198},
  titleaddon = {History of Liulichang},
  usere = {Liulichang shihua},
  publisher = {{北京出版社}},
  date = {2015},
  keywords = {},
  author = {陈, 重远}
}

So the transliterations of some fields (author and publisher) are missing, while the transliteration of the title is exported successfully.

Also, I do not know if I should take up this issue with Pandoc, but when converting the citation key using my bib file and the Chicago (full note) CSL, the output became 陈重远, 琉璃厂史话. History of Liulichang (北京出版社, 2015); the transliterations are all gone! Using the same CSL, Juris-M is able to render the correct citation. Is it an issue with my CSL or Pandoc?

Thank you in advance for your help.

duncdrum commented 5 years ago

Hey Yilu, unless there have been some changes to biblatex since the thread you mentioned https://github.com/retorquere/zotero-better-bibtex/issues/482#issuecomment-216853371

Names, Publishers, Places etc. are beyond reach.

yilu1015 commented 5 years ago

Hey Yilu, unless there have been some changes to biblatex since the thread you mentioned #482 (comment)

Names, Publishers, Places etc. are beyond reach.

Thanks for the reply! Is there still any workaround, or will I have to add variants for these fields manually? Since Chinese can be transliterated automatically with certain Python packages, I am thinking about reformatting the footnotes after the Pandoc conversion, but that would not change the BBT output, and it seems like overkill to me.

retorquere commented 5 years ago

For starters: Juris-M can do a lot more things with translated fields because it can access all the configuration around it. BBT translators cannot access these configurations, so not everything that Juris-M can do can be done in BBT, and there's the issue of what makes sense to export to bib(la)tex.

BBT can in principle transliterate (it includes transliteration and an optionally-activated kuromoji), but it's probably not needed, as the objects Juris-M offers to BBT have the alternate fields (which I'm going to guess are the transliterated versions, but I don't read Chinese, see below). But I'm not sure where I should store the transliterated author/publisher names for bib(la)tex or CSL.

AFAIK, CSL does not support multiple language versions. You can ask the pandoc people if they have a special case for things like this (and what they expect the CSL to look like), and I'll be happy to support it. Same goes for bib(la)tex. If you can give me a desired bib(la)tex output for MDYA7RAM-euc, I can see what I can do.

"creators": [
        {
          "firstName": "重远",
          "lastName": "陈",
          "creatorType": "author",
          "multi": {
            "_key": {
              "pny": {
                "firstName": "Zhongyuan",
                "lastName": "Chen"
              }
            },
            "main": "zh"
          }
        }
      ],
...
"publisher": "北京出版社",
"title": "琉璃厂史话",
...
"publisher": {
            "pny": "Beijing chubanshe"
          },
"title": {
            "en": "History of Liulichang",
            "pny": "Liulichang shihua"
          }
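The excerpt shows that Juris-M keeps variants in two slightly different shapes: field variants sit under the item's `multi._keys` (keyed by field, then tag; see the full BetterBibTeX JSON export later in this thread), while name variants sit under each creator's `multi._key` (keyed by tag). A minimal Python sketch of reading both, assuming that layout; the helper names `field_variant` and `creator_variant` are hypothetical, not part of BBT:

```python
def field_variant(item, field, tag):
    """Return the `tag` variant of a top-level field (e.g. title/pny), or None."""
    return item.get("multi", {}).get("_keys", {}).get(field, {}).get(tag)


def creator_variant(creator, tag):
    """Return the `tag` variant of a creator's name parts, or None."""
    return creator.get("multi", {}).get("_key", {}).get(tag)


# Trimmed from the BetterBibTeX JSON export of MDYA7RAM-euc in this thread.
item = {
    "title": "琉璃厂史话",
    "publisher": "北京出版社",
    "multi": {
        "main": {"publisher": "zh", "title": "zh"},
        "_keys": {
            "publisher": {"pny": "Beijing chubanshe"},
            "title": {"en": "History of Liulichang", "pny": "Liulichang shihua"},
        },
    },
    "creators": [
        {
            "firstName": "重远",
            "lastName": "陈",
            "creatorType": "author",
            "multi": {
                "_key": {"pny": {"firstName": "Zhongyuan", "lastName": "Chen"}},
                "main": "zh",
            },
        }
    ],
}

print(field_variant(item, "publisher", "pny"))      # Beijing chubanshe
print(creator_variant(item["creators"][0], "pny"))  # {'firstName': 'Zhongyuan', 'lastName': 'Chen'}
```

The sketch only shows where the values sit; where they should go in bib(la)tex or CSL is the open question of this thread.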
yilu1015 commented 5 years ago

Many thanks for your kind replies. I am not from a tech background and do not have concrete ideas about structuring output, but I get the gist here: biblatex cannot hold all the information from Juris-M. Likewise, there are CSL data fields without equivalents in biblatex. Amusingly, one author even suggested there might be an ideological difference between the two, but as an end user, I only hope for a simple workaround.

I am pleased to know that BBT can participate in transliteration; after all, with the exception of titles, which require manual translation, most of the multilingual fields in Juris-M are simple transliterations and can technically be automated. However, in the case of Chinese (or other languages written without spaces), there is the issue of tokenization: some pre-processing is necessary to split words. In my example, the transliterated title should be rendered as "liulichang shihua", rather than the unsegmented "liulichangshihua".
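The tokenization point can be illustrated with a toy Python sketch. The per-character readings are hardcoded for this one title only, and the word split 琉璃厂 / 史话 is supplied by hand; in practice a segmenter such as HanLP would provide both. `transliterate` is a hypothetical helper, not any package's API:

```python
# Toy readings for this one title only; a real tool would supply these.
READINGS = {"琉": "liu", "璃": "li", "厂": "chang", "史": "shi", "话": "hua"}


def transliterate(words):
    """Join syllables within a word and put a space between words."""
    return " ".join("".join(READINGS[ch] for ch in word) for word in words)


print(transliterate(["琉璃厂史话"]))      # liulichangshihua  (no segmentation)
print(transliterate(["琉璃厂", "史话"]))  # liulichang shihua (segmented first)
```

Capitalizing the first word, as in "Liulichang shihua", would be a separate step.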

There are Python/JavaScript packages (such as HanLP) that will tokenize Chinese and output transliterations; this is probably the best solution I have. BBT is probably not the best tool to handle this. Nevertheless, I wonder if there is something that could be done with the existing data. For example, I see the transliteration of book title is stored in usere, while the English translation takes up titleaddon. As long as they are captured in the bib file, I presume there's a way to instruct CSL to render them in accordance with citation styles as usere title [titleaddon]. The correct output, as done by Juris-M, is Chen Zhongyuan 陈重远, Liulichang shihua 琉璃厂史话 [History of Liulichang] (Beijing chubanshe, 2015); in other words: pny-lastName pny-firstName author, usere title [titleaddon], publisher, date.

Although I am not familiar with biblatex and CSL, I did find this tool: biblatex-csl-converter. Would it be of any help to you? If @duncdrum has alternative solutions, I would be grateful to hear them.

As for CSL, I think the major issue is that the bib file did not capture all the information; after all, Juris-M's translator, even using the default CSL, seems to be able to render the correct citations. But there's much room for improvement: currently, I can only edit individual bibliographic entries manually, rather than applying the same variant (say, an author's name) across the entire database. Ideally, I hope Juris-M will support bulk changes to transliterations, but I know this is not the right place to ask, and that I'm digressing.

With apologies for this long-winded reply, I thank you for your help.

retorquere commented 5 years ago

I am not from a tech background and do not have concrete ideas about structuring output,

But if you don't have concrete ideas about the output, who does? Without concrete ideas on what you want the output to look like, I'm not sure what you want done.

but I get the gist here: biblatex cannot hold all the information from Juris-M.

I mean technically, biblatex can hold anything you want. Bib files are pretty flexible that way. Whether it will do something useful depends on the style in use. But if you're going outside the documented biblatex fields, you need to have concrete ideas on what you want to go where.

Nevertheless, I wonder if there is something that could be done with the existing data. For example, I see the transliteration of book title is stored in usere, while the English translation takes up titleaddon. As long as they are captured in the bib file, I presume there's a way to instruct CSL to render them in accordance with citation styles as usere title [titleaddon].

There are two concepts running together here. Anything which uses a bib file doesn't use CSL, and vice versa. I suppose one could say that pandoc does, but that's not actually accurate - pandoc internally translates bibtex to csl, and then uses csl styles for rendering. You're better off just having BBT export CSL. I don't know exactly how pandoc translates bibtex to CSL, but any layer of translation incurs loss, so instead of zotero-bbt-pandoc-csl you're much better off with zotero-bbt-csl.

The correct output, as done by Juris-M, is Chen Zhongyuan 陈重远, Liulichang shihua 琉璃厂史话 [History of Liulichang] (Beijing chubanshe, 2015); in other words: pny-lastName pny-firstName author, usere title [titleaddon], publisher, date.

But the thing is, BBT is not in the business of generating output like this. BBT generates half-products (biblatex and csl-json files) which are consumed by citation processors which generate the actual output. So in order to get a bibliography as you want it, you'll need to be able to tell what you want to feed into those citation processors to make that happen.

Although I am not familiar with biblatex and CSL, I did find this tool: biblatex-csl-converter. Would it be of any help to you?

I'm co-author on that package, but no, it wouldn't help. That package consumes bibtex and converts it (as far as is possible) to CSL-JSON, where the question at hand is how to convert zotero items into either biblatex or csl-json.

As for CSL, I think the major issue is that the bib file did not capture all the information

Again, two different things. Bib files and csl have no touching point other than that the formats can be converted between each other, but almost always with loss of information.

after all, Juris-M's translator, even using the default CSL, seems to be able to render the correct citations.

That's the citation processor doing that, not the translator, and Juris-M has an augmented csl processor that knows how to deal with its own multi language items. If you can find a bibtex processor that can do something similar, and you know what input it expects, I can probably generate that, but I need to know what it expects.

But there's much room for improvement: currently, I can only edit individual bibliographic entries manually, rather than applying the same variant (say, an author's name) across the entire database.

But edit how? If you can show me what you expect the bibtex to look like, I can have a look, but it's too vague for me what you want right now.

retorquere commented 5 years ago

I have to stress here that I know very little about csl or bib(la)tex, and even less about tools that process them. I work mostly from samples, and then consult experts who do know these things whether I'm interpreting them right. But without a sample of what you want to see exported (either as csl or bib(la)tex) I have no purchase on the problem and there's nothing I can do.

duncdrum commented 5 years ago

The reason BBT outputs English as titleaddon and pinyin as usere in your example is the way you use language tags in Juris-M. Titleaddon will take the transliteration if one of the multilingual fields is a script variant of the main language field. pny is not a language tag but a script tag; the correct tag would be zh-pny or zh-Latn-pny. Since pny is not a language tag (pinyin is not a language), BBL has to guess what to do with it, and since e comes before p, my guess is that if all else fails it probably sorts alphabetically.

In my experience this works best with citation styles, since titleaddon by default appears before user[a-z], but nothing defined in biblatex says anything about the difference between transliteration and translation. By default, -addon fields appear right after their parent field.
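The tag rule described above can be sketched in Python. This is a hypothetical illustration, not BBT's actual code: it treats a tag as a script variant of the main language when it starts with the main language subtag plus a hyphen (so zh-pny or zh-Latn-pny for a zh item). A bare pny then lands in the "other" bucket, where alphabetical order puts en first:

```python
def is_script_variant(tag, main):
    """True for tags such as 'zh-pny' or 'zh-Latn-pny' when main is 'zh'."""
    return tag.lower().startswith(main.lower() + "-")


def split_variants(variants, main):
    """Split {tag: value} into (script variants, other variants), each sorted by tag."""
    script, other = [], []
    for tag in sorted(variants):
        bucket = script if is_script_variant(tag, main) else other
        bucket.append((tag, variants[tag]))
    return script, other


print(split_variants({"en": "History of Liulichang", "pny": "Liulichang shihua"}, "zh"))
# ([], [('en', 'History of Liulichang'), ('pny', 'Liulichang shihua')])
print(split_variants({"en": "History of Liulichang", "zh-Latn-pny": "Liulichang shihua"}, "zh"))
# ([('zh-Latn-pny', 'Liulichang shihua')], [('en', 'History of Liulichang')])
```

Under this assumed rule, tagging the pinyin as zh-Latn-pny in Juris-M would make it the titleaddon candidate.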

I'm not sure I understand your comment about the missing whitespace. I would always fill in the transliteration info in Juris-M by hand. The fact that BBL produces the right kind of data here is pretty amazing, since it has to work with more languages than Chinese. Back when we were looking for solutions, a universal transliteration library with high accuracy and sufficient speed wasn't around; if there is a better one now, please let us know. If you have whitespace in Juris-M and BBL strips it, that would be a bug.

Looking through the latest official docs for biblatex, I see some movement. @retorquere, I wonder if we could give nameaddon another try. I don't recall how it used to generate broken bib files, but if that bug is gone, then doing for all the -addon fields what BBL already does for titleaddon should be straightforward.

@yilu1015 the examples for how to get biber to format the right kind of citations here are pretty much still on point. I keep using them.

retorquere commented 5 years ago

I'm perfectly fine with trying -- I really need 3 pieces of information to make any headway:

  1. A sample item -- MDYA7RAM-euc suffices for me
  2. A sample of what is currently exported from that sample
  3. A sample of what should be exported from that sample

for 2. and 3. I need actual, full bib(la)tex text, not descriptions of what they should look like.

The fact that BBL produces the right kind of data here is pretty amazing, since it has to work with more languages than Chinese.

BTW I don't actually transliterate fields. The transliterated parts are just handed to me by Juris-M; I don't know whether those are user-entered or whether Juris-M does anything smart. The only place I currently actively transliterate is in key generation.

duncdrum commented 5 years ago

So before I embark on the biannual round of the multilingual biblatex bonanza, I want to read up on and experiment with the recent changes to biblatex and biber. If it looks like something doable from BBL, I'll start drafting some samples. There are three main areas of concern:

Of course none of this will go anywhere, if the bug where valid bibfiles cannot be parsed by biber is still around.

retorquere commented 5 years ago

I'm all for this. BBT generates output for actual use, not as an archival format, so whatever I produce should in some way be usable by a bibliography tool chain.

retorquere commented 5 years ago

btw if you want to know what BBT gets, you can export in the BetterBibTeX JSON format.

retorquere commented 5 years ago

Also, I do not know if I should take up this issue with Pandoc, but when converting the citation key using my bib file and the Chicago (full note) CSL, the output became 陈重远, 琉璃厂史话. History of Liulichang (北京出版社, 2015); the transliterations are all gone! Using the same CSL, Juris-M is able to render the correct citation. Is it an issue with my CSL or Pandoc?

If the transliterations are in the bib file, then yes, you need to take it up with pandoc. If they are not, you'll need to ask pandoc where it wants them.

So you'll need to take it up with pandoc in any case.

Also, if you're doing anything with pandoc other than tex -> pdf, you want to feed pandoc csl, not bib, as mentioned above. Since you mention using a CSL style, you certainly do not want to be using a bib file.

yilu1015 commented 5 years ago

Thanks for your suggestions. I will defer to @duncdrum for more sophisticated testing, but having looked at the output of BetterBibTeX JSON, I hope there will be a way to convert transliterations of names and publishers to add-on fields, such as from

"creators": [
        {
          "firstName": "重远",
          "lastName": "陈",
          "creatorType": "author",
          "multi": {
            "_key": {
              "pny": {
                "firstName": "Zhongyuan",
                "lastName": "Chen"

to something like this:

"creators": [
        {
          "firstName": "重远",
          "firstName-addon": "Zhongyuan",
          "lastName": "陈",
          "lastName-addon": "Chen",
          "creatorType": "author"
        }
      ]
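The proposed transformation can be sketched in Python. `flatten_creator` and the `-addon` key names are hypothetical (they follow the example above, not any BBT, Zotero or biblatex format), and, as discussed elsewhere in the thread, no citation processor is known to read such keys:

```python
def flatten_creator(creator, tag="pny"):
    """Copy a creator, drop 'multi', and add '<part>-addon' keys for the tag variant."""
    flat = {key: value for key, value in creator.items() if key != "multi"}
    variant = creator.get("multi", {}).get("_key", {}).get(tag, {})
    for part, value in variant.items():
        flat[part + "-addon"] = value
    return flat


creator = {
    "firstName": "重远",
    "lastName": "陈",
    "creatorType": "author",
    "multi": {"_key": {"pny": {"firstName": "Zhongyuan", "lastName": "Chen"}}, "main": "zh"},
}
print(flatten_creator(creator))
```

A creator without a `multi` entry passes through unchanged.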

In the meantime, I have explored the option of feeding Pandoc CSL, but the output of Better CSL JSON/YAML is even more limited: no transliteration or translation is preserved.

{"id":"chen2015e","type":"book","title":"琉璃厂史话","publisher":"北京出版社","number-of-pages":"198","source":"Google Books","abstract":"","ISBN":"978-7-200-11112-5","language":"zh","author":[{"family":"陈","given":"重远"}],"issued":{"date-parts":[[2015]]}}

When I export the file as CSL JSON, all variant fields are preserved.

[{
  "id": "chen2015e",
  "type": "book",
  "multi": {
    "main": {
      "title": "zh",
      "publisher": "zh"
    },
    "_keys": {
      "title": {
        "en": "History of Liulichang",
        "pny": "Liulichang shihua"
      },
      "publisher": {
        "pny": "Beijing chubanshe"
      }
    }
  },
  "title": "琉璃厂史话",
  "publisher": "北京出版社",
  "number-of-pages": "198",
  "source": "Google Books",
  "abstract": "",
  "ISBN": "978-7-200-11112-5",
  "language": "zh",
  "author": [
    {
      "family": "陈",
      "given": "重远",
      "multi": {
        "_key": {
          "pny": {
            "family": "Chen",
            "given": "Zhongyuan"
          }
        },
        "main": "zh"
      }
    }
  ],
  "issued": {
    "date-parts": [
      [
        "2015"
      ]
    ]
  }
}]

However, the resulting citation omits translations and transliterations. I will take up the issue with pandoc-citeproc.

Note: Using CSL JSON seems to be less error-tolerant? When I ran the command line for the first time, I received an error. I guess I must've mis-entered a date somewhere. In the end, I tested with only one entry -- "Liulichang shihua", our current test case -- but unfortunately did not obtain the correct citation.

Error reading bibliography /Users/.../My Library.json Error in $[84].issued: Could not parse RefDate
Error running filter pandoc-citeproc:
Filter returned error status 1
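When pandoc-citeproc reports `Error in $[84].issued`, the `$[84]` is most likely a zero-based index into the top-level JSON array, i.e. the 85th entry. A minimal Python sketch for finding entries whose `date-parts` contain something other than whole numbers; numeric strings such as "2015" are accepted, since that form is judged valid CSL later in the thread. `bad_dates` and `report` are hypothetical helpers:

```python
import json


def bad_dates(items):
    """Yield (index, issued) for entries whose date-parts hold non-integer values."""
    for index, item in enumerate(items):
        issued = item.get("issued") or {}
        for part in issued.get("date-parts", []):
            if not all(str(value).lstrip("-").isdigit() for value in part):
                yield index, issued
                break


def report(path):
    """Print the index, id and issued value of every suspicious entry in a CSL-JSON file."""
    with open(path, encoding="utf-8") as handle:
        items = json.load(handle)
    for index, issued in bad_dates(items):
        print(index, items[index].get("id"), issued)
```

For example, `report("My Library.json")` run from the folder holding the export (the directory in the error message is elided, so the relative path is an assumption).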

Thanks to both of you for your generous help.

retorquere commented 5 years ago

CSL is indeed more restrictive; the CSL spec doesn't support language variants, so no processor will support them, whether the variants are present or not, with the exception of Juris-M itself. And unless you're using pandoc to do tex -> pdf, it is translating bib to csl internally before doing anything else, so if there are language variants in the bib file, they won't show up in the bibliography produced by pandoc.

I'm all for improving the BBT output, but doing so should serve a purpose. Creating multi-language csl or biblatex which can be used by no existing tool wouldn't serve a purpose.

If pandoc chokes on the CSL date you posted above, that's an error in pandoc, afaict that date is valid csl.

retorquere commented 5 years ago

BTW the same goes for CSL as for bib: to make headway I'd need

  1. A sample item -- MDYA7RAM-euc suffices for me
  2. A sample of what is currently exported from that sample
  3. A sample of what should be exported from that sample

but also, for both cases, some kind of supporting information that shows that the new information I put in the CSL/bib file is put to use -- whether by pandoc or anything else.

But I want to stress (and I'll try to stop after this) that there's only one case where the best output from pandoc comes from using bib files, and that's when it's driving LaTeX to produce the final output. In all other cases, using pandoc is guaranteed to get no better results, and usually worse, than using CSL directly. Pandoc will strip any extra information you have in the bib file that doesn't fit into CSL, as it must, because CSL doesn't support anything else.

yilu1015 commented 5 years ago

Thank you for your kind explanation. Like the user in the other post, I adopted my workflow after reading this tutorial about using pandoc to convert bib into citations, without realizing that pandoc-citeproc can handle csl json directly with less information loss. However, as I replied yesterday, I found that Better CSL JSON outputs even less information than Better Biblatex. For MDYA7RAM-euc it has only yielded:

{"id":"chen2015e","type":"book","title":"琉璃厂史话","publisher":"北京出版社","number-of-pages":"198","source":"Google Books","abstract":"","ISBN":"978-7-200-11112-5","language":"zh","author":[{"family":"陈","given":"重远"}],"issued":{"date-parts":[[2015]]}}

CSL JSON outputs the most complete information; both translations and transliterations are stored. However, parsing it through pandoc-citeproc still does not produce the correct result, likely because of how pandoc-citeproc recognizes certain fields such as

"multi": {
        "_key": {
          "pny": {
            "family": "Chen",
            "given": "Zhongyuan"
          }

And this, I presume, will have to be addressed by Pandoc? Thanks again for your helpful reminder.

retorquere commented 5 years ago

However, as I replied yesterday, I found that Better CSL JSON outputs even less information than Better Biblatex.

And as I have been trying to explain since, this is a distinction without a difference. Pandoc will not use the information that is in the biblatex which is not in the CSL. There's no point in outputting information that's not going to be used. The list of fields in the CSL spec is here -- if a field is not on that list, pandoc is pretty much guaranteed to not even know it's present at all. If there is something I output in the biblatex format that doesn't show up in the CSL but has an appropriate field in that list, I'd be happy to add it. But this would really surprise me TBH. I don't actually do much to generate the CSL-JSON -- Zotero generates it for me, and I add a few things here and there. If there was information in Zotero items that a CSL processor could use, Zotero would put it there.

Juris-M is, as mentioned, a special case, as it has a modified CSL processor that supports multi-lang data, but it is, as far as I know, the only existing CSL processor that supports this, so the extra information just won't be used anywhere if it were to be exported.

{"id":"chen2015e","type":"book","title":"琉璃厂史话","publisher":"北京出版社","number-of-pages":"198","source":"Google Books","abstract":"","ISBN":"978-7-200-11112-5","language":"zh","author":[{"family":"陈","given":"重远"}],"issued":{"date-parts":[[2015]]}}

CSL JSON outputs the most complete information; both translations and transliterations are stored.

This is simply not true. I think you're thinking of the BetterBibTeX JSON format, which is not CSL JSON but just a sanitized dump of the internal representation of the Juris-M item format -- of course that has more info, but there's no program in existence that can read it except my test suite.

However, parsing it through pandoc-citeproc still does not produce the correct result -- likely an issue with how pandoc-citeproc recognizes certain fields such as

"multi": {
        "_key": {
          "pny": {
            "family": "Chen",
            "given": "Zhongyuan"
          }

And this, I presume, will have to be addressed by Pandoc? Thanks again for your helpful reminder.

This is because it's not CSL-JSON. It's my own debug format, which is a sanitized version of the internal Juris-M item format. Pandoc won't understand it, nor will any other citation processor. It is not even close to valid CSL-JSON.

retorquere commented 5 years ago

Like the user in the other post, I adopted my workflow after reading this tutorial about using pandoc to convert bib into citations, without realizing that pandoc-citeproc can handle csl json directly with less information loss.

That tutorial has since been superseded and recommends CSL-JSON now.

yilu1015 commented 5 years ago

Thank you for your patient explanation. Again, I apologize if I come across as uninformed; I am not too tech savvy, and adopted this workflow with the sole purpose of saving time and energy with formatting citations. What a rabbit hole I got myself into!

Regarding the export output, let me simply submit the following for the record, using my Juris-M. Based on your feedback, I will explore ways of using pandoc to correctly filter CSL JSON. In the meantime, I thank you and @duncdrum for updating fields in BBL and CSL and will defer to your judgments.

Better CSL JSON {"id":"chen2015e","type":"book","title":"琉璃厂史话","publisher":"北京出版社","number-of-pages":"198","ISBN":"978-7-200-11112-5","language":"zh","author":[{"family":"陈","given":"重远"}],"issued":{"date-parts":[[2015]]}}

CSL JSON

[{
        "id": "chen2015e",
        "type": "book",
        "multi": {
            "main": {
                "title": "zh",
                "publisher": "zh"
            },
            "_keys": {
                "title": {
                    "en": "History of Liulichang",
                    "pny": "Liulichang shihua"
                },
                "publisher": {
                    "pny": "Beijing chubanshe"
                }
            }
        },
        "title": "琉璃厂史话",
        "publisher": "北京出版社",
        "number-of-pages": "198",
        "ISBN": "978-7-200-11112-5",
        "language": "zh",
        "author": [
            {
                "family": "陈",
                "given": "重远",
                "multi": {
                    "_key": {
                        "pny": {
                            "family": "Chen",
                            "given": "Zhongyuan"
                        }
                    },
                    "main": "zh"
                }
            }
        ],
        "issued": {
            "date-parts": [
                [
                    "2015"
                ]
            ]
        }
    }]

BetterBibTex JSON

  [{
      "version": 23785,
      "itemType": "book",
      "multi": {
        "main": {
          "publisher": "zh",
          "title": "zh"
        },
        "_keys": {
          "publisher": {
            "pny": "Beijing chubanshe"
          },
          "title": {
            "en": "History of Liulichang",
            "pny": "Liulichang shihua"
          }
        }
      },
      "publisher": "北京出版社",
      "ISBN": "978-7-200-11112-5",
      "date": "2015",
      "language": "zh",
      "title": "琉璃厂史话",
      "numPages": "198",
      "creators": [
        {
          "firstName": "重远",
          "lastName": "陈",
          "creatorType": "author",
          "multi": {
            "_key": {
              "pny": {
                "firstName": "Zhongyuan",
                "lastName": "Chen"
              }
            },
            "main": "zh"
          }
        }
      ],
      "tags": [
        {
          "tag": "@Process"
        },
        {
          "tag": "C1"
        }
      ],
      "collections": [
        "NMNLLP43",
        "FG6NLL3K"
      ],
      "relations": [],
      "dateAdded": "2019-07-29T00:47:51Z",
      "dateModified": "2019-09-04T13:22:06Z",
      "uri": "http://zotero.org/users/3637397/items/S6GZM5CD",
      "attachments": [],
      "notes": [],
      "seeAlso": [],
      "itemID": 5453,
      "key": "S6GZM5CD",
      "citekey": "chen2015e",
      "citationKey": "chen2015e",
      "libraryID": 1
    }]
retorquere commented 5 years ago

Thank you for your patient explanation. Again, I apologize if I come across as uninformed; I am not too tech savvy, and adopted this workflow with the sole purpose of saving time and energy with formatting citations. What a rabbit hole I got myself into!

Ayup, citations are a major rabbit hole.

Regarding the export output, let me simply submit the following for the record, using my Juris-M. Based on your feedback, I will explore ways of using pandoc to correctly filter CSL JSON. In the meantime, I thank you and @duncdrum for updating fields in BBL and CSL and will defer to your judgments.

Better CSL JSON {"id":"chen2015e","type":"book","title":"琉璃厂史话","publisher":"北京出版社","number-of-pages":"198","ISBN":"978-7-200-11112-5","language":"zh","author":[{"family":"陈","given":"重远"}],"issued":{"date-parts":[[2015]]}}

This is in-spec CSL-JSON.

CSL JSON

[{
      "id": "chen2015e",
      "type": "book",
      "multi": {
          "main": {
              "title": "zh",
              "publisher": "zh"
          },
          "_keys": {
              "title": {
                  "en": "History of Liulichang",
                  "pny": "Liulichang shihua"
              },

....

Ah, OK, my mistake. Yes, this is sort-of-CSL with transliterated fields. But this is Juris-M-specific CSL-JSON. It is (purposefully) out of spec. No CSL processor except Juris-M will understand it. Outputting this with BBT serves no purpose as Juris-M doesn't need it (it uses it as it generates it itself) and other tools cannot use it. There's literally no point to outputting this and I'm betting that Juris-M outputs it by accident -- it inherits Zotero's code, Zotero doesn't know about these so doesn't clean them out. I could be wrong, but I do not know any tool that could use it. If anyone knows, it would be @fbennett.
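To see how far the Juris-M CSL JSON export is from plain CSL-JSON, a minimal Python sketch can drop every `multi` key recursively. `strip_multi` is a hypothetical helper; it simply discards the variants and does not make any processor render them. For the item above, the result has the same fields as the Better CSL JSON export, except that the year is still the string "2015":

```python
def strip_multi(value):
    """Return a copy of CSL-JSON data with every Juris-M 'multi' key removed."""
    if isinstance(value, dict):
        return {key: strip_multi(inner) for key, inner in value.items() if key != "multi"}
    if isinstance(value, list):
        return [strip_multi(inner) for inner in value]
    return value


# Trimmed from the Juris-M CSL JSON export of chen2015e in this thread.
jurism_item = {
    "id": "chen2015e",
    "type": "book",
    "multi": {"main": {"title": "zh"}, "_keys": {"title": {"pny": "Liulichang shihua"}}},
    "title": "琉璃厂史话",
    "author": [{"family": "陈", "given": "重远",
                "multi": {"_key": {"pny": {"family": "Chen", "given": "Zhongyuan"}}, "main": "zh"}}],
    "issued": {"date-parts": [["2015"]]},
}
print(strip_multi(jurism_item))
```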

BetterBibTex JSON

Doesn't matter. Debug format. Only usable for my tests.

retorquere commented 5 years ago

If I take the biblatex above:

@book{chen2015e,
  langid = {pinyin},
  title = {琉璃厂史话},
  isbn = {978-7-200-11112-5},
  abstract = {},
  pagetotal = {198},
  titleaddon = {History of Liulichang},
  usere = {Liulichang shihua},
  publisher = {{北京出版社}},
  date = {2015},
  keywords = {},
  author = {陈, 重远}
}

and run that through pandoc-citeproc --bib2json, I get

[
  {
    "ISBN": "978-7-200-11112-5",
    "author": [
      {
        "family": "陈",
        "given": "重远"
      }
    ],
    "id": "chen2015e",
    "issued": {
      "date-parts": [
        [
          2015
        ]
      ]
    },
    "language": "pinyin",
    "number-of-pages": "198",
    "publisher": "北京出版社",
    "title": "琉璃厂史话. History of Liulichang",
    "title-short": "琉璃厂史话",
    "type": "book"
  }
]

so pandoc does understand titleaddon of sorts, but just adds it to the title field. I suppose I could do the same, but then I'd definitely want to hear from @fbennett.
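The merge observed above can be written down as a small Python sketch. It reproduces only this one conversion (`title. titleaddon` in `title`, the original title in `title-short`); `merge_titleaddon` is hypothetical and not pandoc-citeproc's code, which may handle punctuation or other fields differently. Note that `usere` does not survive the conversion at all:

```python
def merge_titleaddon(entry):
    """Build CSL title fields the way the conversion above did for this entry."""
    csl = {"title": entry["title"]}
    addon = entry.get("titleaddon")
    if addon:
        csl["title"] = f"{entry['title']}. {addon}"
        csl["title-short"] = entry["title"]
    return csl


print(merge_titleaddon({"title": "琉璃厂史话", "titleaddon": "History of Liulichang",
                        "usere": "Liulichang shihua"}))
```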

fbennett commented 5 years ago

I do not know any tool that could use it. If anyone knows, it would be @fbennett.

As far as I know, Jurism is the only tool that reads the extended metadata. multi keys can appear on names, and at the top level (with slightly different semantics in the two locations).

fbennett commented 5 years ago

so pandoc does understand titleaddon of sorts, but just adds it to the title field. I suppose I could do the same, but then I'd definitely want to hear from @fbennett.

The titleaddon in the bibtex is just a translation of the Chinese title. Not sure if users would find that useful, maybe the OP could say.

retorquere commented 5 years ago

When I see https://tex.stackexchange.com/questions/301566/subtitle-or-titleaddon-which-to-choose#302126, adding it to the title doesn't seem prudent - not everyone may want the translated version just appended. But I don't see anything in the CSL var list which would be a candidate for translated titles or creators. Even having it in the biblatex titleaddon seems a little iffy in retrospect.

duncdrum commented 5 years ago

The reasoning behind the current behavior from #482 is that for all fields with an -addon variant, the output sequence is fixed, e.g.: title titleaddon usera. Depending on the citation style, users have to configure whether they want transliteration and/or translation in the bibliography, and in what order. So the idea was that titleaddon takes the variant of the title according to the language tag, while the user field takes the translation. In the OP this isn't the case, as I already mentioned, because of pny. This consistency, however, is down to BBL; there is nothing in biblatex that defines the preferred contents of titleaddon, or a data model that defines the relation between title and titleaddon.

duncdrum commented 5 years ago

@retorquere, to expand: the fact that titleaddon is not displayed like subtitle is actually crucial for multilingual references using CJK characters, as italic CJK titles are a) a big no-no, and b) hard to get rid of in LaTeX without the use of titleaddon. I wouldn't say it's iffy at all.

From a user perspective, if I know I want my title field to hold all the data, nothing prevents me from putting it all in via vanilla Zotero, e.g.:

title = {琉璃厂史话, Liulichang shihua, History of Liulichang}

Thanks to @fbennett, we have dedicated fields in Juris-M, which are much easier to work with via the LaTeX preamble. So undoing this separation on export seems pretty counterproductive to me.

retorquere commented 5 years ago

Ah I'd missed the italics stuff. Alright then.

To me, all this argues against adding the transliterated version to the title for CSL export. That still leaves the problem that transliterated titles can't be brought to pandoc. For the case under discussion, pandoc's interpretation of titleaddon is not what we're looking for (though I will concede that this means it's usually, not always, better to opt for CSL-JSON when working with pandoc).

retorquere commented 5 years ago

I'm not sure what the current status is. Anyone?

duncdrum commented 5 years ago

New activity in biblatex: https://github.com/plk/biblatex/issues/416#issuecomment-531138722

Will get to this next week

retorquere commented 5 years ago

Wow, those are some major changes. This is not my area of expertise, so I'm not going to meddle, but having each multi-entry variant carry its own key puts me in a bit of a bind: I have no UI concept for letting the user manage this. I also wonder why @BOOK{ms2, was chosen over something like @VARIANT{ms2, (or can a variant be of a different type?) to mark clearly that these are not to be cited directly, or even something like

@BOOK{ms1,
  LANGID = {greek},
  VARIANTTYPE = {original},
  TITLE = {Περίοδοι καὶ μαρτύριον τοῦ ἁγίου Βαρνάβα τοῦ ἀποστόλου},
  VARIANTS = {
    {
      LANGID = {greek},
      VARIANTTYPE = {transcribed},
      TITLE = {Periodoi kai martyrion tou agiou Barnaba tou apostolou},
    },

    { 
      LANGID = {greek},
      VARIANTTYPE = {transliteration},
      TITLE = {Periodoi kai martyrion tou agiou Barnaba tou apostolou},
    },

    {
      LANGID = {french},
      VARIANTTYPE = {translated},
      TITLE = {Voyages et martyres de Saint Barnabé, l'apôtre},
    },

    {
      LANGID = {french},
      VARIANTTYPE = {normalised},
      TITLE = {Actes de Barnabé},
    },

    {
      LANGID = {latin},
      VARIANTTYPE = {normalised},
      TITLE = {Acta Barnabae},
    },
  },
}

but I fear I know far too little about the intricacies of biber/biblatex/multilocale to make a sensible contribution.

duncdrum commented 4 years ago

OK, things are shaping up, but this could take a while, so let's keep this as WIP.

vicleroy commented 2 years ago

Hi! Sorry to revive this issue, but I'm having the same problem as the OP. Reading this thread, I saw that version 4.0 of biblatex might apparently solve the problem of unsupported name/place variants. Do you think this might allow full export of Juris-M multilingual fields to biblatex?

retorquere commented 2 years ago

If someone can tell me what the export is supposed to look like, I can see what I can do.

vicleroy commented 2 years ago

I can't speak for the OP, but with the example they gave, I'd say the idea would be to go from this current output:

@book{chen2015e,
  langid = {pinyin},
  title = {琉璃厂史话},
  titleaddon = {History of Liulichang},
  usere = {Liulichang shihua},
  author = {陈, 重远},
  publisher = {{北京出版社}},
  isbn = {978-7-200-11112-5},
  abstract = {},
  pagetotal = {198},
  date = {2015},
  keywords = {},
}

to that:

@book{chen2015e,
  langid = {pinyin},
  title = {琉璃厂史话},
  titleaddon = {History of Liulichang},
  usere = {Liulichang shihua},
  author = {陈, 重远},
  authoraddon = {Chen, Zhongyuan},
  publisher = {{北京出版社}},
  publisheraddon = {Beijing chubanshe},
  isbn = {978-7-200-11112-5},
  abstract = {},
  pagetotal = {198},
  date = {2015},
  keywords = {},
}

vicleroy commented 2 years ago

I thought it might be best to try to recreate the example output given in the biblatex multiscript support thread. I noticed that in the request that led to this example, translations and transliterations weren't all included and were oddly mixed up. Most publishers and universities would probably ask for a printed reference that includes both, such as:

Kōno Rokurō 河野六郎, Hidemasa Nagata 永田英正, and Hiroyuki Sasahara 笹原宏之 (2001). Kanji 漢字 [Japanese Ideograms]. In: Sekai moji jiten 世界文字辞典 [Encyclopedia of the World’s Scripts]. Ed. by Kōno Rokurō 河野六郎, Chino Eiichi 千野栄一, and Nishida Tatsuo 西田龍雄. Tokyo: Sanseidō 三省堂, pp. 256–281.

To make it more accurate, I created a new entry in Juris-M with all this information, using the original Japanese titles and names as the "main" fields and the transliterations and translations as "variant" fields:

[Screenshot of the Juris-M entry]

Here is how this item appears in the Better BibTeX JSON export that I get:

{
      "attachments": [],
      "citationKey": "kono_2001",
      "citekey": "kono_2001",
      "creators": [
        {
          "creatorType": "author",
          "firstName": "六郎",
          "lastName": "河野",
          "multi": {
            "_key": {
              "ja-alalc97": {
                "firstName": "Rokurō",
                "lastName": "Kōno"
              }
            },
            "main": "ja"
          }
        },
        {
          "creatorType": "author",
          "firstName": "英正",
          "lastName": "永田",
          "multi": {
            "_key": {
              "ja-alalc97": {
                "firstName": "Hidemasa",
                "lastName": "Nagata"
              }
            },
            "main": "ja"
          }
        },
        {
          "creatorType": "author",
          "firstName": "宏之",
          "lastName": "笹原",
          "multi": {
            "_key": {
              "ja-alalc97": {
                "firstName": "Hiroyuki",
                "lastName": "Sasahara"
              }
            },
            "main": "ja"
          }
        },
        {
          "creatorType": "editor",
          "firstName": "六郎",
          "lastName": "河野",
          "multi": {
            "_key": {
              "ja-alalc97": {
                "firstName": "Rokurō",
                "lastName": "Kōno"
              }
            },
            "main": "ja"
          }
        },
        {
          "creatorType": "editor",
          "firstName": "栄一",
          "lastName": "千野",
          "multi": {
            "_key": {
              "ja-alalc97": {
                "firstName": "Eiichi",
                "lastName": "Chino"
              }
            },
            "main": "ja"
          }
        },
        {
          "creatorType": "editor",
          "firstName": "瀧雄",
          "lastName": "西田",
          "multi": {
            "_key": {
              "ja-alalc97": {
                "firstName": "Tatsuo",
                "lastName": "Nishida"
              }
            },
            "main": "ja"
          }
        }
      ],
      "date": "2001",
      "dateAdded": "2021-09-28T06:57:38Z",
      "dateModified": "2021-09-28T07:27:37Z",
      "itemID": 386,
      "itemKey": "9ND2V7L8",
      "itemType": "bookSection",
      "key": "9ND2V7L8",
      "language": "ja",
      "libraryID": 1,
      "multi": {
        "_keys": {
          "bookTitle": {
            "en": "Encyclopedia of the World's Scripts",
            "ja-alalc97": "Sekai moji jiten"
          },
          "publisher": {
            "ja-alalc97": "Sanseidō"
          },
          "title": {
            "en": "Japanese Ideograms",
            "ja-alalc97": "Kanji"
          }
        },
        "main": {
          "bookTitle": "ja",
          "publisher": "ja",
          "title": "ja"
        }
      },
      "notes": [],
      "pages": "256--281",
      "place": "Tokyo",
      "publicationTitle": "世界文字辞典",
      "publisher": "三省堂",
      "relations": [],
      "select": "zotero://select/library/items/9ND2V7L8",
      "tags": [],
      "title": "漢字",
      "uri": "http://zotero.org/users/local/AnJLlW4M/items/9ND2V7L8",
      "version": 0
    }

Now if I export this as Better BibLaTeX, the only output I get is:

@incollection{kono_2001,
  title = {漢字},
  booktitle = {世界文字辞典},
  author = {河野, 六郎 and 永田, 英正 and 笹原, 宏之},
  editor = {河野, 六郎 and 千野, 栄一 and 西田, 瀧雄},
  date = {2001},
  pages = {256--281},
  publisher = {{三省堂}},
  location = {{Tokyo}},
  langid = {japanese},
  titleaddon = {Kanji},
  usere = {Japanese Ideograms},
}

As in the original report in this thread, the transliteration and translation of the title were exported as titleaddon and usere respectively (but NOT those of the booktitle), while the transliterated names for the author, editor and publisher fields were lost.
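The transliterated names are not missing from the Juris-M data itself: the JSON export above still carries them under each creator's multi key. As a rough illustration only (this is not how BBT works internally), a short Python script could recover them from that structure. The function name creator_variants and the "Last, First" output format are assumptions made for this sketch, and the sample data is trimmed from the export above.

```python
import json

# Three creators trimmed from the Juris-M/BBT JSON export in this thread.
# Creator variants sit under multi["_key"] (singular), unlike the
# item-level field variants, which sit under multi["_keys"].
ITEM = json.loads("""
{
  "creators": [
    {"creatorType": "author", "firstName": "六郎", "lastName": "河野",
     "multi": {"_key": {"ja-alalc97": {"firstName": "Rokurō", "lastName": "Kōno"}},
               "main": "ja"}},
    {"creatorType": "author", "firstName": "英正", "lastName": "永田",
     "multi": {"_key": {"ja-alalc97": {"firstName": "Hidemasa", "lastName": "Nagata"}},
               "main": "ja"}},
    {"creatorType": "editor", "firstName": "栄一", "lastName": "千野",
     "multi": {"_key": {"ja-alalc97": {"firstName": "Eiichi", "lastName": "Chino"}},
               "main": "ja"}}
  ]
}
""")


def creator_variants(item, creator_type, lang):
    """Join the `lang` variants of all creators of one type as 'Last, First and ...'.

    Hypothetical helper: a creator with no variant for `lang` falls back
    to its main-script name.
    """
    names = []
    for creator in item["creators"]:
        if creator["creatorType"] != creator_type:
            continue
        variant = creator.get("multi", {}).get("_key", {}).get(lang, creator)
        names.append(f"{variant['lastName']}, {variant['firstName']}")
    return " and ".join(names)


print(creator_variants(ITEM, "author", "ja-alalc97"))  # Kōno, Rokurō and Nagata, Hidemasa
print(creator_variants(ITEM, "editor", "ja-alalc97"))  # Chino, Eiichi
```

The strings come out in the same "Last, First and ..." shape as the transliterated AUTHOR and EDITOR values in the target output requested here, limited to the creators included in the sample.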

What we'd need is an output that looks like this:

@incollection{kono_2001,
  LANGID                            = {japanese},
  AUTHOR                            = {河野六郎 and 永田英正 and 笹原宏之},
  AUTHOR_transliteration_ja-latn    = {Kōno, Rokurō and Nagata, Hidemasa and Sasahara, Hiroyuki},
  EDITOR                            = {河野六郎 and 千野栄一 and 西田龍雄},
  EDITOR_transliteration_ja-latn    = {Kōno, Rokurō and Chino, Eiichi and Nishida, Tatsuo},
  TITLE                             = {漢字},
  TITLE_transliteration_ja-latn     = {Kanji},
  TITLE_translation_en-us           = {Japanese Ideograms},
  BOOKTITLE                         = {世界文字辞典},
  BOOKTITLE_transliteration_ja-latn = {Sekai moji jiten},
  BOOKTITLE_translation_en-us       = {Encyclopedia of the World's Scripts},
  PUBLISHER                         = {三省堂},
  PUBLISHER_transliteration_ja-latn = {Sanseidō},
  ADDRESS                           = {Tokyo},
  PAGES                             = {256--281},
  DATE                              = {2001}
}
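For the literal fields, this naming scheme looks mechanical enough to derive from the multi["_keys"] block of the JSON export. The sketch below is hypothetical and not part of BBT: it assumes a fixed mapping from Juris-M language tags to a variant type and a biblatex language suffix (ja-alalc97 → transliteration/ja-latn, en → translation/en-us), inferred only from this one example, and covers only title, bookTitle and publisher.

```python
# ASSUMPTION: tag-to-suffix pairs inferred from this single example;
# neither BBT nor biblatex defines this mapping.
VARIANT_TAGS = {
    "ja-alalc97": ("transliteration", "ja-latn"),
    "en": ("translation", "en-us"),
}

# ASSUMPTION: a small subset of Juris-M field names mapped to biblatex names.
FIELD_NAMES = {"title": "TITLE", "bookTitle": "BOOKTITLE", "publisher": "PUBLISHER"}

# The item-level "multi" block copied from the JSON export above.
MULTI = {
    "_keys": {
        "bookTitle": {"en": "Encyclopedia of the World's Scripts",
                      "ja-alalc97": "Sekai moji jiten"},
        "publisher": {"ja-alalc97": "Sanseidō"},
        "title": {"en": "Japanese Ideograms", "ja-alalc97": "Kanji"},
    },
    "main": {"bookTitle": "ja", "publisher": "ja", "title": "ja"},
}


def multiscript_fields(multi):
    """Yield (biblatex field name, value) for every mapped variant in multi['_keys']."""
    for zotero_field, variants in multi.get("_keys", {}).items():
        base = FIELD_NAMES.get(zotero_field)
        if base is None:
            continue  # field not covered by this sketch
        for lang, value in variants.items():
            if lang not in VARIANT_TAGS:
                continue  # unknown language tag: skip rather than guess
            kind, suffix = VARIANT_TAGS[lang]
            yield f"{base}_{kind}_{suffix}", value


for name, value in multiscript_fields(MULTI):
    print(f"  {name} = {{{value}}},")
```

Skipping unmapped language tags instead of guessing a suffix makes the open design question visible: a real exporter would first need an agreed way to name variants for arbitrary tags.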

I'm new to the headache that is multilingual/multiscript reference management, and I'm not very tech-savvy, so I can only hope this example helps. If I forgot anything, please let me know.

vicleroy commented 1 year ago

Hi all. I was wondering whether there has been any progress on this issue, or whether it is too complex to solve.

retorquere commented 1 year ago

No progress, alas: Juris-M has fallen behind to the point that I couldn't maintain a version compatible with both Zotero and Juris-M. The good news is that the Juris-M developer has picked up the work again to make it compatible with Zotero 7, so there's a decent chance this issue can be picked up once a Juris-M beta is released.

tom-newhall commented 9 months ago

Hi there,

Thanks for Better BibTeX. I'm trying to do something similar: creating multilingual citation entries in Juris-M that can be interpreted by Pandoc's citeproc to produce accurate citations. I'm struggling to figure out whether it's even possible, even with a workaround.

Happy to provide more details if necessary. Thanks in advance.

retorquere commented 9 months ago

I don't know what the BibTeX/CSL would have to look like to make that work in pandoc, so that would need to be figured out first. Then there's the problem that, to my knowledge, there isn't a Juris-M release based on the Zotero 7 beta, and it's not feasible to make BBT compatible with Juris-M in its current state.

tom-newhall commented 9 months ago

Hi, thank you so much for your response. After some work, I figured out a reasonable workaround that I hope will be useful to others. It isn't a general solution, since you need to modify the CSL for whatever style you use, but the approach may still help:

I wrote about this on the Zotero forum here: https://forums.zotero.org/discussion/110097/using-the-note-field-for-rendering-mutilingual-text-with-csl#latest

I also wrote a Gist that summarizes my solution: https://gist.github.com/tom-newhall/88557892c6646b8cfda9e8963c2b733d#hack-for-rendering-chinesejapanese-names-without-a-comma-between-the-family-and-given-name-using-standard-csl-and-not-juris-m

Hope that helps!